How do I convert a number from double-precision to single-precision and back?
For example, I have this in Java/C/C#:
double x = 0.00001;
float f = (float)x; // should be 0.000009999999747378752 not 0.0000100000000000000008180305391403
int n = Math.floor(1001 * f / x); // should be 1000 not 1001.
(don't ask. this is a simplified version of what some hairy C code looks like and I need to port it 1:1)
Another example:
double y = Math.floor((double)(float)5.999999876); // should be 6.0
What I already tried:
var f:float = x; // syntax error
var f = x as float; // syntax error
var f = parseFloat(x); // returns full double-precision (0.0000100000000000000008180305391403)
var f = toFixed(x, ...); // Can't use this as float is not defined by number of decimal places but number of binary places.
I'm using Rhino (as part of Java 7), and it should also be compatible with Nashorn (as part of Java 8). I have access to the entire public Java API, if this helps.
[Edit] I did some more experiments and it seems the problem is not in the float conversion but in the float operations. I really need the FPU in my processor to perform a single-precision fmul here. 1001 * f doesn't work if f contains the float-precision version of 0.00001 stored in a double-precision number. The only way I get an exact result is to perform a 32-bit multiplication, 1001f * 0.00001f, and then obtain the result.
- Javascript only has one Number type. – Sverri M. Olsen Commented Oct 27, 2015 at 13:10
- I know it has one Number type, but you still have functions like floor() that effectively makes it an int (but not really). – Mark Jeronimus Commented Oct 27, 2015 at 13:11
- You really need to clarify if this is about Java or Javascript... you seem to be using the two terms interchangeably, but they are not the same thing. – Sverri M. Olsen Commented Oct 27, 2015 at 13:34
- I think Math.floor() is a misleading example because you'd expect it to convert precision (4.2 -> 4) when in fact it doesn't (4.20000 -> 4.00001). – Sphygmomanometer Commented Oct 27, 2015 at 13:48
- I know what you mean and before it gets any more confusing: What you are looking for is not part of the language. But you can achieve that with libraries like the one I mentioned in my answer. It allows you to do precision conversions. – Sphygmomanometer Commented Oct 27, 2015 at 14:06
3 Answers
This will convert a number to float in javascript:
new Float32Array([15.603179021718429])[0]
The answer is:
15.603178977966309
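As a minimal sketch of how the one-liner above could be wrapped for repeated use: the names `scratch` and `toFloat32` are made up for illustration, and the code assumes the engine provides typed arrays (which may not hold for the Rhino version bundled with Java 7). Storing into a `Float32Array` rounds the value to single precision, and reading it back widens it to a double without further change.

```javascript
// Hypothetical helper: round a double to the nearest single-precision value.
// A single scratch array is reused so that each call does not allocate.
var scratch = new Float32Array(1);

function toFloat32(x) {
  scratch[0] = x;     // the store rounds x to float32
  return scratch[0];  // the load widens the float32 back to a double
}

// Second example from the question: (double)(float)5.999999876
var y = Math.floor(toFloat32(5.999999876));
```

With this helper, `y` should come out as 6, whereas `Math.floor(5.999999876)` without the conversion gives 5.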
Oops, even easier, use Math.fround(). New in ES6.
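For the single-precision multiplication mentioned in the question's edit, one possible approach is to round with `Math.fround` after every operation. This works for a single multiply because the double-precision product of two float32 values is exact (24 + 24 bits fit in 53), so one rounding to float32 gives the same result as a hardware single-precision multiply. The sketch below is an illustration only: `fround` and `fmul` are hypothetical names, and the fallback assumes typed arrays exist when `Math.fround` does not.

```javascript
// Use Math.fround where the engine has it (ES6); otherwise fall back to a
// Float32Array round trip. Both are assumptions about the host engine.
var fround = (typeof Math.fround === 'function') ? Math.fround : (function () {
  var scratch = new Float32Array(1);
  return function (x) { scratch[0] = x; return scratch[0]; };
})();

// Hypothetical helper: emulate a single-precision multiply by rounding
// both operands and the product to float32.
function fmul(a, b) {
  return fround(fround(a) * fround(b));
}

// First example from the question.
var x = 0.00001;
var f = fround(x);                      // float32 version of x
var n = Math.floor(fmul(1001, f) / x);  // the question expects 1000
```

Chained expressions would need a rounding step after each operation (for example `fround(fmul(a, b) + c)`), mirroring what the C code does with `float` variables.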
JavaScript itself doesn't provide any such feature. I recommend you use big.js if you absolutely need precision conversions.
All numerical values in JavaScript are Numbers. They are all 64-bit floating point numbers.
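A quick way to see this, as a small illustrative check (the variable names are arbitrary): the result of `Math.floor` is still an ordinary Number, and the usual double-precision rounding effects show up in plain arithmetic.

```javascript
// Math.floor returns a Number, not a separate integer type.
var floored = Math.floor(4.2);
var flooredType = typeof floored;        // 'number'
var sameAsLiteral = (floored === 4.0);   // true: 4 and 4.0 are the same value

// Classic double-precision rounding: 0.1 + 0.2 is not exactly 0.3.
var sumIsExact = (0.1 + 0.2 === 0.3);    // false
```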