Does anyone have handy the snippets of code to convert an IEEE 754 double to the immediately inferior (resp. superior) float, without changing or assuming anything about the FPU's current rounding mode?
Note: this constraint probably implies not using the FPU at all. I expect the simplest way to do it in these conditions is to read the bits of the double in a 64-bit long and to work with that.
You can assume the endianness of your choice for simplicity, and that the double in question is available through the d field of the union below:
union double_bits
{
    long i;
    double d;
};
I would try to do it myself but I am certain I would introduce hard-to-notice bugs for denormalized or negative numbers.
I think the following works, but I will state my assumptions first:
- Floating-point numbers are stored in IEEE-754 format on your implementation,
- No overflow occurs in the conversion,
- You have nextafterf() available (it's specified in C99).
Also, most likely, this method is not very efficient.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char *argv[])
{
    /* Change to non-zero for superior, otherwise inferior */
    int superior = 0;
    /* double value to convert */
    double d = 0.1;
    float f;
    double tmp = d;

    if (argc > 1)
        d = strtod(argv[1], NULL);

    /* First, get an approximation of the double value */
    f = d;
    /* Now, convert that back to double */
    tmp = f;
    /* Print the numbers. %a is C99 */
    printf("Double: %.20f (%a)\n", d, d);
    printf("Float: %.20f (%a)\n", f, f);
    printf("tmp: %.20f (%a)\n", tmp, tmp);

    if (superior) {
        /* If we wanted superior, and got a smaller value,
           get the next value */
        if (tmp < d)
            f = nextafterf(f, INFINITY);
    } else {
        if (tmp > d)
            f = nextafterf(f, -INFINITY);
    }
    printf("converted: %.20f (%a)\n", f, f);
    return 0;
}
On my machine, it prints:
Double: 0.10000000000000000555 (0x1.999999999999ap-4)
Float: 0.10000000149011611938 (0x1.99999ap-4)
tmp: 0.10000000149011611938 (0x1.99999ap-4)
converted: 0.09999999403953552246 (0x1.999998p-4)
The idea is that I am converting the double value to a float value. The result could be less than or greater than the double value, depending upon the rounding mode. When it is converted back to double, we can check whether it is smaller or greater than the original value. Then, if the float is on the wrong side of the original, we take the next float from the converted number in the original number's direction.
To do this job more accurately than just recombining the mantissa and exponent bits, check this out:
http://www.mathworks.com/matlabcentral/fileexchange/23173
regards
I posted code to do this here: https://stackoverflow.com/q/19644895/364818 and copied it below for your convenience.
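#include <stdint.h>
#include <string.h>

// d points to the 8 bytes of an IEEE double, but double is not natively supported.
// The low mantissa bits are discarded, so the result is rounded toward zero.
// Only values in float's normal range are handled: zero, denormals,
// out-of-range values, infinities and NaNs are not.
static float ConvertDoubleToFloat(void* d)
{
    uint64_t x;
    uint32_t y;
    float f; // assumed to be IEEE float
    uint64_t sign;
    uint64_t exponent;
    uint64_t mantissa;
    memcpy(&x, d, 8);
    // IEEE binary64 format (unsupported)
    sign = (x >> 63) & 1; // 1 bit
    exponent = (x >> 52) & 0x7FF; // 11 bits
    mantissa = x & 0x000FFFFFFFFFFFFFULL; // 52 bits
    exponent -= 1023; // remove the binary64 bias
    // IEEE binary32 format (supported)
    exponent += 127; // rebase to the binary32 bias
    exponent &= 0xFF;
    mantissa >>= (52-23); // keep the top 23 bits, dropping the low 29
    // Build the result in a 32-bit integer so the copy does not depend on endianness
    y = (uint32_t)(mantissa | (exponent << 23) | (sign << 31));
    memcpy(&f, &y, 4);
    return f;
}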
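The conversion above drops the low 29 mantissa bits, so it always rounds toward zero and only covers values in float's normal range. As a rough sketch of how the same bit-level idea might be extended to pick the inferior or superior float, here is an integer-only version. The name double_bits_to_float_bits and the superior flag are hypothetical, not part of the code above. It assumes IEEE binary64 input bits and binary32 output bits, and it makes two choices of its own: values beyond the float range map to FLT_MAX or infinity depending on the direction, and NaNs are returned as quiet NaNs.

```c
#include <stdint.h>

/* Hypothetical helper (not from the answer above).
   bits64 is an IEEE binary64 bit pattern. Returns the binary32 bit pattern of
   the largest float <= value (superior == 0) or the smallest float >= value
   (superior != 0). Uses integer operations only, so the FPU rounding mode is
   never consulted. */
static uint32_t double_bits_to_float_bits(uint64_t bits64, int superior)
{
    uint32_t sign = (uint32_t)(bits64 >> 63);
    int exp64 = (int)((bits64 >> 52) & 0x7FF);
    uint64_t man = bits64 & 0x000FFFFFFFFFFFFFULL;
    uint32_t mag;   /* float magnitude bits, truncated toward zero */
    int inexact;    /* non-zero if truncation discarded anything */
    int away;       /* non-zero if the requested direction is away from zero */

    if (exp64 == 0x7FF) {
        /* Infinity stays infinity; NaN becomes a quiet NaN (a choice made here). */
        if (man == 0)
            return (sign << 31) | 0x7F800000u;
        return (sign << 31) | 0x7FC00000u | (uint32_t)(man >> 29);
    }

    if (exp64 == 0) {
        /* Zero or a double denormal: far below the smallest float denormal. */
        mag = 0;
        inexact = (man != 0);
    } else {
        int e = exp64 - 1023;               /* unbiased exponent */
        uint64_t sig = man | (1ULL << 52);  /* 53-bit significand */

        if (e > 127) {
            /* Too large for float: truncate to FLT_MAX. */
            mag = 0x7F7FFFFFu;
            inexact = 1;
        } else if (e >= -126) {
            /* Normal float: keep the top 23 mantissa bits. */
            mag = ((uint32_t)(e + 127) << 23) | (uint32_t)(man >> 29);
            inexact = (man & 0x1FFFFFFFULL) != 0;
        } else {
            /* Float denormal: result is floor(value / 2^-149). */
            int shift = -97 - e;            /* at least 30 in this branch */
            if (shift > 53) {
                mag = 0;
                inexact = 1;
            } else {
                mag = (uint32_t)(sig >> shift);
                inexact = (sig & ((1ULL << shift) - 1)) != 0;
            }
        }
    }

    /* Float bit patterns are ordered by magnitude, so adding 1 steps to the
       next float away from zero (FLT_MAX steps to infinity). */
    away = superior ? (sign == 0) : (sign == 1);
    if (inexact && away)
        mag += 1;

    return (sign << 31) | mag;
}
```

For the 0.1 example used earlier in the thread (double bits 0x3FB999999999999A), this gives 0x3DCCCCCC for inferior and 0x3DCCCCCD for superior, which are 0x1.999998p-4 and 0x1.99999ap-4. The resulting 32 bits can be copied into a float with memcpy, as in the function above.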