strtod() and sprintf() inconsistency under GCC and MSVC

I'm working on a cross-platform app for Windows and Mac OS X, and I have a problem with two standard C library functions:

strtod() - string-to-double conversion
sprintf() - when used for outputting double-precision floating-point numbers

Their GCC and MSVC versions return different results in some digits of the mantissa, and this plays a crucial role when the exponent value is large. An example:

MSVC: 9,999999999999999500000000000000e+032
GCC:  9,999999999999999455752309870428e+32
MSVC: 9,999999999999999500000000000000e+033
GCC:  9,999999999999999455752309870428e+33
MSVC: 9,999999999999999700000000000000e+034
GCC:  9,999999999999999686336610791798e+34

The input test numbers have an identical binary representation under MSVC and GCC.
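One way to confirm that the inputs are bit-for-bit identical under both compilers is to print the raw 64-bit pattern of each value instead of a decimal rendering. The sketch below assumes a 64-bit IEEE 754 double; double_bits and print_bits are hypothetical helper names.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the object representation of x into an integer
   (assumes double is a 64-bit IEEE 754 binary64 value). */
uint64_t double_bits(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Print x as 16 hex digits so the output can be diffed across builds. */
void print_bits(double x) {
    printf("%016" PRIX64 "\n", double_bits(x));
}
```

For example, print_bits(1.0) prints 3FF0000000000000. Because this output never goes through the runtime's decimal conversion, it can be compared directly between the MSVC and GCC builds.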

I'm looking for a well-tested cross-platform open-source implementation of those functions, or just for a pair of functions that would correctly and consistently convert double to string and back.

I've already tried the clib GCC implementation, but the code is too long and too dependent on other source files, so I expect the adaptation to be difficult.

What implementations of string-to-double and double-to-string functions would you recommend?

Converting between floating point numbers and strings is hard - very hard. There are numerous papers on the subject, including:

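What Every Computer Scientist Should Know About Floating-Point Arithmetic (David Goldberg, 1991)
How to Print Floating-Point Numbers Accurately (Guy L. Steele Jr. and Jon L White, 1990)
How to Read Floating-Point Numbers Accurately (William D. Clinger, 1990)
General Decimal Arithmetic (Mike Cowlishaw, IBM)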

The last of those is a treasure trove of information on floating point decimal arithmetic.

The GNU glibc implementation is likely to be about as good as it gets - but it won't be short or simple.

Addressing examples

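A double stores 53 significant bits, roughly 16 decimal digits: any decimal number with 15 significant digits survives a trip through a double unchanged, and 17 significant digits are always enough to identify a double uniquely. MSVC prints 17 significant digits and pads the rest with zeros. GCC is doing as you ask: glibc prints the exact decimal expansion of the stored binary value. But 17 digits already pin down the double, so the extra 14 digits you are requesting are not needed to reconstruct it. If you had a 16-byte 'long double' with true quad precision (SPARC) or the double-double format (PowerPC), then you might warrant 32 significant figures; on Intel x86_64 Macs, long double occupies 16 bytes but uses the 80-bit x87 extended format, which is good for about 19 digits. However, the differences you are showing are QoI (quality of implementation) issues; I might even argue that MS is doing a better job than GCC/glibc here (and I don't often say that!).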
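The 17-digit figure can be checked empirically: format the value with increasing precision, parse it back with strtod, and stop at the first precision that reproduces the exact double. A minimal sketch, assuming a correctly rounding printf and strtod (glibc provides both); round_trips and min_roundtrip_digits are made-up names:

```c
#include <stdio.h>
#include <stdlib.h>

/* Format x with `digits` significant digits, parse it back, and compare. */
static int round_trips(double x, int digits) {
    char buf[40];
    snprintf(buf, sizeof buf, "%.*e", digits - 1, x);
    return strtod(buf, NULL) == x;
}

/* Smallest number of significant digits (1..17) that gives back exactly x.
   Assumes x is finite and that printf/strtod round correctly. */
int min_roundtrip_digits(double x) {
    for (int digits = 1; digits < 17; digits++)
        if (round_trips(x, digits))
            return digits;
    return 17;  /* 17 significant digits always suffice for an IEEE double */
}
```

With this, min_roundtrip_digits(0.1) is 1, while 0.1 + 0.2 needs all 17 digits (0.30000000000000004). So formatting with "%.17g" and parsing with strtod is a lossless pair on any runtime that rounds correctly in both directions, even though it does not always produce the shortest string.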

The only algorithm I know for printing the exact value of a floating point number in decimal is as follows:

1. Convert the mantissa to a decimal integer. You can either do this by pulling apart the bits to read the mantissa directly, or you can write a messy floating-point loop that first multiplies the value by a power of two to put it in the range 1 <= x < 10, then pulls off one digit at a time by casting to int, subtracting, and multiplying by 10.
2. Apply the exponent by repeatedly multiplying or dividing by 2. This is an operation on the string of decimal digits you generated. Roughly every 3.3 multiplications (log2 10 is about 3.32) add an extra digit on the left. Once the last digit is odd, every single division adds an extra digit on the right, because halving a number that ends in an odd digit produces a trailing 5 (for example, 0.5 / 2 = 0.25).

It's slow and ugly but it works...
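The two steps can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: it reads the mantissa by pulling apart the bits (assuming a 64-bit IEEE 754 double), keeps the number as an array of decimal digits, and implements each division by 2 as a multiplication by 5 plus moving the decimal point one place left, which is why each division adds one digit on the right. The names exact_dtoa, mul_small, MAXDIG and EXACT_DTOA_BUFSIZE are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Room for the longest case: 2^-1074 has 1074 digits after the point. */
#define MAXDIG 1100
#define EXACT_DTOA_BUFSIZE (MAXDIG + 4)

/* d[0..n-1] holds a decimal integer, least significant digit first.
   Multiply it in place by a small factor and return the new length. */
static int mul_small(unsigned char *d, int n, int factor) {
    int carry = 0;
    for (int i = 0; i < n; i++) {
        int v = d[i] * factor + carry;
        d[i] = (unsigned char)(v % 10);
        carry = v / 10;
    }
    while (carry) {
        d[n++] = (unsigned char)(carry % 10);
        carry /= 10;
    }
    return n;
}

/* Write the exact decimal value of a finite double x into out,
   which must hold at least EXACT_DTOA_BUFSIZE chars. */
void exact_dtoa(double x, char *out) {
    unsigned char d[MAXDIG];
    uint64_t bits, mant;
    int n = 0, scale = 0, e, lo, i;
    char *p = out;

    /* Pull apart the bits: sign, 11-bit exponent, 52-bit fraction. */
    memcpy(&bits, &x, sizeof bits);
    int bexp = (int)((bits >> 52) & 0x7FF);
    mant = bits & ((UINT64_C(1) << 52) - 1);
    if (bexp == 0) {
        e = -1074;                  /* zero or subnormal: no hidden bit */
    } else {
        mant |= UINT64_C(1) << 52;  /* restore the hidden leading 1 */
        e = bexp - 1075;            /* now x = mant * 2^e exactly */
    }
    if (bits >> 63) *p++ = '-';

    /* Step 1: mantissa -> decimal digits N; the value is N / 10^scale. */
    do { d[n++] = (unsigned char)(mant % 10); mant /= 10; } while (mant);

    /* Step 2: apply the exponent one factor of 2 at a time. */
    for (; e > 0; e--) n = mul_small(d, n, 2);
    for (; e < 0; e++) { n = mul_small(d, n, 5); scale++; }  /* N/2 = 5N/10 */

    while (n <= scale) d[n++] = 0;                 /* a digit before the point */
    for (lo = 0; lo < scale && d[lo] == 0; lo++) {} /* skip trailing zeros */

    for (i = n - 1; i >= scale; i--) *p++ = (char)('0' + d[i]);
    if (lo < scale) {
        *p++ = '.';
        for (i = scale - 1; i >= lo; i--) *p++ = (char)('0' + d[i]);
    }
    *p = '\0';
}
```

For example, exact_dtoa(0.1, buf) yields 0.1000000000000000055511151231257827021181583404541015625, the exact value stored for 0.1. Subnormals need the most room (the smallest one has about 750 significant digits and 1074 places after the point), hence the large buffer.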

The following function dtoa returns a string that losslessly converts back into the same double.

If you rewrite aisd so that it checks the string against every string-to-float implementation you need to support, dtoa's output will be portable among them.

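  #include <float.h>
  #include <limits.h>
  #include <math.h>
  #include <stdio.h>
  #include <string.h>

  // Return whether a string represents the given double.
  int aisd(double f, const char* s) {
     double r;
     return sscanf(s, "%lf", &r) == 1 && r == f;
  }

  // Compute f * 10^-e, splitting the power of ten near the ends of the
  // exponent range so that pow() neither overflows nor goes subnormal.
  static double scale10(double f, int e) {
     if (e < -290) return (f * 1e40) * pow(10, -e - 40);
     if (e > 290) return (f / 1e40) * pow(10, 40 - e);
     return f * pow(10, -e);
  }

  // Return the shortest lossless string representation of an IEEE double.
  // res should hold at least 32 chars: e.g. "-0.00012345678901234567"
  // already needs 24 bytes including the final '\0'.
  char* dtoa(char* res, double f) {
     int i, j, e, lenF = INT_MAX;
     char fmt[8];

     if (f > DBL_MAX) { sprintf(res, "1e999"); return res; }  // converts to Inf
     if (f < -DBL_MAX) { sprintf(res, "-1e999"); return res; }  // converts to -Inf
     if (isnan(f)) { sprintf(res, "NaN"); return res; }  // NaNs don't work under MSVCRT

     // decimal exponent so that |f| * 10^-e is roughly in [0.1, 1);
     // computed only after the Inf/NaN checks, and log10 needs |f| > 0
     e = f ? (int)floor(log10(fabs(f))) + 1 : 0;

     // compute the shortest representation without exponent ("123000", "0.15")
     if (!f || (e > -4 && e < 21)) {
        for (i=0; i<=20; i++) {
           sprintf(fmt, "%%.%dlf", i);
           sprintf(res, fmt, f);
           if (aisd(f, res)) { lenF = (int)strlen(res); break; }
        }
     }

     if (!f) return res;

     // compute the shortest representation with exponent ("123e3", "15e-2")
     for (i=0; i<19; i++) {
        sprintf(res, "%.0lfe%d", scale10(f, e), e); if (aisd(f, res)) break;
        j = (int)strlen(res); if (j >= lenF) break;
        while (res[j] != 'e') j--;
        // try mantissa -1 and +1, staying within '0'..'9'; undo on failure
        if (res[j-1] > '0') { res[j-1]--; if (aisd(f, res)) break; res[j-1]++; }
        if (res[j-1] < '9') { res[j-1]++; if (aisd(f, res)) break; res[j-1]--; }
        e--;
     }
     if (lenF <= (int)strlen(res)) sprintf(res, fmt, f);
     return res;
  }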
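Note that aisd() relies on sscanf alone, and sscanf stops quietly at the first character it cannot use, so a string with trailing junk can still pass. A stricter, hypothetical variant (aisd_all is an illustrative name, not from the original) also requires strtod to consume the whole string and checks atof. On a single platform these calls usually share one conversion routine, so to cover both MSVC and glibc you would run the same check in each build, or add your own parser to the list.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stricter aisd(): s represents f only if every available
   string-to-double routine parses all of s back to exactly f. */
int aisd_all(double f, const char *s) {
    double r;
    char *end;

    if (sscanf(s, "%lf", &r) != 1 || r != f) return 0;
    r = strtod(s, &end);
    if (end == s || *end != '\0' || r != f) return 0;  /* reject trailing junk */
    if (atof(s) != f) return 0;
    return 1;
}
```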

See the question "Can't get a NaN from the MSVCRT strtod/sscanf/atof functions" for the MSVCRT NaN problem. If you don't need to recognize NaNs, you can output infinity ("1e999") whenever you get one.
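If you do need NaNs to survive the trip, one option is to recognize the "NaN" string that dtoa emits yourself and hand everything else to strtod. A minimal sketch under these assumptions: parse_double is a hypothetical name, it accepts only the exact spelling "NaN" (optionally signed), and it drops the NaN's sign and payload.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper: recognize "NaN" here instead of relying on the
   runtime's strtod; everything else is passed through to strtod. */
double parse_double(const char *s) {
    const char *p = s;
    if (*p == '+' || *p == '-') p++;
    if (strcmp(p, "NaN") == 0) return NAN;  /* sign and payload are dropped */
    return strtod(s, NULL);  /* "1e999" overflows to HUGE_VAL, i.e. +Inf */
}
```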