Suppose I'm in an environment whose only float type is (say) 32 bits wide, and I want to do some calculations with 64-bit floats.
Is there any reasonable/efficient/feasible way to use the float32 data type to implement a wider floating-point data type like float64? (Or even float63, or float48, or anything notably wider than the 32 I have?)
Or am I best off just implementing longer floating-point data types using integers?
I'm guessing the specific meanings of the bit locations would make this basically impossible, but I'm no expert at numerics, and I don't know how to search for this, so maybe there's a clever approach out there.
There are tricks you can employ to get more precision than your machine wants to give you. One of the best known is the Kahan summation algorithm for getting extra precision when computing a sum of a set of floating point numbers:
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
Designing such a procedure requires rather detailed knowledge of the inner workings of floating point computation, so proceed with caution.
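As a rough illustration of the Kahan summation idea, here is a minimal Python sketch. Python has no native float32, so the helper `f32` (a name made up for this example) simulates one by rounding every intermediate result through `struct`. The function names `naive_sum32` and `kahan_sum32` are likewise just illustrative.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum32(values):
    """Plain left-to-right sum, rounding to float32 after every addition."""
    total = f32(0.0)
    for v in values:
        total = f32(total + f32(v))
    return total

def kahan_sum32(values):
    """Kahan (compensated) sum, with every operation rounded to float32."""
    total = f32(0.0)
    comp = f32(0.0)  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(f32(v) - comp)          # apply the correction to the next term
        t = f32(total + y)              # low-order bits of y may be lost here
        comp = f32(f32(t - total) - y)  # recover what was lost (negated)
        total = t
    return total

# Usage: sum 0.1 one hundred thousand times using float32 arithmetic only.
values = [0.1] * 100000
print("naive:", naive_sum32(values))
print("kahan:", kahan_sum32(values))
```

The compensated version should land much closer to 10000 than the naive loop, even though both use nothing wider than float32 for the running total.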
I think your guess is correct. One could try to represent a 64-bit float as the product of two 32-bit floats, but I don't think you could get both the exponent and the mantissa to work at the same time. It might be possible to get something like a float48 as a net effect, but it seems rather dodgy overall.
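For what it's worth, a different pairing than the product is known to work reasonably well: storing a value as the unevaluated sum of two floats, hi + lo (often called "double-single" or "float-float" arithmetic). This gives roughly 48 bits of significand but keeps the float32 exponent range, so it is closer to the "float48" mentioned above than to a true float64. The sketch below is only an illustration under those assumptions. The names `f32`, `two_sum`, `ff_from_float`, `ff_add` and `ff_to_float` are made up here, and float32 is simulated in Python by rounding through `struct`.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def two_sum(a, b):
    """Knuth's TwoSum: s is the rounded float32 sum, e the exact rounding error."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e

def ff_from_float(x):
    """Split a value into a (hi, lo) pair of float32 numbers with hi + lo ~ x."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo

def ff_add(x, y):
    """Add two (hi, lo) pairs using only float32-rounded operations."""
    s, e = two_sum(x[0], y[0])        # exact sum of the high parts
    e = f32(e + f32(x[1] + y[1]))     # fold in the low parts
    hi = f32(s + e)                   # renormalize so that |lo| is small
    lo = f32(e - f32(hi - s))
    return hi, lo

def ff_to_float(x):
    """Combine a pair into a 64-bit float, used here only to inspect results."""
    return x[0] + x[1]

# Usage: add 1/3 and 1/7 as float-float pairs and as plain float32 values.
a, b = ff_from_float(1 / 3), ff_from_float(1 / 7)
print("float-float:", ff_to_float(ff_add(a, b)))
print("float32    :", f32(f32(1 / 3) + f32(1 / 7)))
```

Only addition is shown. Multiplication needs an exact product step (Dekker's splitting or a fused multiply-add), which is where most of the extra care goes.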
But there should be libraries for doing it with ints. It's not that long ago that commonly used CPUs didn't have hardware float support.