I am doing some floating point calculations and the results are not as accurate as I want them to be.
This is the algorithm:
...
center = (max_x + min_x) / 2
distance = old_x - center
new_x = center + (distance * factor)
return new_x
min_x, max_x, and old_x are all floats. I believe that the greatest error is introduced when I'm taking the average of the max and the min, and then the error is multiplied by the factor (which can be a float).
How can I minimize the error due to FP computation so that new_x is as precise as it can be?
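One way to see how much error the float-only version actually introduces is to emulate float32 arithmetic in Python and compare it against an exact rational result. This is a minimal sketch, assuming that a round-trip through the standard `struct` module's 4-byte `"<f"` format is an acceptable stand-in for float32 rounding; the helper names `f32`, `rescale_f32`, `rescale_exact` and `abs_error` are hypothetical.

```python
import struct
from fractions import Fraction


def f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rescale_f32(old_x: float, min_x: float, max_x: float, factor: float) -> float:
    """The question's algorithm, rounding every intermediate result to float32.

    Computing a single +, -, * or / in double and rounding once to float32
    gives the correctly rounded float32 result, so this mimics float-only code.
    """
    center = f32(f32(max_x + min_x) / 2)
    distance = f32(old_x - center)
    return f32(center + f32(distance * factor))


def rescale_exact(old_x: float, min_x: float, max_x: float, factor: float) -> Fraction:
    """Same formula evaluated with exact rational arithmetic (no rounding at all)."""
    center = (Fraction(max_x) + Fraction(min_x)) / 2
    return center + (Fraction(old_x) - center) * Fraction(factor)


def abs_error(old_x: float, min_x: float, max_x: float, factor: float) -> Fraction:
    """Absolute error of the float32 result; inputs are first rounded to float32."""
    args = [f32(v) for v in (old_x, min_x, max_x, factor)]
    return abs(Fraction(rescale_f32(*args)) - rescale_exact(*args))


if __name__ == "__main__":
    print(float(abs_error(1000.1, 999.9, 1000.3, 3.7)))
```

Measuring the error this way on your real input ranges tells you whether a rewrite actually helps before you commit to one.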
If old_x and center are close, you're losing precision.
This is called loss of significance.
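A small illustration of loss of significance, under the same assumption that a `struct` round-trip models float32 rounding (the `f32` helper is hypothetical). The subtraction itself is exact here; the problem is that it exposes the rounding error already present in its operand, which is tiny relative to the operand but large relative to the difference.

```python
import struct
from fractions import Fraction


def f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Two nearby values; 1 + 1e-7 is not representable in float32 and gets rounded.
a = f32(1 + 1e-7)
b = f32(1.0)

computed = a - b                 # exact subtraction of two nearby float32 values
true_diff = Fraction(1, 10**7)   # the difference we actually wanted

# The rounding error of `a` now dominates the result.
relative_error = abs(Fraction(computed) - true_diff) / true_diff
print(float(relative_error))
```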
You could change the calculation so that the subtraction happens at the end:
center = (max_x + min_x) / 2
new_x = (center + (old_x * factor)) - (center * factor)
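Whether moving the subtraction to the end helps depends on the inputs (old_x * factor and center * factor can themselves be large and nearly equal), so it is worth measuring. A minimal sketch, again emulating float32 with a hypothetical `struct`-based `f32` helper and comparing both orderings against an exact reference; the function names are illustrative only.

```python
import struct
from fractions import Fraction


def f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def original(old_x, min_x, max_x, factor):
    # center + (old_x - center) * factor, every step rounded to float32
    center = f32(f32(max_x + min_x) / 2)
    return f32(center + f32(f32(old_x - center) * factor))


def rearranged(old_x, min_x, max_x, factor):
    # (center + old_x * factor) - center * factor: the subtraction happens last
    center = f32(f32(max_x + min_x) / 2)
    return f32(f32(center + f32(old_x * factor)) - f32(center * factor))


def exact(old_x, min_x, max_x, factor):
    center = (Fraction(max_x) + Fraction(min_x)) / 2
    return center + (Fraction(old_x) - center) * Fraction(factor)


def compare(*args):
    """Return (error of original, error of rearranged) for float32-rounded inputs."""
    args = tuple(f32(v) for v in args)
    ref = exact(*args)
    return (abs(Fraction(original(*args)) - ref),
            abs(Fraction(rearranged(*args)) - ref))


print([float(e) for e in compare(1000.1, 999.9, 1000.3, 3.7)])
```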
Depending on your language, there is probably a fixed/arbitrary precision numeric type you can use, such as decimal in Python or BigDecimal in Java.
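In Python this could look like the sketch below, using the standard `decimal` module with a locally raised precision. The function name `rescale_decimal` and the default of 50 significant digits are assumptions for illustration. Note that `Decimal(float)` converts the stored binary value exactly, so this removes rounding in the arithmetic but not the rounding that already happened when the inputs were stored as floats.

```python
from decimal import Decimal, localcontext


def rescale_decimal(old_x: float, min_x: float, max_x: float, factor: float,
                    prec: int = 50) -> Decimal:
    """Evaluate center + (old_x - center) * factor with `prec` significant digits."""
    with localcontext() as ctx:
        ctx.prec = prec  # far more digits than float (~7) or double (~16)
        # Decimal(float) is an exact conversion of the binary value.
        old, lo, hi, f = (Decimal(v) for v in (old_x, min_x, max_x, factor))
        center = (hi + lo) / 2
        return center + (old - center) * f


print(rescale_decimal(1000.1, 999.9, 1000.3, 3.7))
```

You would still have to convert the result back to float at the end, which rounds once.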
This eliminates at least one source of error from your original algorithm:
# Adding min and max can produce a value of larger magnitude, losing some low-order bits
center = min_x + (max_x - min_x)/2
distance = old_x - center
new_x = center + (distance * factor)
return new_x
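The comment in that snippet is about magnitude: max_x + min_x can be larger than either operand. In float32 the most visible symptom of this is overflow near the top of the range, which the offset form avoids when min_x and max_x have the same sign (it can still overflow when they have large magnitudes of opposite sign). A minimal sketch, emulating float32 with a hypothetical `struct`-based `f32` helper that maps overflow to infinity; `center_sum` and `center_offset` are illustrative names.

```python
import math
import struct


def f32(x: float) -> float:
    """Round a double to float32; values too large for float32 become +/-inf."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def center_sum(min_x: float, max_x: float) -> float:
    # Original form: (max_x + min_x) / 2
    return f32(f32(max_x + min_x) / 2)


def center_offset(min_x: float, max_x: float) -> float:
    # Offset form: min_x + (max_x - min_x) / 2
    return f32(min_x + f32(f32(max_x - min_x) / 2))


big = f32(3e38)  # close to the float32 maximum (about 3.4e38)
print(center_sum(big, big), center_offset(big, big))
```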
If you have more knowledge of the relationship between old_x, min_x and max_x, you can probably do better than this.
As Yochai says, your problem is probably caused by the subtraction old_x - center. If old_x and center are close to each other, you lose precision.
The simple solution would be to do the computation using double instead of float, but I guess that's not possible. In that case, you need to get rid of the subtraction. One possibility is:
distance_max = max_x - center
distance_min = min_x - center
distance = (distance_max + distance_min) / 2
new_x = center + factor * distance
This helps if max_x, min_x and center are quite far apart while the average of max_x and min_x is close to center. If that does not help, perhaps you can adapt the computation of max_x so that you actually compute max_x - center, but that needs changes in the part you did not show us.
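The first suggestion in the answer above, doing the arithmetic in double instead of float, is easy to sketch in Python because Python floats are already IEEE doubles. This assumes the inputs are float32 values widened to double; only the final result is rounded back to float32, via a hypothetical `struct`-based `f32` helper. `rescale_via_double` and `exact` are illustrative names.

```python
import struct
from fractions import Fraction


def f32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def rescale_via_double(old_x: float, min_x: float, max_x: float, factor: float) -> float:
    """Do every step in double precision and round to float32 only once, at the end."""
    center = (max_x + min_x) / 2
    distance = old_x - center
    return f32(center + distance * factor)


def exact(old_x: float, min_x: float, max_x: float, factor: float) -> Fraction:
    """Exact rational reference for the same formula."""
    center = (Fraction(max_x) + Fraction(min_x)) / 2
    return center + (Fraction(old_x) - center) * Fraction(factor)


args = [f32(v) for v in (1000.1, 999.9, 1000.3, 3.7)]
print(rescale_via_double(*args), float(exact(*args)))
```

Because the intermediate errors in double are roughly 2^29 times smaller than float32's, the single final rounding usually dominates.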
None of the previous implementations round, so they all have a large error. Here's how to do this in fixed-point math. I'm using X.1u precision (1 LSB is used for the fractional part).
//center = (max_x + min_x) / 2
center = max_x + min_x // zero error here
// distance = old_x - center
distance = (old_x << 1) - center // zero error here
//new_x = center + (distance * factor)
new_x = (1 + center + (distance * factor)) >> 1 // 1 is one half in X.1u, so >> 1 rounds to nearest
return new_x
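The X.1u scheme can be checked with Python integers standing in for whatever integer type you actually use (an assumption: Python ints never overflow). `>>` on Python ints is a floor shift, so adding one half before shifting rounds to nearest, with ties going toward +infinity. The names `rescale_x1` and `rescale_reference` are hypothetical, and factor is assumed to be a plain integer as in the snippet above.

```python
import math
from fractions import Fraction


def rescale_x1(old_x: int, min_x: int, max_x: int, factor: int) -> int:
    """X.1 fixed point: intermediates carry one fractional bit; factor is an integer."""
    center = max_x + min_x                          # 2 * true center, no rounding
    distance = (old_x << 1) - center                # 2 * (old_x - center), no rounding
    return (1 + center + distance * factor) >> 1    # +1 is one half in X.1


def rescale_reference(old_x: int, min_x: int, max_x: int, factor: int) -> int:
    """Exact result rounded to nearest (ties up), for checking."""
    center = Fraction(max_x + min_x, 2)
    exact = center + (old_x - center) * factor
    return math.floor(exact + Fraction(1, 2))


print(rescale_x1(5, 0, 3, 2))  # exact value 8.5, rounded to 9
```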
If factor is also a fixed-point (integer) value with N fractional bits, then new_x can be calculated as:
new_x = ( (1 << N) + (center << N) + (distance * factor) ) >> (N + 1)
- (center << N) has N+1 fraction bits
- distance * factor has N+1 fraction bits
- (1 << N) is one half, since 1 << (N+1) is one at this fixed-point precision.
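The bullet points above can be checked numerically. A minimal sketch with Python integers, assuming the factor is stored as `factor_q` with real value `factor_q / 2**n`; `rescale_qn` and `rescale_reference` are hypothetical names.

```python
import math
from fractions import Fraction


def rescale_qn(old_x: int, min_x: int, max_x: int, factor_q: int, n: int) -> int:
    """factor_q is a fixed-point factor with n fractional bits."""
    center = max_x + min_x                  # 1 fractional bit
    distance = (old_x << 1) - center        # 1 fractional bit
    # (center << n) and distance * factor_q both have n + 1 fractional bits;
    # (1 << n) is one half at that precision, so the final shift rounds to nearest.
    return ((1 << n) + (center << n) + distance * factor_q) >> (n + 1)


def rescale_reference(old_x, min_x, max_x, factor_q, n):
    """Exact result rounded to nearest (ties up), for checking."""
    center = Fraction(max_x + min_x, 2)
    exact = center + (old_x - center) * Fraction(factor_q, 2 ** n)
    return math.floor(exact + Fraction(1, 2))


# factor 1.25 with n = 4 fractional bits is factor_q = 20
print(rescale_qn(5, 0, 3, 20, 4))  # exact value 5.875, rounded to 6
```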
Once each part is understood, the above line can be compacted to:
new_x = ( ((1 + center) << N) + (distance * factor) ) >> (N + 1)
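Since a left shift distributes over addition for integers, the compacted line is only a rearrangement. A quick randomized check of that identity, with hypothetical function names and `center` and `distance` already in X.1 form:

```python
import random


def expanded(center: int, distance: int, factor_q: int, n: int) -> int:
    return ((1 << n) + (center << n) + (distance * factor_q)) >> (n + 1)


def compact(center: int, distance: int, factor_q: int, n: int) -> int:
    return (((1 + center) << n) + (distance * factor_q)) >> (n + 1)


rng = random.Random(0)
for _ in range(10_000):
    args = (rng.randint(-10**6, 10**6), rng.randint(-10**6, 10**6),
            rng.randint(-10**6, 10**6), rng.randint(0, 16))
    assert expanded(*args) == compact(*args)
```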
The integer type used must be large enough, of course. If the valid input range is unknown, one should check the inputs to this function or take some other precaution against overflow. In most cases this isn't needed.
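When the range is unknown, one option while developing is to compute with Python's unbounded integers and flag any intermediate that would not fit in the integer type you plan to use. A sketch assuming a signed 32-bit accumulator (`acc_bits=32` is an assumption, not something the answer specifies); `fits_signed` and `rescale_qn_checked` are hypothetical names.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a two's-complement integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def rescale_qn_checked(old_x: int, min_x: int, max_x: int,
                       factor_q: int, n: int, acc_bits: int = 32) -> int:
    """Compacted fixed-point formula; raises if an intermediate would overflow acc_bits."""
    center = max_x + min_x
    distance = (old_x << 1) - center
    scaled_center = (1 + center) << n
    product = distance * factor_q
    total = scaled_center + product
    for name, value in (("center", center), ("distance", distance),
                        ("scaled_center", scaled_center), ("product", product),
                        ("total", total)):
        if not fits_signed(value, acc_bits):
            raise OverflowError(f"{name}={value} does not fit in {acc_bits} bits")
    return total >> (n + 1)
```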
This is as good as it gets in fixed-point math. This is how hardware circuits perform integer arithmetic.