Why do two floating point multiplies give a different answer than one?

Developer https://www.devze.com 2023-01-14 17:13 Source: Internet

I recently ran into an issue where I wasn't getting the numerical result I expected. I tracked it down to the problem that is illustrated by the following example:

#include <stdio.h>

int main()
{
  double sample = .5;
  int a = (int)(sample * (1 << 31));
  int b = (int)(sample * (1 << 23) * (1 << 8));
  printf("a = %#08x, b = %#08x\n", a, b);
}
// Output is: a = 0xc0000000, b = 0x40000000

Why is the result of multiplying by (1 << 31) different than the result of multiplying by (1 << 23) * (1 << 8)? I expected the two to give the same answer but they don't.

I should note that all my floating point values are in the range [-1, 1).

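You are apparently expecting identical results because you assume that multiplying by (1 << 31) is the same as multiplying by (1 << 23) and then by (1 << 8). In the general case they are not the same. The (1 << 31) calculation is performed in the signed int domain. If your platform uses 32-bit ints, the (1 << 31) expression overflows, while neither (1 << 23) nor (1 << 8) does. In C that overflow is undefined behavior, so the result of the first multiplication is unpredictable. On your platform the overflowed shift evidently produced the most negative int, -2147483648; half of that is -1073741824, which prints as 0xc0000000. In the second expression, sample * (1 << 23) is already a double, so the multiplication by (1 << 8) is carried out in floating point and yields 1073741824, which prints as 0x40000000.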
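To make the arithmetic concrete, here is a minimal sketch that reproduces both results without invoking the undefined shift. It assumes a 32-bit two's-complement int and substitutes INT_MIN for the value the overflowed (1 << 31) happened to produce; the function names are hypothetical and chosen only for illustration.

```c
#include <limits.h>

/* Mimics the first expression on a platform where the overflowed shift
   happened to yield INT_MIN (an assumption about the platform; the C
   standard does not define the value of 1 << 31 for a 32-bit int). */
int scaled_by_wrapped_shift(double sample)
{
    return (int)(sample * (double)INT_MIN);
}

/* Mimics the second expression: both shifts fit in an int, and each
   multiplication is performed in double, so nothing overflows. */
int scaled_in_two_steps(double sample)
{
    double partial = sample * (1 << 23); /* for 0.5: 4194304.0 */
    return (int)(partial * (1 << 8));    /* for 0.5: 1073741824.0 */
}
```

With sample = 0.5, the first function returns -1073741824 (bit pattern 0xc0000000 in a 32-bit int) and the second returns 1073741824 (0x40000000), matching the output in the question.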

In other words, it doesn't make sense to compute (1 << 31) on a platform whose int type has only 31 value bits. You need at least 32 value bits to calculate (1 << 31) meaningfully.
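One way to check this ahead of time is to test whether the shifted value is representable before shifting. The helper below is a hypothetical sketch, not a standard library function; it relies only on INT_MAX and CHAR_BIT from <limits.h>.

```c
#include <limits.h>

/* Returns 1 if 1 << n is representable in int, 0 otherwise. */
int shift_fits_in_int(unsigned n)
{
    /* Guard first: shifting by the full width or more is itself undefined. */
    if (n >= sizeof(int) * CHAR_BIT)
        return 0;
    /* 1 << n fits exactly when INT_MAX still has a set bit at position n. */
    return (INT_MAX >> n) >= 1;
}
```

On a platform with 32-bit int, shift_fits_in_int(23) and shift_fits_in_int(8) return 1, while shift_fits_in_int(31) returns 0.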

If you want your (1 << 31) to make sense, calculate it in the unsigned domain: (1u << 31), (1u << 23) and (1u << 8). That should give you consistent results. Alternatively, you can use a larger signed integer type.
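Both suggestions can be sketched as follows. The function names are hypothetical, and the code assumes that unsigned int is at least 32 bits wide and that samples lie in [-1, 1), as stated in the question, so the scaled value always fits in a signed 32-bit integer.

```c
#include <stdint.h>

/* Unsigned domain: 1u << 31 has type unsigned int, so the shift cannot
   overflow a signed type (assumes unsigned int has at least 32 bits). */
int32_t scale_unsigned_shift(double sample)
{
    return (int32_t)(sample * (1u << 31));
}

/* Larger signed type: 1LL has type long long, which is at least 64 bits,
   so 1LL << 31 is well within range. */
int32_t scale_wide_shift(double sample)
{
    return (int32_t)(sample * (1LL << 31));
}
```

With sample = 0.5, both functions return 0x40000000, the same value the two-step multiplication produces.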