SSE: _mm_mul_ps won't multiply 10001 with 10001 correctly but works fine for 10000 with 10000_问答_开发者

SSE: _mm_mul_ps won't multiply 10001 with 10001 correctly but works fine for 10000 with 10000

开发者 https://www.devze.com 2023-03-28 07:08 出处：网络

I have a very simple program to multiply four numbers. It works fine when each of them is 10000 but does not if I change them to 10001. The result

相关专题：simd sse x86

I have a very simple program to multiply four numbers. It works fine when each of them is 10000 but does not if I change them to 10001. The result is off by one.

I compiled the program with gcc -msse2 main_sse.c -o sse on both AMD Opteron and Intel Xeon and get the same result on both machines.

I would appreciate any help. Couldn't find anything online on this开发者_如何学C topic.

#include <stdlib.h>
#include <stdio.h>
#include <xmmintrin.h>

int main(){

    float x[4], y[4], temp[4]; int i;  __m128 X, Y, result;

    for(i=0; i < 4; i++) { x[i] = 10000; y[i] = 10000; }

    X = _mm_load_ps(&x[0]);  Y = _mm_load_ps(&y[0]);  
    result = _mm_mul_ps(X,Y); _mm_store_ps(&temp[0], result);

    for(i=0; i < 4; i++) { x[i] = 10001; y[i] = 10001; }

    X = _mm_load_ps(&x[0]);  Y = _mm_load_ps(&y[0]);  
    result = _mm_mul_ps(X,Y); _mm_store_ps(&temp[0], result);
}

You are running into the limits of precision of IEEE 32 bit floating point numbers.

There are only 23 bits of fractional mantissa plus an implied '1' at the beginning. So the largest integer that can be exactly represented is 2²⁴ = 16777216

You would need 27 bits to exactly represent the product of 10001*10001 = 100020001.

Once you go above 2²⁴, you only get the nearest even integer.

Once you go above 2²⁵, you only get the nearest multiple of 4.

And so on.