C - How to access elements of vector using GCC SSE vector extension_问答_开发者

C - How to access elements of vector using GCC SSE vector extension

开发者 https://www.devze.com 2022-12-11 21:10 出处：网络

Usually I work with 3D vectors using following types: typedef vec3_t float[3]; initializing vectors using smth. like:

相关专题：gcc sse

Usually I work with 3D vectors using following types:

typedef vec3_t float[3];

initializing vectors using smth. like:

开发者_如何学JAVA

vec3_t x_basis = {1.0, 0.0, 0.0};
vec3_t y_basis = {0.0, 1.0, 0.0};
vec3_t z_basis = {0.0, 0.0, 1.0};

and accessing them using smth. like:

x_basis[X] * y_basis[X] + ...

Now I need a vector arithmetics using SSE instructions. I have following code:

typedef float v4sf __attribute__ ((mode(V4SF)))
int main(void)
{
    v4sf   a,b,c;
    a = (v4sf){0.1f,0.2f,0.3f,0.4f};
    b = (v4sf){0.1f,0.2f,0.3f,0.4f};
    c = (v4sf){0.1f,0.2f,0.3f,0.4f};
    a = b + c;
    printf("a=%f \n", a);
    return 0;
}

GCC supports such way. But... First, it gives me 0.00000 as result. Second, I cannot access the elements of such vectors. My question is: how can I access elements of such vectors? I need smth. like a[0] to access X element, a[1] to access Y element, etc.

PS: I compile this code using:

gcc -msse testgcc.c -o testgcc

The safe and recommended way to access the elements is with a union, instead of pointer type punning, which fools the aliasing detection mechanisms of the compiler and may lead to unstable code.

union Vec4 {
    v4sf v;
    float e[4];
};

Vec4 vec;
vec.v = (v4sf){0.1f,0.2f,0.3f,0.4f};
printf("%f %f %f %f\n", vec.e[0], vec.e[1], vec.e[2], vec.e[3]);

Note that gcc 4.6 now supports subscripted vectors:

In C vectors can be subscripted as if the vector were an array with the same number of elements and base type. Out of bound accesses invoke undefined behavior at runtime. Warnings for out of bound accesses for vector subscription can be enabled with -Warray-bounds.

You are forgetting that you need to reinterpret a as array of floats. Following code works properly:

int main(){
    v4sf a,b,c;
    a = (v4sf){0.1f,0.2f,0.3f,0.4f};
    b = (v4sf){0.1f,0.2f,0.3f,0.4f};
    c = (v4sf){0.1f,0.2f,0.3f,0.4f};
    a = b + c;
    float* pA = (float*) &a;
    printf("a=[%f %f %f %f]\n",pA[0], pA[1], pA[2], pA[3]);
    return 0;
}

P.S.: thanks for this question, I didn't know that gcc has such SSE support.

UPDATE: This solution fails once arrays got unaligned. Solution provided by @drhirsh is free from this problem.