Using the Accelerate framework on OS X, you get access to 4-way SIMD functionality that operates on vector floats, vector ints and vector bools. For example, it provides 4-way division as well as 4-way sin, cos, tan and so on.
For a vector float of 4 floats, the framework provides vFloat. For a vector bool of 4 bools, the framework provides vBool32.
What I am trying to accomplish is the 4-way SIMD version of this line of code:
float a = ...;
float b = ...;
bool condition = ...;
float selected = condition ? a : b;
On a Cell processor, for example, you would use the intrinsic 'spu_sel(val1, val2, conditional)'.
I tried writing down the 4-way selection as:
vFloat a = { ... };
vFloat b = { ... };
vBool32 condition = { ... };
vFloat selected = condition ? a : b;
...which is not accepted by the LLVM compiler, as the '?' operator does not accept vBool32. Also, there is no operator called 'vsel' or 'vself' or something similar on the webpage mentioned above. Is floating point selection available at all in this framework? And if so, how do I access it?
If you want to work at this level of abstraction then you'll probably have to settle for multiplying by a per-lane mask of 1.0f or 0.0f and adding the two products to achieve the desired result. This is actually still quite efficient because AltiVec and SSE can both issue at least one SIMD floating point multiply per clock cycle.
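The multiply-based selection described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the float4 type uses the GCC/Clang vector extension as a stand-in for vFloat, select_by_multiply is a hypothetical helper name (not part of Accelerate), and the condition is assumed to be available already as a float mask holding exactly 1.0f or 0.0f in each lane.

```c
/* Assumption: GCC/Clang vector extension, four packed floats.
   It stands in for Accelerate's vFloat in this sketch. */
typedef float float4 __attribute__((vector_size(16)));

/* Hypothetical helper: per-lane "condition ? a : b" using arithmetic only.
   Each lane of mask01 must be exactly 1.0f (take a) or 0.0f (take b). */
static inline float4 select_by_multiply(float4 mask01, float4 a, float4 b)
{
    const float4 one = {1.0f, 1.0f, 1.0f, 1.0f};
    /* mask01 * a keeps a where the mask is 1; (1 - mask01) * b keeps b elsewhere. */
    return mask01 * a + (one - mask01) * b;
}
```

One caveat with this approach: if the lane that is not selected holds an infinity or a NaN, multiplying it by 0.0f yields NaN, which then contaminates the result. It is only a drop-in replacement for a true select when both inputs are finite.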
If you want to get every last bit of performance though then I think you'll need to drop down to native SIMD programming and use the relevant intrinsics: vec_sel in the case of AltiVec, _mm_blendv_ps in the case of SSE4.1 (_mm_blend_ps only takes a compile-time constant mask), and _mm_and_ps/_mm_andnot_ps/_mm_or_ps in the case of older SSE implementations.
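For the older-SSE route, the and/andnot/or combination might look like the sketch below. It assumes an x86 target with SSE available, and select_ps is a hypothetical helper name. The mask is expected to have all bits set in lanes where a should be taken and all bits clear elsewhere, which is the form that SSE comparison intrinsics such as _mm_cmplt_ps produce.

```c
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target */

/* Hypothetical helper: per-lane "mask ? a : b".
   Each lane of mask must be all ones (take a) or all zeros (take b). */
static inline __m128 select_ps(__m128 mask, __m128 a, __m128 b)
{
    /* _mm_andnot_ps(mask, b) computes (~mask) & b, so this is
       (mask & a) | (~mask & b). */
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}
```

A mask can be built with a comparison, for example _mm_cmplt_ps(x, y), and passed straight in. Unlike the multiply approach, this is purely bitwise, so infinities or NaNs in the lane that is not selected do not leak into the result.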