SSE best way to set register to 0.0's and 1.0's?

Developer https://www.devze.com 2023-02-08 20:21 Source: web

I am doing some SSE vector3 math.

Generally, I set the 4th component (w) of my vector to 1.0f, as this makes most of my math work, but sometimes I need to set it to 0.0f.

So I want to change something like: (32.4f, 21.2f, -4.0f, 1.0f) to (32.4f, 21.2f, -4.0f, 0.0f)

I was wondering what the best method for doing so would be:

Convert to 4 floats, set 4th float, send back to SSE
xor a register with itself, then do 2 shufps
Do all the SSE math with 1.0f and then set the variables to what they should be when finished.
Other?

Note: The vector is already in an SSE register when I need to change it.
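For comparison, the first two options from the question might look roughly like this with SSE intrinsics in C. This is a sketch that assumes an x86 compiler providing <emmintrin.h>; the function names are hypothetical. The shufps version never leaves the registers, while the memory version pays for a store and a reload unless the compiler optimizes them away.

```c
#include <emmintrin.h>  /* SSE/SSE2 intrinsics (also pulls in xmmintrin.h) */

/* Option 1 (hypothetical name): spill to memory, patch the 4th float, reload. */
static __m128 clear_w_via_memory(__m128 v)
{
    float tmp[4];
    _mm_storeu_ps(tmp, v);   /* tmp = {x, y, z, w} */
    tmp[3] = 0.0f;           /* overwrite w */
    return _mm_loadu_ps(tmp);
}

/* Option 2 (hypothetical name): zero a register, then two shufps.
   shufps takes its two low lanes from the first operand and its
   two high lanes from the second operand. */
static __m128 clear_w_via_shufps(__m128 v)
{
    __m128 zero = _mm_setzero_ps();                               /* xorps: {0, 0, 0, 0} */
    __m128 t    = _mm_shuffle_ps(v, zero, _MM_SHUFFLE(0, 0, 2, 2)); /* {z, z, 0, 0} */
    return _mm_shuffle_ps(v, t, _MM_SHUFFLE(2, 0, 1, 0));           /* {x, y, z, 0} */
}
```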

AND with a constant mask.

In assembly ...

.balign 16                 # andps with a memory operand needs 16-byte alignment
myMask:
.long 0xffffffff, 0xffffffff, 0xffffffff, 0x00000000   # keep x, y, z; clear w

...
andps  myMask, %xmm#

where # is the number of the register holding your vector (0, 1, 2, ...).

Hope this helps.
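The same AND-with-a-mask approach with C intrinsics might look like the sketch below (hypothetical function names, SSE2 assumed). When the mask comes from _mm_set_epi32, the compiler normally emits it as a suitably aligned constant, so the alignment requirement of andps's memory operand is handled for you. The set_w_one helper is an addition, not part of the answer: after clearing w, an OR with the bit pattern of 1.0f sets it back to 1.0f.

```c
#include <emmintrin.h>  /* SSE2: _mm_set_epi32, _mm_castsi128_ps */

/* Clear lane 3 (w) with andps and the mask {~0, ~0, ~0, 0}.
   _mm_set_epi32 lists lanes from highest to lowest, so the leading 0
   lands in lane 3. Hypothetical name. */
static __m128 clear_w_and_mask(__m128 v)
{
    const __m128 mask = _mm_castsi128_ps(_mm_set_epi32(0, -1, -1, -1));
    return _mm_and_ps(v, mask);
}

/* Hypothetical extension: force w to 1.0f by clearing it and then
   ORing in the bits of 1.0f (orps). */
static __m128 set_w_one(__m128 v)
{
    const __m128 one_w = _mm_setr_ps(0.0f, 0.0f, 0.0f, 1.0f);
    return _mm_or_ps(clear_w_and_mask(v), one_w);
}
```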

Assuming your original vector is in xmm0:

# xmm0 = [x y z w]
xorps   %xmm1, %xmm1          # xmm1 = [0 0 0 0]
pcmpeqd %xmm2, %xmm2          # xmm2 = [~0 ~0 ~0 ~0] (all bits set)
movss   %xmm1, %xmm2          # xmm2 = [0 ~0 ~0 ~0]
pshufd  $0x39, %xmm2, %xmm2   # xmm2 = [~0 ~0 ~0 0]
andps   %xmm2, %xmm0          # xmm0 = [x y z 0]

This should be fast, since it does not access memory.
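A C intrinsics version of the same register-only sequence might look like this (a sketch; the function name is hypothetical, SSE2 assumed). Keep in mind that the compiler is free to recognize the mask as a constant and replace the whole sequence with a single load, so if the exact instruction sequence matters, assembly is the more reliable route.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Build {~0, ~0, ~0, 0} in registers, mirroring
   xorps / pcmpeqd / movss / pshufd, then AND it into v. */
static __m128 clear_w_no_load(__m128 v)
{
    __m128  zero = _mm_setzero_ps();                           /* xorps:   {0, 0, 0, 0} */
    __m128i ones = _mm_cmpeq_epi32(_mm_castps_si128(zero),
                                   _mm_castps_si128(zero));    /* pcmpeqd: all bits set */
    __m128  t    = _mm_move_ss(_mm_castsi128_ps(ones), zero);  /* movss:   {0, ~0, ~0, ~0} */
    __m128i mask = _mm_shuffle_epi32(_mm_castps_si128(t),
                                     _MM_SHUFFLE(0, 3, 2, 1)); /* pshufd $0x39: {~0, ~0, ~0, 0} */
    return _mm_and_ps(v, _mm_castsi128_ps(mask));              /* andps */
}
```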

If you want to do it without memory access, note that the bit pattern of 1.0f (0x3F800000) has an all-zero low 16-bit word, and 0.0f is all zero bits. So you can simply copy that zero word over the high word. If the 1.0f is in the highest dword, pshufhw xmm0, xmm0, 0xa4 should do the trick:

(gdb) ni
4       pshufhw $0xa4, %xmm0, %xmm0
(gdb) p $xmm0.v4_float
$4 = {32.4000015, 21.2000008, -4, 1}
(gdb) ni
5       ret
(gdb) p $xmm0.v4_float
$5 = {32.4000015, 21.2000008, -4, 0}

The similar trick for the other positions is left as an exercise to the reader :)
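The pshufhw trick expressed with intrinsics might look like this (a sketch with hypothetical names; SSE2 assumed). _mm_shufflehi_epi16 is the intrinsic for pshufhw. It only works while the low 16 bits of w are zero, which holds for 1.0f; for an arbitrary w the result is generally not 0.0f.

```c
#include <emmintrin.h>  /* SSE2: _mm_shufflehi_epi16 (pshufhw) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw bits of a float, to check that 1.0f == 0x3F800000. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* pshufhw $0xa4: words 4, 5, 6 stay in place and word 7 gets a copy of
   word 6. For w == 1.0f, word 6 is 0x0000, so w becomes 0.0f. */
static __m128 clear_w_pshufhw(__m128 v)
{
    __m128i i = _mm_castps_si128(v);
    return _mm_castsi128_ps(_mm_shufflehi_epi16(i, 0xa4));
}
```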

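pinsrw? Since the low 16-bit word of 1.0f is already zero, inserting a zero into word 7 (the high half of the 4th dword) also turns 1.0f into 0.0f.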
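With intrinsics, pinsrw is _mm_insert_epi16. A minimal sketch (hypothetical function name, SSE2 assumed) follows; like the pshufhw trick, it assumes the low 16 bits of w are zero, as they are for 1.0f.

```c
#include <emmintrin.h>  /* SSE2: _mm_insert_epi16 (pinsrw) */

/* Write a 16-bit zero into word 7, the high half of lane 3.
   Word 6 of 1.0f is already zero, so w ends up as 0.0f. */
static __m128 clear_w_pinsrw(__m128 v)
{
    __m128i i = _mm_castps_si128(v);
    return _mm_castsi128_ps(_mm_insert_epi16(i, 0, 7));
}
```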

Why not multiply your vector element-wise by [1 1 1 0]? SSE has an instruction for element-wise multiplication (mulps).

Then, to go back to a vector with a 1 in the 4th component, just add [0 0 0 1]; SSE has an instruction for that too (addps).
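A sketch of the multiply/add round trip with intrinsics (hypothetical function names). Compared with the AND mask, mulps usually has higher latency than andps, and multiplying by 0 only gives 0.0f when w is finite (infinity or NaN times 0 gives NaN). With w == 1.0f the second point does not matter.

```c
#include <xmmintrin.h>  /* SSE: _mm_mul_ps (mulps), _mm_add_ps (addps) */

/* {x, y, z, 1} * {1, 1, 1, 0} = {x, y, z, 0} */
static __m128 w_to_zero_mul(__m128 v)
{
    return _mm_mul_ps(v, _mm_setr_ps(1.0f, 1.0f, 1.0f, 0.0f));
}

/* {x, y, z, 0} + {0, 0, 0, 1} = {x, y, z, 1}; assumes w is 0 on entry */
static __m128 w_to_one_add(__m128 v)
{
    return _mm_add_ps(v, _mm_setr_ps(0.0f, 0.0f, 0.0f, 1.0f));
}
```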