I have an ARGB pixel stored in a 128-bit NEON register with 32 bits per channel. I need to store it to memory as ARGB with 8 bits per channel (narrowing and saturating).
My result comes out of a vmla.32 q1, q2, d0; I'm wondering whether I could get the narrowing or saturation from the multiply instruction directly and save some cycles.
What's the best way to go about it?
There's no such encoding as vmla.32 q1, q2, d0; let's assume you meant q0.
The simple, naive answer is:
vqmovn.s32 d0, q1 // saturate and narrow 32 -> 16
vqmovn.s16 d0, q0 // saturate and narrow 16 -> 8
This does signed saturation; if you have unsigned values, use the .u32 and .u16 types, and if you have signed values but want to saturate to unsigned, use the vqmovun instruction.
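To make the difference between those variants concrete, here is a rough scalar model in portable C of what one lane goes through in the two-step narrowing. It is only a sketch of the saturation semantics, not NEON code: the function names are made up for illustration, and it assumes the 32-bit channel values are signed. The point it shows is that if your final channels are meant to cover 0..255, the signed 16 -> 8 step clamps at 127, so the vqmovun form is likely the one you want for the last step.

```c
#include <stdint.h>

/* Scalar model of one lane of vqmovn.s32: clamp to the int16_t range. */
static int16_t sat_s32_to_s16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Scalar model of one lane of vqmovn.s16: clamp to the int8_t range. */
static int8_t sat_s16_to_s8(int16_t x) {
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return (int8_t)x;
}

/* Scalar model of one lane of vqmovun.s16: signed input, clamp to 0..255. */
static uint8_t sat_s16_to_u8(int16_t x) {
    if (x > UINT8_MAX) return UINT8_MAX;
    if (x < 0) return 0;
    return (uint8_t)x;
}

/* Hypothetical helper: vqmovn.s32 followed by vqmovn.s16 (all signed). */
static int8_t narrow_signed(int32_t x) {
    return sat_s16_to_s8(sat_s32_to_s16(x));
}

/* Hypothetical helper: vqmovn.s32 followed by vqmovun.s16 (signed in, unsigned out). */
static uint8_t narrow_to_unsigned(int32_t x) {
    return sat_s16_to_u8(sat_s32_to_s16(x));
}
```

With this model, a channel value of 200 comes out as 127 from narrow_signed but stays 200 from narrow_to_unsigned, while 300 and -5 come out of narrow_to_unsigned as 255 and 0.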
As for whether you can do some sort of narrowing multiply, that depends heavily on the exact operation (and the values involved); given that you're using a vmla, however, the answer is "probably not".
Can you use the saturating operations in NEON and avoid widening to start with, or do you need all of that headroom?