
Mixing TBB with SSE2 intrinsics

开发者 https://www.devze.com 2023-02-08 15:48 Source: Internet

Is using SSE2 intrinsics inside a parallel_for a good idea?

Since the number of SSE2 registers is limited, will it incur a performance penalty?

Does each CPU die have its own SSE2 registers?


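Each CPU core has its own SSE registers, and the operating system saves and restores them on a context switch, so threads do not share or compete for them. Threading and SSE are essentially unrelated; feel free to use both.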
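To make that concrete, here is a minimal sketch of an SSE2 kernel run on several threads at once. The names add_range_sse2 and add_parallel are hypothetical, and std::thread is used as a stand-in for TBB so the snippet builds without the library; it is an illustration under those assumptions, not a tuned implementation.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (__m128d, _mm_add_pd, ...)
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical kernel: out[i] = a[i] + b[i] for i in [begin, end).
// It takes a sub-range, which is the shape a parallel_for body needs.
void add_range_sse2(const double* a, const double* b, double* out,
                    std::size_t begin, std::size_t end) {
    std::size_t i = begin;
    // Two doubles per 128-bit XMM register. Unaligned loads and stores are
    // used so that any sub-range start is valid.
    for (; i + 2 <= end; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
    for (; i < end; ++i) out[i] = a[i] + b[i];  // scalar tail
}

// Stand-in for a parallel_for (assumption: plain std::thread, not TBB).
// Splits [0, n) into at most num_threads contiguous chunks, one per thread.
void add_parallel(const std::vector<double>& a, const std::vector<double>& b,
                  std::vector<double>& out, unsigned num_threads) {
    assert(a.size() == b.size() && out.size() == a.size() && num_threads > 0);
    const std::size_t n = a.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        const std::size_t end = std::min(n, begin + chunk);
        workers.emplace_back(add_range_sse2, a.data(), b.data(), out.data(),
                             begin, end);
    }
    for (auto& w : workers) w.join();
}
```

With TBB, the same kernel would be called from the body passed to tbb::parallel_for over a tbb::blocked_range<std::size_t> r, using r.begin() and r.end() as the sub-range bounds. Each worker thread runs the loop with its own copy of the XMM register state, so the threads do not interfere with each other.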


Is using SSE2 intrinsics in a parallel_for a good idea? That depends, but it is definitely not a bad idea. Profile your code, and use intrinsics where performance matters most.

Since the number of SSE2 registers is limited, will it incur a performance penalty? You don't have to worry about register pressure. When you use intrinsics, the compiler does the register allocation for you (unlike hand-written assembly), and it spills values to the stack if it runs out of registers. Code hand-written with intrinsics is usually more compact than code compiled from a high-level language. Profile your code after each change to see whether performance has actually improved.
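As a small illustration of the register-allocation point, the sketch below sums an array using four vector accumulators. The function name sum_sse2 and the choice of four accumulators are assumptions made for the example. Note that the code names variables of type __m128d, never specific XMM registers; mapping them to registers is left to the compiler.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>

// Hypothetical kernel: sum of n doubles using four independent accumulators.
// acc0..acc3 and the loaded values are ordinary variables; the compiler
// decides which XMM register (or stack slot) each one occupies.
double sum_sse2(const double* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    __m128d acc0 = _mm_setzero_pd();
    __m128d acc1 = _mm_setzero_pd();
    __m128d acc2 = _mm_setzero_pd();
    __m128d acc3 = _mm_setzero_pd();

    std::size_t i = 0;
    // Eight doubles per iteration: four registers' worth, two lanes each.
    for (; i + 8 <= n; i += 8) {
        acc0 = _mm_add_pd(acc0, _mm_loadu_pd(x + i));
        acc1 = _mm_add_pd(acc1, _mm_loadu_pd(x + i + 2));
        acc2 = _mm_add_pd(acc2, _mm_loadu_pd(x + i + 4));
        acc3 = _mm_add_pd(acc3, _mm_loadu_pd(x + i + 6));
    }

    // Combine the accumulators, then add the two lanes together.
    __m128d acc = _mm_add_pd(_mm_add_pd(acc0, acc1), _mm_add_pd(acc2, acc3));
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double total = lanes[0] + lanes[1];

    for (; i < n; ++i) total += x[i];  // scalar tail
    return total;
}
```

Whether the extra accumulators help is exactly the kind of question to settle by profiling rather than by counting registers by hand.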

Does each CPU die have its own SSE2 registers? Each logical CPU has its own 8 (in 32-bit mode) or 16 (in 64-bit mode) XMM registers. In modern CPUs, each core is one logical CPU, or two logical CPUs if hyper-threading is enabled.
