Call get_global_id() multiple times vs. save the result in a local variable?

Developer https://www.devze.com 2023-01-07 02:26 Source: Internet
This is probably a silly question, but how expensive is it to call a get_* function in an OpenCL kernel? Is it better to save the result in a local variable for later use, or to call the function whenever it is needed?

Or is it platform dependent?

P.S. I think CUDA handles this better with its various threadIdx variables.


I think this should be free on all GPU architectures. The call should be replaced by a read of the corresponding hardware register or of a constant in a cache bank.

The compiler could also perform constant propagation on it. You can check for yourself using AMD Stream Analyser:

OpenCL:

__kernel 
void testKernel(__global uint * uintArray)
{
    uint threadId = get_global_id(0);

    uintArray[threadId] = 0xbaadf00d;
}

Radeon HD 5870 (Cypress) assembly:

0 ALU: ADDR(32) CNT(10) KCACHE0(CB0:0-15) KCACHE1(CB1:0-15) 
      0  x: MOV         R1.x,  (0xBAADF00D, -0.001327039325f).x      
         t: MULLO_INT   ____,  R1.x,  KC0[1].x      
      1  x: ADD_INT     ____,  R0.x,  PS0      
      2  w: ADD_INT     ____,  PV1.x,  KC0[6].x      
      3  z: LSHL        ____,  PV2.w,  (0x00000002, 2.802596929e-45f).x      
      4  y: ADD_INT     ____,  KC1[0].x,  PV3.z      
      5  x: LSHR        R0.x,  PV4.y,  (0x00000002, 2.802596929e-45f).x      
01 MEM_RAT_CACHELESS_STORE_RAW: RAT(1)[R0].x___, R1,  VPM 

Here get_global_id(0) maps to the constant cache bank value KC0[1].x. So, to answer your question, I would use the most readable form.
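To make the two options from the question concrete, here is a minimal sketch of both forms side by side. The kernel names (fillRepeatedCalls, fillCachedId) and the second buffer idx are made up for illustration. The #ifndef block at the top is a hypothetical host-side shim, not part of OpenCL: it only exists so the same source also compiles as plain C and the two kernels can be compared on the CPU. I would expect both forms to compile to the same instructions on a GPU, but that is an assumption you should confirm in the disassembly for your own platform.

```c
#ifndef __OPENCL_VERSION__
/* Hypothetical host-side shim (illustration only). A real OpenCL C compiler
   defines __OPENCL_VERSION__ and skips this whole block. */
#include <stddef.h>
#define __kernel
#define __global
typedef unsigned int uint;

/* Stand-in for the work-item ID that the hardware would provide. */
static size_t emulated_global_id;

static size_t get_global_id(uint dim)
{
    (void)dim;                 /* this sketch only emulates dimension 0 */
    return emulated_global_id;
}

/* Runs a kernel once per work-item, sequentially, on the host. */
typedef void (*kernel_fn)(uint *out, uint *idx);
static void run_emulated(kernel_fn kernel, uint *out, uint *idx, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        emulated_global_id = i;
        kernel(out, idx);
    }
}
#endif

/* Form 1: call get_global_id(0) at every use. */
__kernel void fillRepeatedCalls(__global uint *out, __global uint *idx)
{
    out[get_global_id(0)] = 0xbaadf00d;
    idx[get_global_id(0)] = (uint)get_global_id(0);
}

/* Form 2: read the ID once into a private variable and reuse it. */
__kernel void fillCachedId(__global uint *out, __global uint *idx)
{
    size_t gid = get_global_id(0);
    out[gid] = 0xbaadf00d;
    idx[gid] = (uint)gid;
}
```

On the host, run_emulated(fillCachedId, out, idx, n) fills both arrays the same way run_emulated(fillRepeatedCalls, out, idx, n) does. Note that the "local variable" in form 2 is a private variable in OpenCL terms, not __local memory.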
