In OpenCL, I have a kernel that needs to operate on complex and real data. I could put a conditional statement in that calls the right line of code to handle this, or I could have two kernels that I call and push the conditional statement out to my calling code.
Having two kernels is obviously worse for maintainability, but is the difference significant for performance?
If it's just one conditional statement, in my experience the performance difference is negligible, at least on NVIDIA hardware.
Basically, as long as all (or most) work-items follow the same code path, you're fine. As the code path taken depends on a kernel argument in your case, all work-items follow the same path.
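A minimal sketch of the single-kernel option, assuming a hypothetical kernel named `scale` and an `is_complex` flag passed as a kernel argument (neither name comes from the question). Because the flag has the same value for every work-item, the branch is uniform and no work-group diverges. The shim at the top is only an assumption for convenience: it lets the same source build as plain C on the host for a quick check, and an OpenCL compiler defines `__OPENCL_VERSION__` and skips it.

```c
/* Host-only shim (an assumption for testing): lets this file build as plain C.
   An OpenCL compiler defines __OPENCL_VERSION__ and skips this block. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t host_gid;  /* emulated work-item index */
static size_t get_global_id(unsigned int dim) { (void)dim; return host_gid; }
#endif

/* Hypothetical kernel: scales values in place.
   Complex data is assumed to be stored interleaved as (re, im) pairs. */
__kernel void scale(__global float *data, const float factor, const int is_complex)
{
    const size_t i = get_global_id(0);

    if (is_complex) {
        /* is_complex is a kernel argument, so every work-item takes this branch together. */
        data[2 * i]     *= factor;
        data[2 * i + 1] *= factor;
    } else {
        data[i] *= factor;
    }
}
```

The calling code would set `is_complex` with `clSetKernelArg` before enqueueing, so the decision is made once per launch rather than per element.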
It depends slightly on where the conditional is. Code for readability first, then optimize for performance only after you have measured it AND found that it's a problem.
For example, kernel_for_RGB_image and kernel_for_ABGR_image seems like a reasonable use of separate kernels; different kernels that effectively unroll some deep inner loop might be a bigger maintenance headache.
I think the best way is to actually try it and benchmark the two variants. In some cases, having multiple conditional blocks compiled, even if only one of them is executed, can lead to worse performance. The reason is GPRs (general-purpose registers): the compiler allocates as many registers as are needed for the worst case, and higher register use per work-item can reduce how many work-items run concurrently.
One solution I can suggest: keep a single kernel function, but with a compile-time conditional:
__kernel void work()
{
#if VAR
// one code
#else
// another code
#endif
}
Then you need to recompile the kernel with VAR set to true or false whenever the condition changes. Obviously, to the compiler this is no different from two kernels, but it may be easier to maintain if part of the code is shared between them.
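One way to set VAR at build time is the `-D` preprocessor option accepted in the options string of `clBuildProgram`. The helper below is a hypothetical sketch (`make_build_options` is not an OpenCL function) that formats that string in plain C; the commented call shows where it would be passed, assuming `program` and `device` already exist.

```c
#include <stdio.h>

/* Hypothetical helper: writes "-D VAR=1" or "-D VAR=0" into buf.
   Returns the number of characters written, or -1 if buf is too small. */
static int make_build_options(char *buf, size_t size, int use_complex)
{
    int n = snprintf(buf, size, "-D VAR=%d", use_complex ? 1 : 0);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Usage sketch, assuming `program` and `device` were already created:
 *
 *     char opts[32];
 *     if (make_build_options(opts, sizeof opts, 1) > 0)
 *         clBuildProgram(program, 1, &device, opts, NULL, NULL);
 *
 * The program has to be rebuilt (and the kernel object recreated)
 * each time VAR changes.
 */
```

If both variants are needed during one run, building the program once per value of VAR up front and keeping both kernel objects avoids rebuilding on every switch.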