popcnt in OpenCL?

Developer https://www.devze.com 2023-03-03 18:44 Source: Internet

Related topics: gpu opencl

Newer NVIDIA GPUs support a __popc(x) instruction that counts the number of bits set in a 32-bit register.
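For reference, the value __popc(x) returns can be written in portable C as a loop over the 32 bits. This is only a minimal sketch of the semantics, assuming a 32-bit unsigned input; popc_reference is a hypothetical name, and the hardware does not compute the count this way.

```c
#include <stdint.h>

/* Hypothetical reference definition of a 32-bit population count:
   examine each bit position and add that bit to the total.
   It shows what __popc(x) returns, not how the GPU computes it. */
static unsigned popc_reference(uint32_t x)
{
    unsigned count = 0;
    for (int bit = 0; bit < 32; ++bit) {
        count += (x >> bit) & 1u;   /* add the bit at this position */
    }
    return count;
}
```

For example, popc_reference(0xF0u) is 4 and popc_reference(0xFFFFFFFFu) is 32.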

I am 99% sure OpenCL does not support inline assembler unless it is provided as a vendor kernel extension.

1) Does AMD hardware support this yet? (I am not aware that it does.)

2) On OS X and Linux, how can you intercept the NVIDIA intermediate language (PTX) that the kernel is compiled to, so that you could insert this instruction?

I figured out how to dump the PTX "binary" in PyOpenCL, now I just need to figure out how to re-insert it with modifications.

import pyopencl as cl

# Create and build the program
self.program = cl.Program(self.ctx, fstr).build()
# On NVIDIA, the "binary" for the first device is PTX text
print(self.program.binaries[0])

NVIDIA's OpenCL compiler supports inline PTX assembly inside OpenCL kernels using the asm keyword. The syntax is similar to GCC inline assembly. I currently use this:

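// NVIDIA only: inline PTX. popc.b32 counts the set bits in a 32-bit
// register; the PTX ISA requires sm_20 (compute capability 2.x) or newer.
inline uint popcnt(const uint i) {
  uint n;
  asm("popc.b32 %0, %1;" : "=r"(n) : "r"(i));  // %0 = n (output), %1 = i (input)
  return n;
}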

Tested and working on Ubuntu Linux.

For more information, see NVIDIA's oclInlinePTX code sample and the PTX ISA documentation.

If you are using an AMD or Intel card, inline PTX does not apply; you can simply use the built-in popcount() function introduced in OpenCL 1.2.

To the best of my knowledge, there is no inline assembly in any current OpenCL implementation, nor is there any way to intercept PTX (or CAL) during the JIT compilation cycle on OS X or Linux.

popc is a hardware instruction on NVIDIA compute capability 2.x hardware, but on compute 1.x hardware it is emulated. You can find the emulation code in device_functions.h in the CUDA toolkit. You could always implement it as a function in OpenCL, at the expense of some speed.
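Implementing popc as an ordinary function could look like the branch-free bit-slicing ("SWAR") count below. This is a commonly used technique offered as a sketch, not the code from device_functions.h; it assumes a 32-bit unsigned input, and popc_swar is a hypothetical name. The body uses only operations that OpenCL C also supports, so in a kernel you would drop the #include and write uint in place of uint32_t.

```c
#include <stdint.h>

/* Hypothetical software population count using the SWAR technique:
   sum bits in progressively wider fields, entirely without branches. */
static inline uint32_t popc_swar(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* counts per 2-bit field */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* counts per 4-bit field */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* counts per byte */
    return (x * 0x01010101u) >> 24;                   /* top byte = sum of all four bytes */
}
```

For example, popc_swar(0x12345678u) returns 13. It replaces a 32-iteration loop with a few shifts, masks, adds and one multiply, which is the kind of cost the answer means by "at the expense of some speed" compared with a native instruction.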
