There is this simple function which I have used with C++ in the past to simulate simple forms of tessellation. The function takes a number and a divisor. The divisor must be (a power of two - 1) and n should be between 0 and divisor. It returns a modulus result of n % (d+1) using bitwise &.
Fairly sure the function goes like:
unsigned int BitwiseMod(unsigned int n, unsigned int d){ return n & d; }
I am wanting to use this effectively in OpenCL and am wondering if it will work as I imagine it too. In my mind, modulus is a very expensive operation on the GPU but I am familiar using it to form magnitude spaces and other techniques to travel through data.
More often, I would be more likely to simply write this assuming functions have some overhead.
x[i] = 8*(i&d)+offset[i]; //OR in other contexts,...
num = i&d+offset[i];
x[num] = data;
The question is: Will this be useful or get in the way, if useful can you give me some examples where I might try to apply开发者_如何学编程 it.
On NVidia's architectures, GT200 and up, Modulo isn't particularly slow, not slower than a normal integer divide. See this paper for details.
However, using a bitwise AND is still quite a lot faster. As function calls are expensive on GPUs, OpenCL compilers aggressively use inlining to improve performance by default. You should be fine with a function call, as it will be inlined.
精彩评论