I have a fragment shader that runs a for loop, with the number of passes passed in as a uniform int variable.
uniform int numPasses;

void main(void) {
    for (int i = 0; i < numPasses; i = i + 1) {
        // do something
    }
}
I am seeing performance drop sharply as the number of loop iterations increases. Is this the proper way to perform looping computations in a fragment shader, or should I drive the loop from the CPU instead, rendering one pass per iteration and ping-ponging between 2 framebuffer attachments?
I am in the process of trying out the ping-ponging, but I wanted to hear the views of people who may have run into this before.
Provided each fragment doesn't need data from adjacent threads (neighbouring fragments), it looks like it'd be faster to keep the loop in the shader: you skip the multiple rendering passes and avoid the extra draw calls, synchronization and rasterization.
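To make the "no data from adjacent threads" condition concrete, here is a rough CPU-side model in Python (not GPU code). It assumes a made-up per-fragment update called step, standing in for the "do something" in the question, and checks that looping inside one pass gives the same result as ping-ponging between two buffers. It says nothing about speed, only that the two approaches are interchangeable when each fragment reads only its own value.

```python
def step(value):
    # Hypothetical per-fragment update; stands in for "do something".
    return value * 0.5 + 1.0


def single_pass(fragments, num_passes):
    # Models the in-shader loop: each fragment iterates on its own value.
    out = []
    for value in fragments:
        for _ in range(num_passes):
            value = step(value)
        out.append(value)
    return out


def ping_pong(fragments, num_passes):
    # Models multi-pass rendering: read from src, write to dst, then swap.
    src = list(fragments)
    dst = [0.0] * len(src)
    for _ in range(num_passes):
        for i, value in enumerate(src):
            dst[i] = step(value)
        src, dst = dst, src
    return src


fragments = [0.0, 1.0, 2.0, 3.0]
assert single_pass(fragments, 5) == ping_pong(fragments, 5)
```

If step needed the values of neighbouring fragments from the previous iteration, the single-pass version would no longer be equivalent, and the ping-pong version would be the one to use.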
That said, the single-pass loop might hurt performance if the loop is long, if the amount of computation varies a lot from fragment to fragment, or if you don't have enough fragments in flight to keep the GPU busy.
GPUs are complex and it's easy to assume the wrong thing. As you're already doing, testing both is probably best. It'd be interesting to see the difference when you vary the number of passes and fragments.