I am dealing with a set of largish (2k x 2k) images.
I need to do per-pixel operations down a stack of a few sequential images. Are there any opinions on using a single large 2D texture and calculating offsets vs. using 3D arrays?
It seems that 3D arrays are a bit 'out of the mainstream' in the CUDA API; the allocation and transfer functions are very different from the equivalent 2D functions.
There doesn't seem to be any good documentation on the higher-level "how and why" of CUDA, as opposed to the specific calls.
There is the Best Practices Guide, but it doesn't address this.
I would recommend reading the book "CUDA by Example". It goes through the things that aren't well documented, and it explains the "how and why".
If you're rendering the result of the CUDA kernel, I think you should use OpenGL interop. That way your code processes the image on the GPU and leaves the processed data there, which makes rendering much faster. There's a good example of doing this in the book.
If each CUDA thread needs to read only one pixel from the first frame and one pixel from the next frame, you don't need to use textures. Textures only benefit you if each thread is reading in a bunch of consecutive pixels. So you're best off using a 3D array.
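To make the "one pixel per frame per thread" case concrete, here is a minimal host-side C++ sketch of the offset arithmetic for a stack stored frame after frame in one flat buffer. The names `stackOffset` and `meanDownStack`, the `float` pixel type, and the choice of a mean as the per-pixel operation are all assumptions made for illustration; they are not from the CUDA API or from the question. In a kernel, the innermost loop would be the work of one thread for its own (x, y).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flat offset of pixel (x, y) in frame z, assuming the
// stack is stored frame after frame and each frame is row-major.
inline std::size_t stackOffset(std::size_t x, std::size_t y, std::size_t z,
                               std::size_t width, std::size_t height) {
    assert(x < width && y < height);
    return (z * height + y) * width + x;
}

// Illustrative per-pixel operation down the stack: the mean of each pixel
// across all frames. The z loop is what a single GPU thread would run.
std::vector<float> meanDownStack(const std::vector<float>& stack,
                                 std::size_t width, std::size_t height,
                                 std::size_t depth) {
    assert(depth > 0 && stack.size() == width * height * depth);
    std::vector<float> out(width * height, 0.0f);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (std::size_t z = 0; z < depth; ++z) {
                sum += stack[stackOffset(x, y, z, width, height)];
            }
            out[y * width + x] = sum / static_cast<float>(depth);
        }
    }
    return out;
}
```

With this layout, threads handling adjacent x values read adjacent addresses within each frame, which is the access pattern that coalesces well in global memory.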
Here is an example of using CUDA with 3D CUDA arrays: https://github.com/nvpro-samples/gl_cuda_interop_pingpong_st