I am developing a small application using CUDA.
I have a huge 2D array (it won't fit in shared memory) that threads in all blocks will read from constantly at random places. The array is read-only. Where should I allocate it: global memory, constant memory, or texture memory?

Depending on the size of your device's texture memory, you should put it there. Texture memory is cached with a mechanism built around spatial locality: memory accesses are optimized when threads with consecutive identifiers read data elements stored relatively close to each other.
Moreover, this locality is implemented for 2D accesses. So when each thread reads an element of a 2D array stored in texture memory, nearby accesses in both dimensions hit the cache, and you take full advantage of the memory architecture.
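To make the texture option concrete, here is a minimal sketch using the texture object API. It assumes a device of compute capability 3.0 or later and a `float` table; the names `gatherKernel` and `gatherFromTexture` are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread reads one element at an arbitrary (x, y) through the texture cache.
__global__ void gatherKernel(cudaTextureObject_t tex, const int2 *coords,
                             float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Unnormalized coordinates with point filtering: texel (x, y) is
        // sampled at its center, (x + 0.5, y + 0.5).
        out[i] = tex2D<float>(tex, coords[i].x + 0.5f, coords[i].y + 0.5f);
    }
}

// Hypothetical helper: copies a row-major width x height table into a CUDA
// array, wraps it in a texture object, and gathers the requested elements.
// Return codes are ignored here; real code should check them.
std::vector<float> gatherFromTexture(const std::vector<float> &table,
                                     int width, int height,
                                     const std::vector<int2> &coords)
{
    int n = static_cast<int>(coords.size());
    if (n == 0) return {};

    // Backing storage for the texture.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray_t array;
    cudaMallocArray(&array, &desc, width, height);
    cudaMemcpy2DToArray(array, 0, 0, table.data(), width * sizeof(float),
                        width * sizeof(float), height, cudaMemcpyHostToDevice);

    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeArray;
    resDesc.res.array.array = array;

    cudaTextureDesc texDesc = {};
    texDesc.addressMode[0] = cudaAddressModeClamp;
    texDesc.addressMode[1] = cudaAddressModeClamp;
    texDesc.filterMode = cudaFilterModePoint;      // no interpolation
    texDesc.readMode = cudaReadModeElementType;    // return raw floats
    texDesc.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);

    int2 *dCoords = nullptr;
    float *dOut = nullptr;
    cudaMalloc(&dCoords, n * sizeof(int2));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dCoords, coords.data(), n * sizeof(int2), cudaMemcpyHostToDevice);

    gatherKernel<<<(n + 255) / 256, 256>>>(tex, dCoords, dOut, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFreeArray(array);
    cudaFree(dCoords);
    cudaFree(dOut);
    return out;
}
```

Whether this beats plain global memory for truly random reads depends on the access pattern and the hardware, so it is worth profiling both.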
Unfortunately, this memory is not that big, and with huge arrays you might not be able to make your data fit in it. In that case, you can't avoid using global memory.
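One way to check whether the array fits is to query the device limits at run time. The sketch below assumes the relevant limits are the ones reported in `cudaDeviceProp` (for a 2D texture these are a maximum width and height in elements, not a byte count); `FitReport` and `checkFit` are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical summary of where a width x height array could be placed.
struct FitReport {
    bool fitsTexture2D;  // dimensions within the 2D texture limits
    bool fitsConstant;   // total bytes within constant memory
    bool fitsGlobal;     // total bytes within total global memory
};

// Compares the array against the limits the runtime reports for `device`.
// Returns all-false if the device properties cannot be read.
FitReport checkFit(size_t width, size_t height, size_t elemSize, int device = 0)
{
    FitReport r = {false, false, false};
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return r;

    size_t bytes = width * height * elemSize;
    r.fitsTexture2D = width  <= static_cast<size_t>(prop.maxTexture2D[0]) &&
                      height <= static_cast<size_t>(prop.maxTexture2D[1]);
    r.fitsConstant  = bytes <= prop.totalConstMem;
    r.fitsGlobal    = bytes <= prop.totalGlobalMem;
    return r;
}
```

Note that `fitsGlobal` only compares against the total amount of device memory, not what is currently free, so treat it as a rough upper bound.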
I agree with jHackTheRipper: a simple solution would be to use texture memory and then profile using the Compute Visual Profiler. Here's a good set of slides from NVIDIA about the different memory types for image convolution; it shows that good shared memory usage plus global reads was not much faster than using texture memory. In your case you should get some cached reads from texture memory that you wouldn't usually get when accessing random values in global memory.
If it's small enough to fit in constant or texture memory, I would just try all three.
One interesting option that you don't have listed here is mapped memory on the host. You can allocate memory on the host that is accessible from the device without explicitly transferring it to device memory. Depending on how much of the array you actually need to access, it could be faster than copying everything to global memory and reading from there.
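A rough sketch of the mapped-memory approach, assuming a device that can map host memory (on older setups without unified addressing you may need to call `cudaSetDeviceFlags(cudaDeviceMapHost)` before any other CUDA call). The names `gatherMapped` and `gatherFromMappedHost` are invented for this example, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Reads table elements directly from mapped host memory; each read that
// misses travels over the bus, so this pays off mainly for sparse access.
__global__ void gatherMapped(const float *table, int width,
                             const int2 *coords, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = table[coords[i].y * width + coords[i].x];
    }
}

// Hypothetical helper: places a row-major table in pinned, mapped host
// memory and lets the kernel read it without an explicit copy of the table.
std::vector<float> gatherFromMappedHost(const std::vector<float> &table,
                                        int width,
                                        const std::vector<int2> &coords)
{
    int n = static_cast<int>(coords.size());
    if (n == 0) return {};

    // Pinned host allocation that the device is allowed to address.
    float *hTable = nullptr;
    cudaHostAlloc((void **)&hTable, table.size() * sizeof(float),
                  cudaHostAllocMapped);
    std::copy(table.begin(), table.end(), hTable);

    // Device-side alias of the same host buffer.
    float *dTable = nullptr;
    cudaHostGetDevicePointer((void **)&dTable, hTable, 0);

    // Coordinates and results still live in ordinary device memory.
    int2 *dCoords = nullptr;
    float *dOut = nullptr;
    cudaMalloc(&dCoords, n * sizeof(int2));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dCoords, coords.data(), n * sizeof(int2), cudaMemcpyHostToDevice);

    gatherMapped<<<(n + 255) / 256, 256>>>(dTable, width, dCoords, dOut, n);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dCoords);
    cudaFree(dOut);
    cudaFreeHost(hTable);
    return out;
}
```

If the kernels end up touching most of the array many times, a one-time copy to global memory is likely the better trade, so this is something to measure rather than assume.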