CUDA SM register limit

Developer https://www.devze.com 2023-01-18 19:47 Source: Internet


I know that the number of blocks running on one SM is limited by the per-SM block limit, the thread count, shared memory, and registers. Is there any strategy to avoid using too many registers? I don't want high register usage to end up limiting the number of blocks I can run on one SM.

Compiling with nvcc -Xptxas -v will print the diagnostic information Edric mentioned, including the number of registers each kernel uses. Additionally, you can make the compiler conserve registers using the __launch_bounds__ qualifier. For example,

__global__ void
__launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)
MyKernel(...)
{ 
   ...
}

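makes the compiler limit the kernel's register usage so that at least minBlocksPerMultiprocessor blocks of up to maxThreadsPerBlock threads can be resident on a single SM. The trade-off is that the compiler may spill some values to local memory to stay under that limit. See Section B.16 of the CUDA Programming Guide for a complete explanation of __launch_bounds__.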
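To check whether the bounds had the intended effect, one option is to ask the runtime how many blocks it expects to fit on an SM. The following is a minimal sketch, not part of the original answer: the kernel scaleKernel, the helper residentBlocksPerSM, and the values 256 and 4 are assumptions chosen for illustration. It relies on cudaOccupancyMaxActiveBlocksPerMultiprocessor, which is available in CUDA 6.5 and later.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative values only; pick bounds that match your real launch configuration.
#define MAX_THREADS_PER_BLOCK 256
#define MIN_BLOCKS_PER_MP     4

// Hypothetical kernel. The qualifier asks the compiler to keep register usage
// low enough for MIN_BLOCKS_PER_MP blocks of MAX_THREADS_PER_BLOCK threads per SM.
__global__ void
__launch_bounds__(MAX_THREADS_PER_BLOCK, MIN_BLOCKS_PER_MP)
scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Returns the number of blocks of `blockSize` threads that the runtime expects
// to be resident on one SM for scaleKernel, or -1 if the query fails.
int residentBlocksPerSM(int blockSize)
{
    int numBlocks = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &numBlocks, scaleKernel, blockSize, 0 /* dynamic shared memory bytes */);
    return (err == cudaSuccess) ? numBlocks : -1;
}

int main()
{
    int blocks = residentBlocksPerSM(MAX_THREADS_PER_BLOCK);
    if (blocks < 0) {
        fprintf(stderr, "occupancy query failed\n");
        return 1;
    }
    printf("resident blocks per SM at %d threads/block: %d\n",
           MAX_THREADS_PER_BLOCK, blocks);
    return 0;
}
```

The reported count also reflects the other per-SM limits mentioned in the question (threads, shared memory, block slots), so a value below the requested minimum does not necessarily point at registers.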

One of the main drivers of register count is the amount of local data you declare in your kernel. However, the PTX assembler does quite a good job of reusing registers, so it is not always easy to work out from the PTX code how many will be used. You need to run ptxas to get the real answer.
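As a rough way to see this effect, the sketch below reads back the register count the compiler actually settled on for two kernels, using cudaFuncGetAttributes. Both kernels, the helper registersPerThread, and the file name in the compile comment are hypothetical and only serve as an illustration. The kernel with more live local data may use more registers, but the compiler is free to optimize, so the measured numbers are the only reliable answer.

```cuda
// Compile with, for example:  nvcc -Xptxas -v regs.cu
// (regs.cu is a placeholder name; -Xptxas -v reports register usage at build time.)
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel with very little per-thread state.
__global__ void smallStateKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 2.0f * i;
    }
}

// Hypothetical kernel that keeps 16 local values live at the same time.
__global__ void largeStateKernel(float *out, const float *in, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc[16];
    #pragma unroll
    for (int k = 0; k < 16; ++k) {
        acc[k] = in[(i + k) % n];
    }

    float sum = 0.0f;
    #pragma unroll
    for (int k = 0; k < 16; ++k) {
        sum += acc[k] * acc[15 - k];
    }
    out[i] = sum;
}

// Returns the registers per thread recorded for `kernel`, or -1 if the query fails.
template <typename Kernel>
int registersPerThread(Kernel kernel)
{
    cudaFuncAttributes attr;
    cudaError_t err =
        cudaFuncGetAttributes(&attr, reinterpret_cast<const void *>(kernel));
    return (err == cudaSuccess) ? attr.numRegs : -1;
}

int main()
{
    printf("smallStateKernel: %d registers per thread\n",
           registersPerThread(smallStateKernel));
    printf("largeStateKernel: %d registers per thread\n",
           registersPerThread(largeStateKernel));
    return 0;
}
```

The same cudaFuncAttributes structure also exposes localSizeBytes, which can help spot cases where values were spilled to local memory instead of being kept in registers.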