nvcc -Xptxas –v compiler flag has no effect

Developer https://www.devze.com 2023-01-16 02:34 Source: Internet
I have a CUDA project. It consists of several .cpp files that contain my application logic and one .cu file that contains multiple kernels plus a __host__ function that invokes them.

Now I would like to determine the number of registers used by my kernel(s). My normal compiler call looks like this:

nvcc -arch compute_20 -link src/kernel.cu obj/..obj obj/..obj .. -o bin/..exe -l glew32 ...

Adding the "-Xptxas –v" compiler flag to this call unfortunately has no effect. The compiler still produces the same textual output as before. The compiled .exe also works the same way as before, with one exception: my framerate jumps to 1800fps, up from 80fps.


I had the same problem, here is my solution:

  1. Compile the *.cu files into device-only *.ptx files; this discards the host code:

    nvcc -ptx *.cu

  2. Assemble the resulting *.ptx files with ptxas:

    ptxas -v *.ptx

The second step shows the number of registers and the amount of shared memory used by each kernel.
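The two steps above can be sketched as a small script. This is only an illustration: the `SM` variable, its default of 20 (taken from the question's compute_20), and the output file names are assumptions, and newer CUDA toolkits no longer accept sm_20, so `SM` may need a newer value.

```shell
#!/bin/sh
# Sketch: print per-kernel register and shared-memory usage via PTX.
# SRC defaults to the path from the question; SM is an assumed placeholder.
SRC=${SRC:-src/kernel.cu}
SM=${SM:-20}
PTX=$(basename "$SRC" .cu).ptx

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Step 1: emit device code only; host code is discarded.
    nvcc -ptx -arch="compute_$SM" "$SRC" -o "$PTX" &&
    # Step 2: assemble the PTX; -v prints resource usage for each kernel.
    ptxas -v -arch="sm_$SM" "$PTX" -o "${PTX%.ptx}.cubin"
else
    echo "skipped: need nvcc on PATH and $SRC"
fi
```

Run it as, for example, `SM=20 sh regs.sh` (the script name is hypothetical). The .cubin it writes is a by-product and can be deleted.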


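Change compute_20 to sm_20 in your compiler call. That should fix it. With only a virtual architecture (compute_20), nvcc embeds PTX and leaves the final compilation to the driver at run time, so ptxas is not run during the build and has nothing to report.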
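As a minimal sketch of that change, the script below compiles only the .cu file to an object with a real architecture and the verbose ptxas flag. The `SM` variable and the `kernel.o` output name are assumptions for illustration; the link inputs from the question are left out because they are elided there. Note that the flags are typed with plain ASCII hyphens.

```shell
#!/bin/sh
# Sketch: compile one .cu file for a real architecture with verbose ptxas output.
# SM and the object file name are assumed placeholders, not from the question.
SRC=${SRC:-src/kernel.cu}
SM=${SM:-20}
FLAGS="-arch=sm_$SM -Xptxas -v"

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # -c stops after producing the object file, so no link inputs are needed.
    # $FLAGS is intentionally unquoted so it splits into separate arguments.
    nvcc $FLAGS -c "$SRC" -o kernel.o
else
    echo "would run: nvcc $FLAGS -c $SRC -o kernel.o"
fi
```

The register and shared-memory lines are printed by ptxas during the build, so they may only appear if the build tool shows the compiler's diagnostic output.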


When "-Xptxas -v" and "-arch compute_XX" are used together, we cannot get the verbose information (register count, etc.). If we want to see it without giving up the ability to choose the GPU architecture (-arch, -code) up front, we can run nvcc -arch compute_XX *.cu -keep and then ptxas -v *.ptx. The drawback is that -keep leaves many intermediate files behind. kogut's answer is certainly to the point.


Pass the option to ptxas when you compile:

nvcc --ptxas-options=-v


You may want to check your build tool's default output verbosity.

For example, in Visual Studio go to Tools->Options->ProjectsAndSolutions->BuildAndRun and set the build output verbosity to Normal.


Not exactly what you were looking for, but you can use the CUDA Visual Profiler shipped with the NVIDIA GPU Computing SDK. Besides much other useful information, it shows the number of registers used by each kernel in your application.
