This question is very much related to:
A) How to separate CUDA code into multiple files
B) Link error LNK2005 when trying to compile several CUDA files together
Following advice from here: https://meta.stackexchange.com/questions/42343/same-question-but-not-quite and here: https://meta.stackexchange.com/questions/8910/asking-a-similar-but-not-the-same-question, I am asking a very similar question, but I want to be absolutely clear about where the difference between my question and the questions linked above lies.
I was getting the linker errors from the title when including a header file that contained the definition of a __device__ function into multiple source files.
This is different from Link A), where the same errors occur with __global__ (kernel) functions, because according to the CUDA manual __device__ implies inline:
In device code compiled for devices of compute capability 1.x, a __device__ function is always inlined by default. The __noinline__ function qualifier however can be used as a hint for the compiler not to inline the function if possible (see Section E.1).
Link B) is more closely related (and one answer correctly points out that the function seems not to get inlined no matter what the manual says), but Link B) refers to a header shipped by NVIDIA rather than my own header. While the problem is quite likely to lie within my header file, it is very unlikely to lie within an NVIDIA header file. In other words, Link B) and my question probably have different answers.
In the meantime I have found that declaring the function as __device__ inline solves the problem, so the above mainly serves to document the solution for everyone else.
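A minimal sketch of the fix described above, using a hypothetical header my_math.cuh, a hypothetical device function square() and a hypothetical kernel square_all(); none of these names come from the question. The header part and the source-file part are shown in one listing so it compiles on its own with nvcc. In a real project the guarded part would live in the header and be included from several .cu files.

```cuda
// ---- my_math.cuh (hypothetical header, included by several .cu files) ----
#ifndef MY_MATH_CUH
#define MY_MATH_CUH

// Declaring the function inline allows every .cu file that includes this
// header to carry its own copy of the definition without the linker
// reporting a duplicate symbol. Without `inline`, each including file
// emits its own external definition of square(), which matches the
// duplicate-symbol errors described in the question.
__device__ inline float square(float x)
{
    return x * x;
}

#endif // MY_MATH_CUH

// ---- kernels.cu (hypothetical source file that includes the header) ----
// Squares each of the n elements of data in place.
__global__ void square_all(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] = square(data[i]);
    }
}
```

Any other .cu file can include the same header and call square() without further changes.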
The open question is the reason for that behaviour.
Possible explanations I came up with:
- The manual is wrong
- nvcc -arch=compute_11 does not qualify as "compiling for devices of compute capability 1.x", or there is a bug in nvcc
- This is MS-VS specific and does work on platforms tested by NVIDIA
- I have a severe misconception about how inline works. A non-CUDA-related example can be found here: Multiply defined linker error using inlined functions. My understanding is the one expressed by "caf" there, that "the compiler shouldn't generate an external definition of the function, so it shouldn't bother the linker", but others over there seemed to disagree.
I'd greatly appreciate it if someone with more insight could clarify what is happening here.
In MS VS, as well as in gcc and possibly other compilers (but not in the one referenced by your "multiply defined linker error" link), an inline function behaves much like a static one as far as the linker is concerned. You can force a function to be extern inline, but unless you do, the compiler either won't place an external definition of the function into the object file, or will mark it as safe to duplicate (for example as a COMDAT symbol).
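HOWEVER, nowhere in the documentation does it say that CUDA __device__ functions are effectively declared inline (and therefore static). The documentation says that the function is "always inlined by default". There's a subtle difference: inlining is an optimization applied at each call site, while declaring a function inline changes how its definition is treated at link time. A function that is inlined at every call site can still leave an external definition in every object file that includes it.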
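To make the distinction in the answer concrete, here is a sketch with hypothetical names (my_cube.cuh, cube_inline(), cube_static(), cube_noinline(), cube_all()). Both inline and static change linkage, so either one lets several .cu files include the header. __noinline__, the qualifier mentioned in the manual excerpt quoted in the question, only concerns whether calls get inlined, so it is paired with static here to keep that definition header-safe.

```cuda
// ---- my_cube.cuh (hypothetical header, included by several .cu files) ----
#ifndef MY_CUBE_CUH
#define MY_CUBE_CUH

// Linkage: declared inline, so identical definitions in several
// translation units are allowed.
__device__ inline float cube_inline(float x) { return x * x * x; }

// Linkage: static gives internal linkage, so each translation unit gets
// its own private copy and no symbol is shared at link time.
static __device__ float cube_static(float x) { return x * x * x; }

// __noinline__ only asks the compiler not to inline calls to this function.
// It does not change linkage, so static is still what keeps this definition
// safe to include in several .cu files.
static __device__ __noinline__ float cube_noinline(float x) { return x * x * x; }

#endif // MY_CUBE_CUH

// ---- kernels.cu (hypothetical source file that includes the header) ----
// Writes x cubed, computed by each of the three variants, to out[0..2].
__global__ void cube_all(float* out, float x)
{
    out[0] = cube_inline(x);
    out[1] = cube_static(x);
    out[2] = cube_noinline(x);
}
```

The static variants give every object file its own private copy, while the inline variant relies on the toolchain accepting duplicate definitions; either way, whether the calls end up inlined is a separate matter.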