Setting up Visual Studio Intellisense for CUDA kernel calls


I've just started CUDA programming and it's going quite nicely, my GPUs are recognized and everything. I've partially set up Intellisense in Visual Studio using this extremely helpful guide here: http://www.ademiller.com/blogs/tech/2010/10/visual-studio-2010-adding-intellisense-support-for-cuda-c/

and here: http://www.ademiller.com/blogs/tech/2011/05/visual-studio-2010-and-cuda-easier-with-rc2/

However, Intellisense still doesn't pick up on kernel calls like this:

// KernelCall.cu
#include <iostream>
#include "cuda.h"
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

__global__ void kernel(void){}

int main()
{
    kernel<<<1,1>>>();

    system("pause");
    return 0;
}

The line kernel<<<1,1>>>() is underlined in red (specifically the leftmost angle bracket), with the error reading "Error: expected an expression". However, if I hover over the function, its return type and parameters are displayed properly. It still compiles just fine; I'm just wondering how to get rid of this little annoyance.


Wow, lots of dust on this thread. I came up with a macro fix (well, more like workaround...) for this that I thought I would share:

// nvcc does not seem to like variadic macros, so we have to define
// one for each kernel parameter list:
#ifdef __CUDACC__
#define KERNEL_ARGS2(grid, block) <<< grid, block >>>
#define KERNEL_ARGS3(grid, block, sh_mem) <<< grid, block, sh_mem >>>
#define KERNEL_ARGS4(grid, block, sh_mem, stream) <<< grid, block, sh_mem, stream >>>
#else
#define KERNEL_ARGS2(grid, block)
#define KERNEL_ARGS3(grid, block, sh_mem)
#define KERNEL_ARGS4(grid, block, sh_mem, stream)
#endif

// Now launch your kernel using the appropriate macro:
kernel KERNEL_ARGS2(dim3(nBlockCount), dim3(nThreadCount)) (param1); 

I prefer this method because for some reason I always lose the '<<<' in my code, but the macro at least gets highlighted by syntax coloring :).
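
As a rough, self-contained sketch (not part of the original answer; addOne, d_data, nBlockCount and nThreadCount are placeholder names), the workaround might look like this in a .cu file:

// addOne.cu: hypothetical example built on the KERNEL_ARGS2 macro above
__global__ void addOne(int* data) { data[threadIdx.x] += 1; }

void launchAddOne(int* d_data, int nBlockCount, int nThreadCount)
{
    // Under nvcc (__CUDACC__ defined) this expands to
    //   addOne<<<dim3(nBlockCount), dim3(nThreadCount)>>>(d_data);
    // The IntelliSense parser instead sees an ordinary call it can parse.
    addOne KERNEL_ARGS2(dim3(nBlockCount), dim3(nThreadCount)) (d_data);
}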


Visual Studio provides IntelliSense for C++; the trick from the rocket scientist's blog basically relies on the similarity CUDA-C has to C++, nothing more.

In the C++ language, the proper parsing of angle brackets is troublesome. You've got < as less-than and as the template delimiter, and << as a shift operator; remember that not long ago we had to put a space between the closing brackets of nested template declarations.
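
For instance, this is the pre-C++11 quirk being referred to:

#include <vector>

std::vector<std::vector<int> > v1;  // OK: the space keeps the two '>' tokens apart
std::vector<std::vector<int>> v2;   // rejected before C++11: ">>" was lexed as the right-shift operator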

So it turns out that the guy at NVIDIA who came up with this syntax was not a language expert: he picked just about the worst possible delimiter and then tripled it, so of course you're going to have trouble. It's amazing that IntelliSense works at all when it sees this.

The only way I know to get full IntelliSense in CUDA is to switch from the Runtime API to the Driver API. The C++ is just C++, and the CUDA is still (sort of) C++; there is no <<<>>> badness for the language parser to have to work around.
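
For reference, here is a minimal sketch (mine, not from the original answer) of what a Driver API launch looks like, assuming the kernel was compiled separately with "nvcc -ptx kernel.cu" into kernel.ptx and is declared extern "C" __global__ void kernel(int n) so its name is not mangled; error checking is omitted:

#include <cuda.h>

int main()
{
    cuInit(0);

    CUdevice  dev;
    CUcontext ctx;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    // Load the PTX produced by "nvcc -ptx kernel.cu" and look up the kernel
    CUmodule   mod;
    CUfunction fun;
    cuModuleLoad(&mod, "kernel.ptx");
    cuModuleGetFunction(&fun, mod, "kernel");

    // Kernel arguments are passed as an array of pointers to the values
    int n = 42;
    void* args[] = { &n };

    // An ordinary function call instead of <<<grid, block>>>, so there is
    // nothing for the C++ parser (or IntelliSense) to choke on
    cuLaunchKernel(fun,
                   1, 1, 1,       // grid dimensions
                   1, 1, 1,       // block dimensions
                   0, nullptr,    // shared memory bytes, stream
                   args, nullptr);

    cuCtxSynchronize();
    cuCtxDestroy(ctx);
    return 0;
}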


From VS 2015 and CUDA 7 onwards you can add these two includes before any others, provided your files have the .cu extension:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

No need for macros or anything; afterwards everything will work perfectly.
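
As a minimal sketch of that layout (essentially the code from the question, with the two CUDA headers moved to the top):

// KernelCall.cu: the two CUDA headers come before any other include
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <iostream>

__global__ void kernel(void) {}

int main()
{
    kernel<<<1, 1>>>();
    return 0;
}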


I LOVED Randy's solution. I'll match and raise using C preprocessor variadic macros:

#ifdef __INTELLISENSE__ // defined only by Visual Studio's IntelliSense parser, never by the real compiler
#define CUDA_KERNEL(...)
#else
#define CUDA_KERNEL(...) <<< __VA_ARGS__ >>>
#endif

Usage examples:

my_kernel1 CUDA_KERNEL(NUM_BLOCKS, BLOCK_WIDTH)();
my_kernel2 CUDA_KERNEL(NUM_BLOCKS, BLOCK_WIDTH, SHMEM, STREAM)(param1, param2);


I've been learning CUDA and have encountered that exact issue. As others have said, it's just an IntelliSense problem and can be ignored, but I've found a clean solution which actually removes it.

It seems that <<< >>> is interpreted as correct code if it's inside a template function.

I discovered it accidentally when I wanted to create wrappers for kernels so they can be called from regular .cpp code. It's both a nice abstraction and removes the syntax error.

Kernel header file (e.g. kernel.cuh):

#pragma once
#include <cstddef>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

const size_t THREADS_IN_BLOCK = 1024;

typedef double numeric_t;

// sample kernel function headers
__global__ void sumKernel(numeric_t* out, numeric_t* f, numeric_t* blockSum, size_t N);
__global__ void expKernel(numeric_t* out, numeric_t* in, size_t N);
// ..

// strong-typed wrapper for a kernel with 4 arguments
template <typename T1, typename T2, typename T3, typename T4>
void runKernel(void (*fun)(T1, T2, T3, T4), int Blocks, T1 arg1, T2 arg2, T3 arg3, T4 arg4) { 
    fun <<<Blocks, THREADS_IN_BLOCK >>> (arg1, arg2, arg3, arg4);
}

// strong-typed wrapper for a kernel with 3 arguments
template <typename T1, typename T2, typename T3>
void runKernel(void (*fun)(T1, T2, T3), int Blocks, T1 arg1, T2 arg2, T3 arg3) { 
    fun <<<Blocks, THREADS_IN_BLOCK >>> (arg1, arg2, arg3);
}

// ...

// the parameter-less fun is not a template, so its <<< >>> body cannot live here; define it in a .cu file
void runKernel(void (*fun)(), int Blocks);

In a .cu file (you will still get a syntax error here, but do you ever need a parameter-less kernel function? If not, this and the respective header declaration can be deleted):

void runKernel(void (*fun)(), int Blocks) { 
    fun <<<Blocks, THREADS_IN_BLOCK >>> ();
}

Usage in a .cpp file:

runKernel(kernelFunctionName, Blocks, arg1, arg2, arg3);
// for example runKernel(expKernel, B, output, input, size);