NVIDIA CUDA 4.0, page-locking a memory with runtime API

Developer https://www.devze.com 2023-03-01 13:57 Source: Internet

Related topics: gpu nvidia

NVIDIA CUDA 4.0 (RC2 is assumed here) offers the nice feature of page-locking a memory range that was allocated before via the "normal" malloc function. This can be done using the driver API function:

CUresult cuMemHostRegister (void * p, size_t bytesize, unsigned int Flags);

So far, the project has been developed using the runtime API. Unfortunately, it seems that the runtime API does not offer a function like cuMemHostRegister. I would really like to avoid mixing driver and runtime API calls.

Does anyone know how to page-lock memory that was previously allocated using standard malloc? Standard libc functions should not be used, since the page-locking is carried out to stage the memory for a fast transfer to the GPU, so I really want to stick to the "CUDA" way.

Frank

The 4.0 runtime API offers cudaHostRegister(), which does exactly what you are asking about. Be aware that the memory allocation you lock must be host page aligned, so you probably should use either mmap() or posix_memalign() (or one of its relatives) to allocate the memory. Passing cudaHostRegister() an allocation of arbitrary size from standard malloc() will probably fail with an invalid argument error.