NVIDIA CUDA 4.0 (RC2 is assumed here) offers the nice feature of page-locking a memory range that was previously allocated via the "normal" malloc function. This can be done using the driver API function:
CUresult cuMemHostRegister (void * p, size_t bytesize, unsigned int Flags);
So far, the project has been developed using the runtime API. Unfortunately, it seems that the runtime API does not offer a function like cuMemHostRegister, and I would really like to avoid mixing driver and runtime API calls.
Does anyone know how to page-lock memory that was previously allocated using standard malloc? Standard libc functions should not be used for the locking itself, since the page-locking is done to stage the memory for a fast transfer to the GPU, so I really want to stick to the "CUDA" way.
Frank
The 4.0 runtime API offers cudaHostRegister(), which does exactly what you are asking about. Be aware that the memory allocation you lock must be host page aligned, so you probably should use either mmap() or posix_memalign() (or one of its relatives) to allocate the memory. Passing cudaHostRegister() an allocation of arbitrary size from standard malloc() will probably fail with an invalid argument error.
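As a rough illustration of the above, here is a minimal sketch that allocates a page-aligned buffer with posix_memalign(), registers it with cudaHostRegister(), copies it to the device, and cleans up. It assumes a POSIX system (for sysconf() and posix_memalign()); the 1 MiB buffer size is an arbitrary choice for the example. The flag name cudaHostRegisterDefault is taken from current toolkits and may not be defined in the 4.0 headers, where passing 0 should be equivalent.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unistd.h>
#include <cuda_runtime.h>

int main() {
    // Query the host page size and round the request up to a whole number of pages.
    const size_t page   = (size_t)sysconf(_SC_PAGESIZE);
    const size_t wanted = 1 << 20;  // 1 MiB, arbitrary size for this sketch
    const size_t bytes  = ((wanted + page - 1) / page) * page;

    // Allocate page-aligned host memory instead of using plain malloc().
    void *host = NULL;
    if (posix_memalign(&host, page, bytes) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }
    memset(host, 0, bytes);

    // Page-lock the existing allocation through the runtime API.
    cudaError_t err = cudaHostRegister(host, bytes, cudaHostRegisterDefault);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostRegister: %s\n", cudaGetErrorString(err));
        free(host);
        return 1;
    }

    // The registered range can now be used for asynchronous transfers.
    void *dev = NULL;
    err = cudaMalloc(&dev, bytes);
    if (err == cudaSuccess) {
        err = cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, 0);
        if (err == cudaSuccess) {
            err = cudaStreamSynchronize(0);
        }
        cudaFree(dev);
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "transfer failed: %s\n", cudaGetErrorString(err));
    }

    // Unregister before releasing the memory back to the C library.
    cudaHostUnregister(host);
    free(host);
    return err == cudaSuccess ? 0 : 1;
}
```

The buffer is unregistered with cudaHostUnregister() before free() so that the pages are no longer pinned when they are returned to the allocator.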