How to use the `prepare` function from PyCUDA

Source: https://www.devze.com, 2023-03-26 07:08 (reposted from the web)
I am having trouble passing the right parameters to the prepare function (and to prepared_call) in order to allocate shared memory in PyCUDA. As I understand the error message, one of the variables I pass to PyCUDA is a long instead of the float32 I intended, but I cannot see where that variable comes from.

Furthermore, it seems to me that the official example and the documentation of prepare contradict each other on whether block needs to be None.

from pycuda import driver, compiler, gpuarray, tools
import pycuda.autoinit
import numpy as np

kernel_code ="""
__device__ void loadVector(float *target, float* source, int dimensions )
{
    for( int i = 0; i < dimensions; i++ ) target[i] = source[i];
}
__global__ void kernel(float* data, int dimensions, float* debug)
{
    extern __shared__ float mean[];
    if(threadIdx.x == 0) loadVector( mean, &data[0], dimensions );
    debug[threadIdx.x]=  mean[threadIdx.x];
}
"""

dimensions = 12
np.random.seed(23)
data = np.random.randn(dimensions).astype(np.float32)
data_gpu = gpuarray.to_gpu(data)
debug = gpuarray.zeros(dimensions, dtype=np.float32)

mod = compiler.SourceModule(kernel_code)
kernel = mod.get_function("kernel")
kernel.prepare("PiP",block = (dimensions, 1, 1),shared=data.size)
grid = (1,1)
kernel.prepared_call(grid,data_gpu,dimensions,debug)
print debug.get()

Output

Traceback (most recent call last):
  File "shared_memory_minimal_example.py", line 28, in <module>
    kernel.prepared_call(grid,data_gpu,dimensions,debug)
  File "/usr/local/lib/python2.6/dist-packages/pycuda-0.94.2-py2.6-linux-x86_64.egg/pycuda/driver.py", line 230, in function_prepared_call
    func.param_setv(0, pack(func.arg_format, *args))
pycuda._pvt_struct.error: cannot convert argument to long


I came across this same problem and it took me a while to work out the answer, so here goes. The cause of the error message is that data_gpu is a GPUArray instance, i.e. you made it with

data_gpu = gpuarray.to_gpu(data)

To pass it to prepared_call you need to use data_gpu.gpudata, which gives the associated DeviceAllocation instance (effectively the pointer to the device memory location). The same applies to debug in the question, which is also a GPUArray.
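
The failure can be reproduced without a GPU. The following is a minimal sketch that uses Python's standard struct module as a stand-in for pycuda._pvt_struct (an assumption: the two are taken to behave alike for the "P" and "i" format codes). FakePointer and FakeGPUArray are hypothetical classes invented for this illustration; they are not PyCUDA types.

```python
import struct


class FakePointer:
    """Hypothetical stand-in for a DeviceAllocation: usable as an integer address."""

    def __init__(self, address):
        self.address = address

    def __index__(self):
        return self.address


class FakeGPUArray:
    """Hypothetical stand-in for a GPUArray: holds a pointer but is not one itself."""

    def __init__(self, address):
        self.gpudata = FakePointer(address)


def try_pack(fmt, *args):
    """Return the packed bytes, or None if struct rejects an argument."""
    try:
        return struct.pack(fmt, *args)
    except struct.error:
        return None


data_gpu = FakeGPUArray(0x1000)
debug = FakeGPUArray(0x2000)

# Passing the array wrappers themselves fails: "P" needs an integer-like value.
bad = try_pack("PiP", data_gpu, 12, debug)

# Passing the pointers held in .gpudata succeeds.
good = try_pack("PiP", data_gpu.gpudata, 12, debug.gpudata)

print(bad is None, good is not None)
```

The format string "PiP" is the one from the question: pointer, int, pointer. The packer has no way to turn the wrapper object into an address, which is what the "cannot convert argument to long" message is complaining about.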

Also, passing a block argument to prepare is now deprecated - so a correct invocation would be something like this:

data_gpu = gpuarray.to_gpu(data)
func.prepare( "P" )
grid = (1,1)
block = (1,1,1)
func.prepared_call( grid, block, data_gpu.gpudata )
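
One more detail in the question's code is worth checking: shared=data.size passes the number of elements (12), whereas CUDA sizes dynamic shared memory in bytes, so twelve floats need 48 bytes. Below is a small sketch of the calculation. The helper name shared_bytes is made up for this example, and the shared_size keyword mentioned in the final comment is how I understand recent PyCUDA versions to accept the value, so check it against the documentation of your version.

```python
import struct

# Size of one single-precision float in bytes ("=f" forces the standard size).
FLOAT32_BYTES = struct.calcsize("=f")


def shared_bytes(num_floats):
    """Bytes of dynamic shared memory needed for num_floats float32 values."""
    return num_floats * FLOAT32_BYTES


dimensions = 12
nbytes = shared_bytes(dimensions)
print(nbytes)

# Assumed usage with PyCUDA (not executed here):
#   kernel.prepare("PiP")
#   kernel.prepared_call(grid, block, data_gpu.gpudata, dimensions,
#                        debug.gpudata, shared_size=nbytes)
```

With the NumPy array from the question, data.nbytes gives the same number.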