C# P/Invoke on CUDA DLL eventually causes AccessViolationException

Posted on 开发者 (https://www.devze.com), 2023-01-15 10:00. Source: the web

This is driving me crazy. I've looked all over, but I'm not sure I understand exactly what's causing this error.

I'm calling a DLL (which I wrote as a separate project) that runs a CUDA kernel on some data I'm using. I suspect the issue isn't caused by CUDA, though, since the code has been tested and works at least once, and usually 64-100 times, before throwing an AccessViolationException.

The issue is, I'm passing in three public static arrays:

public static float[] neuronInputs;
public static float[] connectionOutputs;
public static int[] calcOrder;

The data from neuronInputs gets copied onto the GPU, operated on, then copied back to connectionOutputs (calcOrder is only read, never written). I perform a bunch of operations using the connectionOutputs array, then overwrite the neuronInputs array and send it back to the GPU. This repeats until it fails, and it always fails.

I'm calling this function:

[DllImport("CUDANeural.dll")]
static extern void GenerateSubstrateConnections(
    [In, Out] [MarshalAs(UnmanagedType.LPArray)] float[] neuronInputs,
    [In, Out] [MarshalAs(UnmanagedType.LPArray)] int[] calcOrder,
    [In, Out] [MarshalAs(UnmanagedType.LPArray)] float[] outWeights);

I only allocate the memory for the three arrays once, and I allocate a large chunk for each. I've tested it on the managed side, and there is no way I would be indexing outside of the arrays inside the CUDA code.

I guess my question is, what is causing this AccessViolationException? Assuming it isn't the CUDA code.

EDIT: Here's the call from the unmanaged side

extern "C" __declspec(dllexport) void GenerateSubstrateConnections(float* neuronInputs, int* calcOrder, float* outWeights);

It seems I might have been wrong about the CUDA side of the programming. I've added a cudaThreadExit() call at the end of GenerateSubstrateConnections, and this seems to have corrected the issue. However, for clarification, I'm calling a different function:

[DllImport("CUDANeural.dll")]
static extern void DebugSubstrateConnections(
    [In, Out] IntPtr neuronInputs,
    [In, Out] IntPtr calcOrder,
    [In, Out] IntPtr outWeights);

And before I call GenerateSubstrateConnections in managed code, I pin the arrays with GCHandles:

SubstrateDescription.inputHandle = GCHandle.Alloc(SubstrateDescription.neuronInputs, GCHandleType.Pinned);
SubstrateDescription.connectionHandle = GCHandle.Alloc(SubstrateDescription.outputConnections, GCHandleType.Pinned);
calcHandle = GCHandle.Alloc(calcOrder, GCHandleType.Pinned);

Then call

GenerateSubstrateConnections(
    SubstrateDescription.inputHandle.AddrOfPinnedObject(),
    calcHandle.AddrOfPinnedObject(),
    SubstrateDescription.connectionHandle.AddrOfPinnedObject());

I'm not entirely sure if this is necessary, but I know that it works (currently). Thank you for all the comments, they helped me squeeze out the issue.
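
On whether the pinning is necessary: for blittable arrays such as float[] and int[] passed through an array-typed P/Invoke signature, the marshaler normally pins the array for the duration of the call, so explicit GCHandle pinning mainly matters when raw IntPtr values are passed or when native code keeps the pointer after the call returns. If handles are used, each one should eventually be freed. A minimal sketch of that pattern follows. The names NativeCall and CallPinned are hypothetical, and a managed delegate stands in for the real DLL import so the snippet runs without CUDANeural.dll.

```csharp
using System;
using System.Runtime.InteropServices;

static class PinnedCallSketch
{
    // Hypothetical stand-in for the native entry point. In the real program this
    // would be a [DllImport("CUDANeural.dll")] extern method taking three IntPtrs.
    public delegate void NativeCall(IntPtr neuronInputs, IntPtr calcOrder, IntPtr outWeights);

    // Pins the three arrays, invokes the call, and always releases the pins.
    public static void CallPinned(float[] neuronInputs, int[] calcOrder,
                                  float[] outWeights, NativeCall native)
    {
        GCHandle inputHandle = default(GCHandle);
        GCHandle calcHandle = default(GCHandle);
        GCHandle outputHandle = default(GCHandle);
        try
        {
            // While pinned, the GC will not move these arrays.
            inputHandle = GCHandle.Alloc(neuronInputs, GCHandleType.Pinned);
            calcHandle = GCHandle.Alloc(calcOrder, GCHandleType.Pinned);
            outputHandle = GCHandle.Alloc(outWeights, GCHandleType.Pinned);

            native(inputHandle.AddrOfPinnedObject(),
                   calcHandle.AddrOfPinnedObject(),
                   outputHandle.AddrOfPinnedObject());
        }
        finally
        {
            // Free only the handles that were actually allocated, even if the call throws.
            if (outputHandle.IsAllocated) outputHandle.Free();
            if (calcHandle.IsAllocated) calcHandle.Free();
            if (inputHandle.IsAllocated) inputHandle.Free();
        }
    }
}
```

With the real import, the last argument would simply be the extern method itself, for example PinnedCallSketch.CallPinned(neuronInputs, calcOrder, connectionOutputs, GenerateSubstrateConnections), assuming its signature takes three IntPtr parameters.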

This may be a thread-safety issue. Since you are using static memory, you should lock the object or use some other synchronization mechanism, unless you are absolutely sure the code is single-threaded.
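
A rough sketch of the locking suggested above, assuming the static arrays from the question and a single shared lock object. SerializedCallSketch, GpuLock and RunStep are hypothetical names, and the native call is passed in as a delegate so the snippet runs without the DLL.

```csharp
using System;

static class SerializedCallSketch
{
    // One lock shared by every caller that touches the static arrays.
    private static readonly object GpuLock = new object();

    // Assumed to mirror the static arrays described in the question.
    public static float[] neuronInputs = new float[4];
    public static float[] connectionOutputs = new float[4];

    // Runs one step while holding the lock, so only one thread at a time
    // can hand the shared arrays to the (stand-in) native call.
    public static void RunStep(Action<float[], float[]> nativeCall)
    {
        lock (GpuLock)
        {
            nativeCall(neuronInputs, connectionOutputs);
        }
    }
}
```

Any code that reads or overwrites the arrays between calls would need to take the same lock for this to help.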

I am not even sure you can do a simple P/Invoke on CUDA functions, as they do not run on the main processor. The best option for using the native CUDA API directly might be C++/CLI, and NVIDIA just released a support package for that. Other, simpler options include OpenCL, for which a .NET library called OpenTK is available that provides managed wrappers for most uses.