Are there any restrictions on allocating device memory, other than not exceeding the available amount? I get the following error when trying to allocate 64 MB:
cudaSafeCall() Runtime API error : out of memory.
However, according to cuMemGetInfo there are over 200 MB left.
Here is the scenario:
size_t size = 4096 * 4096 * sizeof(float);
cuMemGetInfo(&fr, &ttl); // fr indicates 284 MB
cutilSafeCall(cudaMalloc((void**) &tmp, size));
p1 = tmp;
cuMemGetInfo(&fr, &ttl); // fr indicates 220 MB
cutilSafeCall(cudaMalloc((void**) &tmp, size)); // this fails !!!
p2 = tmp;
What am I missing?
I am using:
Cuda compilation tools, release 3.2, V0.2.1221
NVidia Driver 260.19.26
Linux (Slackware) x86
Update:
This behavior is quite non-deterministic. From time to time the above case succeeds and I get correct results, without any error.
As Thomas pointed out, the problem is memory fragmentation. (Confirmed by experiment; I didn't find a reliable source to link.)
You seem to be reallocating into the same pointer, as you are reusing tmp. If you are used to object-oriented code, you may be mistaking pointers for references to objects.
The following code should give you the same result:
size_t size = 4096 * 4096 * sizeof(float);
float* p1;
float* p2;
cutilSafeCall(cudaMalloc((void**) &p1, size));
cutilSafeCall(cudaMalloc((void**) &p2, size));
You are using the same pointer variable for both allocations. I know you are using p1 to back up the first allocation's address, but you are forgetting to clear tmp after that. Maybe cudaMalloc()
is failing because of it; it's just a wild guess.
size_t size = 4096 * 4096 * sizeof(float);
cuMemGetInfo(&fr, &ttl);
cutilSafeCall(cudaMalloc((void**) &tmp, size));
p1 = tmp;
tmp = 0; // or NULL to clear the pointer
cuMemGetInfo(&fr, &ttl);
cutilSafeCall(cudaMalloc((void**) &tmp, size));
p2 = tmp;