Using pthreads with CUDA - design questions

https://www.devze.com 2023-03-18 04:25 (source: web)
I am writing some code that does some disk I/O, invokes a library that I wrote to do some computation and GPU work, and then does more disk I/O to write the results back to a file.

I would like to write this as multi-threaded code, because the files are quite large. I want to be able to read in a portion of the file, send it to the GPU library, and write a portion back to a file. The disk I/O involved is quite large (around 10 GB), while the computation is fairly quick on the GPU.

My question is more of a design question. Should I use separate threads to pre-load data for the GPU library, have only the main thread actually execute the calls to the GPU library, and then hand the resulting data off to other threads to be written back out to disk? Or should each thread do its own complete cycle: grab a chunk of data, execute on the GPU, write to disk, and then go get the next chunk?
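The first design (a reader thread pre-loading chunks for a single thread that owns the GPU) boils down to a bounded producer-consumer queue. A minimal pthreads sketch of that staging, where `run_pipeline` and the integer "chunks" are purely illustrative stand-ins for real file chunks and the GPU library call:

```c
/* Sketch of the staged design: a reader thread pre-loads chunks into a
 * bounded queue while the main thread (which would own the CUDA context)
 * drains it.  Integers stand in for chunks of file data. */
#include <pthread.h>

#define QUEUE_CAP 4
#define NUM_CHUNKS 10

typedef struct {
    int items[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} queue_t;

static void queue_init(queue_t *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void queue_push(queue_t *q, int v) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)             /* block while queue is full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = v;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static int queue_pop(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                     /* block while queue is empty */
        pthread_cond_wait(&q->not_empty, &q->lock);
    int v = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return v;
}

/* Reader thread: in real code, each push would be a chunk read from disk. */
static void *reader(void *arg) {
    queue_t *q = arg;
    for (int i = 0; i < NUM_CHUNKS; i++)
        queue_push(q, i);
    queue_push(q, -1);                        /* sentinel: end of file */
    return NULL;
}

/* Main thread: in real code, the loop body would call the GPU library.
 * Here it just sums chunk ids so the sketch is self-checking. */
long run_pipeline(void) {
    queue_t q;
    pthread_t t;
    queue_init(&q);
    pthread_create(&t, NULL, reader, &q);
    long total = 0;
    for (;;) {
        int chunk = queue_pop(&q);
        if (chunk < 0) break;                 /* sentinel reached */
        total += chunk;                       /* real code: GPU library call */
    }
    pthread_join(t, NULL);
    return total;
}
```

A writer thread for the results would be a second queue of the same shape, downstream of the GPU stage.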

I am using CUDA for my GPU library. Is CUDA smart enough not to try to run two kernels on the GPU at once? I guess I will have to do the management manually to ensure that two threads don't try to push more data to the GPU than it has space for?

Any good resources on using multithreading and CUDA in combination are appreciated.


Threads will not help with disk I/O. People generally try to solve blocking problems by creating tons of threads; in fact, that only makes things worse. What you have to do is use asynchronous I/O and not block on writes (or reads). You can use a generic solution like libevent or Asio for this, or work with the lower-level API available on your platform. On Linux, AIO seems to be the best option for files, but I haven't tried it yet. Hope it helps.
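As a concrete illustration of the lower-level route on Linux, here is a minimal sketch using POSIX AIO (`<aio.h>`, may need `-lrt` on older glibc). The polling loop is only a placeholder; real code would do useful work, such as launching GPU kernels, while the transfer is in flight:

```c
/* Minimal POSIX AIO sketch: submit a read, do other work while the
 * kernel performs the transfer, and collect the result later. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Reads up to `len` bytes from `path` without blocking the caller
 * during the transfer; returns bytes read, or -1 on error. */
ssize_t async_read_file(const char *path, char *buf, size_t len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) { close(fd); return -1; }

    /* Placeholder: real code would overlap useful work here
     * (e.g. feed already-loaded chunks to the GPU). */
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);

    ssize_t n = aio_return(&cb);   /* bytes transferred, or -1 */
    close(fd);
    return n;
}

/* Tiny self-contained demo: write 5 bytes, read them back asynchronously. */
ssize_t demo(void) {
    const char *path = "/tmp/aio_demo.txt";
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    write(fd, "hello", 5);
    close(fd);
    char buf[16];
    return async_read_file(path, buf, sizeof buf);
}
```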


I encountered this situation with large files in my research work.

As far as I remember, there is not much to gain from threading the disk I/O work, because it is very slow compared to the GPU I/O.

The strategy I used was to read synchronously from disk and to load data and execute asynchronously on the GPU.

Something like:

read from disk
loop:
    async_load_to_gpu
    async_execute
    push_event
    read from disk
    check event complete or read more data from disk
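The loop above maps fairly directly onto CUDA streams and events. A minimal sketch, assuming a single stream, float data, and a placeholder kernel `process` (real code would double-buffer with two host/device buffer pairs so the next disk read overlaps the GPU work):

```cuda
/* Sketch only: one stream, one buffer pair, placeholder kernel.
 * Requires an NVIDIA GPU and nvcc to build. */
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void process(float *data, size_t n);   /* your kernel */

void pipeline(FILE *in, FILE *out, size_t chunk_elems) {
    cudaStream_t stream;
    cudaEvent_t  done;
    cudaStreamCreate(&stream);
    cudaEventCreate(&done);

    float *h_buf, *d_buf;
    /* Pinned host memory is required for truly asynchronous copies. */
    cudaMallocHost(&h_buf, chunk_elems * sizeof(float));
    cudaMalloc(&d_buf, chunk_elems * sizeof(float));

    size_t n;
    while ((n = fread(h_buf, sizeof(float), chunk_elems, in)) > 0) {
        cudaMemcpyAsync(d_buf, h_buf, n * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
        process<<<256, 256, 0, stream>>>(d_buf, n);
        cudaMemcpyAsync(h_buf, d_buf, n * sizeof(float),
                        cudaMemcpyDeviceToHost, stream);
        cudaEventRecord(done, stream);

        /* With double-buffering, the next fread would go here,
         * overlapping the GPU work on this chunk. */

        cudaEventSynchronize(done);        /* or poll cudaEventQuery(done) */
        fwrite(h_buf, sizeof(float), n, out);
    }

    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    cudaEventDestroy(done);
    cudaStreamDestroy(stream);
}
```

Within a single stream, operations execute in order, which also answers the kernel-serialization worry from the question: work queued to the same stream never overlaps with itself.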
