I would like to upload two images to the GPU memory, and I'm inte开发者_运维知识库rested how fast I can do this?
In fact - will it be faster to compare two bitmaps in RAM with CPU, or upload them to GPU and use GPU parallelism to do it?
If you run the CUDA device bandwidth sample, you'll get a benchmark for the upload speed.
Assuming DDR3 tri-channel 1600MHz RAM, you'll get something like 38 GB/s memory bandwidth.
Take a typical midrange card like a GTX460 and you'll get something like 84 GB/s memory bandwidth. Note that you'll have to make a hop across the bus which is something like 8GB/s theoretical, ~5.5 in practice for a PCI-E2.0 x16 link.
Note that kotlinski's answer isn't quite correct. You'll can do compared in parallel and then do a parallel reduction in which case, the bigger GPU device bandwidth can work win out eventually.
I think the answer is likely to be: a loss to upload to GPU and do comparison once. Possible gain if comparison is made multiple times (kept and modified on the GPU, for example).
Edit:
The multiple times comparison refers to if you modified the images on the GPU memory in situ. Thus, it would merit another comparison (caching doesn't cut it), while not incurring the penalty of another copy across the bus.
Since memory access is the bottleneck here, it is extremely likely that it is faster to just do it in CPU. Making it run in parallel is not likely to give you anything, memory access is essentially a serial operation.
The answer to this question is highly debatable and depends entirely on you systems configuration. This means that you'll have to do the benchmarks yourself. Factors that could influence your situation:
- Speed of your RAM
- Speed of the GPU Bus
- Whether or not you have shared RAM between GPU & CPU
However, I do think that in the general case (eg. with busspeeds in the order of GB/s) it's faster to upload the images to the GPU and do the difference comparison there.
精彩评论