I am currently developing a CUDA application that will most certainly be deployed on a GPU much better than mine. Given another GPU model, how can I estimate how much faster my algorithm will run on it?
You're going to have a difficult time, for a number of reasons:
Clock rate and memory speed only have a weak relationship to code speed, because there is a lot more going on under the hood (e.g., thread context switching) that gets improved/changed for almost all new hardware.
Caches have been added to new hardware (e.g., Fermi) and unless you model cache hit/miss rates, you'll have a tough time predicting how this will affect the speed.
Floating-point performance in general is very dependent on the model (e.g., the Tesla C2050 has much better double-precision performance than the "top of the line" GTX 480).
Register availability per SM and register usage per thread can differ between devices, and this also affects performance; occupancy will be affected in many cases (a small occupancy check follows this list).
Performance can be improved by targeting specific hardware, so even if your algorithm is perfect for your GPU, it could be better if you optimize it for the new hardware.
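To make the occupancy point concrete, here is a minimal sketch (the kernel and block size are stand-ins for your own, and it assumes a CUDA toolkit recent enough to ship the occupancy API, which is newer than the Fermi-era cards discussed here). It asks the runtime how many blocks of a given kernel fit on one SM of the current device; the same kernel, built for a different compute capability, can come back with a different answer:

```
// occupancy_check.cu -- minimal sketch; myKernel and blockSize are placeholders.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void myKernel(float *data, int n)   // stand-in for your real kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int blockSize = 256;   // the block size you currently launch with
    int maxActiveBlocks = 0;

    // How many blocks of this kernel fit on one SM, given its register and
    // shared-memory usage on *this* device?
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&maxActiveBlocks,
                                                  myKernel, blockSize, 0);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    float occupancy = (maxActiveBlocks * blockSize) /
                      float(prop.maxThreadsPerMultiProcessor);

    printf("%s: %d blocks/SM, occupancy %.0f%%\n",
           prop.name, maxActiveBlocks, occupancy * 100.0f);
    return 0;
}
```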
Now, that said, you can probably make some predictions if you run your app through a profiler (such as the NVIDIA Compute Profiler) and look at your occupancy and your SM utilization. If your GPU has 2 SMs and the one you will eventually run on has 16 SMs, then you will almost certainly see an improvement, but not one you can predict simply by scaling with the SM count.
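If you want the raw numbers behind that kind of back-of-the-envelope comparison, a small device query (plain runtime API calls, nothing specific to any particular card) prints the SM count and clock figures for every GPU it finds; run it on both the development card and the target card and you have the inputs for the rough scaling argument above:

```
// device_query.cu -- print the figures a rough comparison relies on.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaGetDeviceCount(&count);

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);

        printf("Device %d: %s (compute capability %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);
        printf("  SMs:           %d\n", prop.multiProcessorCount);
        printf("  Core clock:    %.0f MHz\n", prop.clockRate / 1000.0);
        printf("  Memory clock:  %.0f MHz, bus width %d bits\n",
               prop.memoryClockRate / 1000.0, prop.memoryBusWidth);
        printf("  Global memory: %.0f MiB\n",
               prop.totalGlobalMem / (1024.0 * 1024.0));
    }
    return 0;
}
```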
So, unfortunately, it isn't easy to make the type of predictions you want. If you're writing something open source, you could post the code and ask others to test it with newer hardware, but that isn't always an option.
This can be very hard to predict for certain hardware changes and trivial for others. Highlight the differences between the two cards you're considering.
For example, the change could be as trivial as: if I had purchased one of those EVGA water-cooled behemoths, how much better would it perform than a standard GTX 580? This is just an exercise in computing the difference in the limiting clock speed (memory clock or GPU clock). I've also encountered this question when wondering whether I should overclock my card.
If you're going to a similar architecture, GTX 580 to Tesla C2070, you can make a similar case based on differences in clock speeds, but you have to be careful of the single/double precision issue (a small timing sketch follows these examples).
If you're doing something much more drastic, say going from a mobile card -- GTX 240M -- to a top of the line card -- Tesla C2070 -- then you may not get any performance improvement at all.
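On the single/double precision point: before extrapolating between cards, it can help to measure the float/double ratio on the card you already have. This is only a toy, compute-bound sketch (the kernel, grid size, and iteration count are arbitrary choices, not anything from the question), but the ratio it reports makes the caution concrete:

```
// fp_ratio.cu -- toy benchmark; times the same multiply-add chain in float and double.
#include <cstdio>
#include <cuda_runtime.h>

template <typename T>
__global__ void fmaChain(T *out, int iters)
{
    T a = T(threadIdx.x) * T(1e-3) + T(1);
    T b = T(1.0000001);
    for (int i = 0; i < iters; ++i)
        a = a * b + b;                       // dependent multiply-add chain
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;
}

template <typename T>
float timeIt(int iters)
{
    const int blocks = 1024, threads = 256;
    T *out;
    cudaMalloc(&out, blocks * threads * sizeof(T));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    fmaChain<<<blocks, threads>>>(out, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(out);
    return ms;
}

int main()
{
    const int iters = 1 << 16;
    float f = timeIt<float>(iters);
    float d = timeIt<double>(iters);
    printf("float: %.2f ms, double: %.2f ms, ratio %.1fx\n", f, d, d / f);
    return 0;
}
```

A GeForce and a Tesla of the same generation can report very different ratios here, which is exactly why clock-speed arithmetic alone misleads for double-heavy code.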
Note: Chris is very correct in his answer, but I wanted to stress this caution because I envision this common work path:
One says to the boss:
- "So I've heard about this CUDA thing... I think it could make function X much more efficient."
- Boss says you can have 0.05% of work time to test out CUDA -- hey, we already have this mobile card, use that.
- One year later... So CUDA could get us a three fold speedup. Could I buy a better card to test it out? (A GTX 580 only costs $400 -- less than that intern fiasco...)
- You spend the $$, buy the card, and your CUDA code runs slower.
- Your boss is now upset. You've wasted time and money.
So what happened? Developing on an old card (think 8800, 9800, or even a mobile GTX 2XX with around 30 cores) leads you to optimize and design your algorithm very differently than you would to efficiently utilize a card with 512 cores. Caveat emptor: you get what you pay for -- those awesome cards are awesome -- but your code may not run faster.
Warning issued, what's the take-away message? When you get that nicer card, be sure to invest time in tuning, testing, and possibly redesigning your algorithm from the ground up.
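One small piece of that retuning can even be left to the runtime: instead of hard-coding the launch configuration that happened to suit the old card, ask for a block size on whatever device the code lands on. A minimal sketch, again assuming a toolkit new enough to have the occupancy API, with a stand-in kernel:

```
// autotune_launch.cu -- pick a block size for the device at hand instead of
// hard-coding the one tuned on the old development GPU.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n)   // stand-in for your real kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;
    float *data;
    cudaMalloc(&data, n * sizeof(float));
    cudaMemset(data, 0, n * sizeof(float));

    int minGridSize = 0, blockSize = 0;
    // Ask the runtime for a block size that maximizes occupancy of this
    // kernel on the device it is about to run on.
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, scale, 0, 0);

    int gridSize = (n + blockSize - 1) / blockSize;
    scale<<<gridSize, blockSize>>>(data, n);
    cudaDeviceSynchronize();

    printf("launched with blockSize=%d (suggested minGridSize=%d)\n",
           blockSize, minGridSize);
    cudaFree(data);
    return 0;
}
```

This doesn't replace redesigning the algorithm, but it removes one class of silent mismatch between the card you developed on and the card you deploy on.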
OK, so that said, what's the rule of thumb? GPUs get roughly twice as fast every six months, so two years is four doublings -- a theoretical 16-fold gain. If you're moving from a card that's two years old to a top-of-the-line card, claim to your boss that it will run between 4 and 8 times faster (and if you get the full 16-fold improvement, bravo!!)