I have a particle system with X particles. Each particle tests for collision with other particles. This gives X*X = X^2开发者_JAVA技巧 collision tests per frame. For 60f/s, this corresponds to 60*X^2 collision detection per second.
What is the best technological approach for these intensive calculations? Should I use F#, C, C++ or C#, or something else?
The following are constraints
- The code is written in C# with the latest XNA
- Multi-threaded may be considered
- No special algorithm that tests the collision with the nearest neighbors or that reduces the problem
The last constraint may be strange, so let me explain. Regardless constraint 3, given a problem with enormous computational requirement what would be the best approach to solve the problem. An algorithm reduces the problem; still the same algorithm may behave different depending on technology. Consider pros and cons of CLR vs native C.
The simple answer is "measure it". But take a look at this graph (that I borrowed from this question - which is worth your reading).
C++ is maybe 10% faster than MS's C# implementation (for this particular calculation) and faster still against Mono's C# implementation. But in real world terms, C++ is not all that much faster than C#.
If you're doing hard-core number crunching, you will want to use the SIMD/SSE unit of your CPU. This is something that C# does not normally support - but Mono is adding support for through Mono.Simd
. You can see from the graph that using the SIMD unit gives a significant performance boost to both languages.
(It's worth noting that while C++ is still "faster" than C#, the choice of language has only a small effect on performance, compared to the choice of what hardware to use. As was mentioned in the comments - your choice of algorithm will have by far the greatest effect.)
And finally, as Jerry Coffin mentioned in his answer, that you could also do the processing on the GPU. I imagine that it would be even faster than SIMD (but as I said - measure it). Using the GPU has the added beneift of leaving the CPU free to do other tasks. The downside is that your end-users will need a reasonable GPU.
You should probably consider doing this on the GPU using something like CUDA, OpenCL, or a DirectX compute shader.
Sweep and prune is a broad phase collision detection algorithm that may be well suited to this task. If you can make use of temporal coherency, that being from frame to frame the location differences are generally small a reduction in processing may be obtained. A good book on the subject is "real time collision detection".
For a simple speed up you could sort by one axis first and loop through checking for a collision in that axis before doing a full check... For each particle you only need to look further in the array until you find one that doesn't collide in that axis then you can move to the next one.
精彩评论