In the middle of a c++ code, eclipse, I need to compute the multiply of matrices A and B, with the size of 2400*3600 (so the dimensions are not the same). The matrices are stored in float two dimensional arrays.They are not sparse, no limitations.
Each multiply takes so long (several minutes), and I seriously need to reduce it, because I have a loop which repeats 50 million times. and each time a new A and B should be multiplied. Any sort of recommendation is welcomed to reduce the time complexity. (even changing the structure of storing the data, if you think that might help). For example, what if I store the data into one dimensional arrays? Or using vectors instead of arrays?
In one particular case, the开发者_开发知识库 first column is always 1, and the values are either 1, -1, or zero. Any idea for this case?
In other cases, the values can be any thing. ** one of these multiplications is X multiplied by its transposed. Is there any recommendation on this specific one?I wouldn't fool around trying to write my own: Google for LAPACK or BLAS, two time-tested packages for numerical computing, both optimized to the Nth degree. Both have C APIs you can use.
It will definitely help to store your second matrix transposed, so that columns match up with cache lines instead of rows. The difference in access time between L2 cache and main memory is a factor of 10 or so.
You might give Eigen a try.
If you're talking about millions of multiplications, the first thing I'd do is turn to something like CUDA or DirectCompute to off-load the work to the GPU, which is way better suited to this kind of stuff. That's what MATLAB did, even if the GPU acceleration is optional.
There are a myriad of examples of GPU-accelerated matrix multiplication out there so your work shouldn't be too hard.
精彩评论