I have a matrix M that's 16384 x 81. I want to compute M * M.t (the result will be 16384x16384).
My question is: could somebody please explain the running time differences?
Using OpenCV in C++ the following code takes 18 seconds
#include <cv.h>
#include <cstdio>
using namespace cv;
int main(void) {
    Mat m(16384, 81, CV_32FC1);
    randu(m, Scalar(0), Scalar(1));
    int64 tic = getTickCount();
    Mat m2 = m * m.t();
    printf("%f", (getTickCount() - tic) / getTickFrequency());
}
In Python the following code takes 18.8 seconds (I originally reported only 0.9 seconds; see comment below)
import numpy as np
from time import time
m = np.random.rand(16384, 81)
tic = time()
result = np.dot(m, m.T)
print (time() - tic)
In MATLAB the following code takes 17.7 seconds
m = rand(16384, 81);
tic;
result = m * m';
toc;
My only guess would have been that it's a memory issue, and that somehow Python is able to avoid swap space. When I watch top, however, I do not see my C++ application using all the memory, and I had expected that C++ would win the day. Thanks for any insights.
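To put a number on the memory guess, here is a back-of-the-envelope sketch of how large the result matrix is. It assumes dense storage with no overhead. The helper name matrix_bytes is made up for illustration. Note that CV_32FC1 stores 4-byte floats, while np.random.rand and MATLAB's rand produce 8-byte doubles, so the three programs are not allocating the same amount.

```python
# Rough memory footprint of the 16384 x 16384 result of M * M.t.
# matrix_bytes is a hypothetical helper, not part of any library.
ROWS, COLS = 16384, 81
GIB = 1024 ** 3

def matrix_bytes(rows, cols, bytes_per_element):
    """Size in bytes of a dense rows x cols matrix, ignoring any header overhead."""
    return rows * cols * bytes_per_element

# 4 bytes per element for CV_32FC1, 8 bytes for NumPy/MATLAB doubles.
result_float32 = matrix_bytes(ROWS, ROWS, 4)
result_float64 = matrix_bytes(ROWS, ROWS, 8)

print(f"result as float32: {result_float32 / GIB:.1f} GiB")
print(f"result as float64: {result_float64 / GIB:.1f} GiB")
```

So the output alone needs about 1 GiB in single precision and 2 GiB in double precision, which is large enough that swapping is a plausible suspect on a machine with limited RAM.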
Edit
After revising my examples to time only the operation, the code now takes 18 seconds with Python as well. I'm really not sure what's going on, but if there's enough memory, they all seem to perform the same now.
Here are timings if the number of rows is 8192:
C++: 4.5 seconds
Python: 4.2 seconds
MATLAB: 1.8 seconds
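As a rough sanity check on those numbers: the product needs rows * rows * cols multiply-adds, so halving the row count should cut the pure arithmetic by a factor of 4. The sketch below compares that prediction with the timings reported in this question. The function name multiply_adds is made up for illustration, and the model ignores memory traffic entirely.

```python
def multiply_adds(rows, cols):
    """Multiply-add count for a (rows x cols) by (cols x rows) matrix product."""
    return rows * rows * cols

full = multiply_adds(16384, 81)
half = multiply_adds(8192, 81)
predicted_ratio = full / half  # halving the rows quarters the work

# Timings reported in the question, in seconds: (16384 rows, 8192 rows).
reported = {"C++": (18.0, 4.5), "Python": (18.8, 4.2), "MATLAB": (17.7, 1.8)}

for name, (t_full, t_half) in reported.items():
    measured_ratio = t_full / t_half
    print(f"{name}: measured {measured_ratio:.1f}x, arithmetic predicts {predicted_ratio:.1f}x")
```

The C++ and Python timings scale roughly as the operation count predicts, while MATLAB speeds up by considerably more than 4x. That would be consistent with something other than arithmetic (memory, for example) dominating the larger run, though these few data points cannot prove it.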
What CPU are you running on? For modern x86 and x64 chips with dynamic clocking, getTickCount and getTickFrequency cannot be trusted.
18 seconds is long enough to get acceptable precision from the standard OS functions based on the timer interrupt.
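One way to cross-check a suspicious timing is to measure with the language's standard clock and to time only the operation itself, not the setup. Below is a minimal Python sketch using time.perf_counter from the standard library. The helper time_call and the stand-in workload sum_of_squares are made up for illustration; in practice you would pass the matrix product instead.

```python
import time

def time_call(fn, *args, repeats=3):
    """Call fn(*args) `repeats` times; return (best elapsed seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        result = fn(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result

def sum_of_squares(n):
    """Stand-in workload; replace with the operation being measured."""
    return sum(i * i for i in range(n))

elapsed, value = time_call(sum_of_squares, 100_000)
print(f"best of 3 runs: {elapsed:.6f} s")
```

Taking the best of a few runs reduces noise from other processes. For a run that takes many seconds, a single measurement is usually good enough.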
And what BLAS are you using with OpenCV? MATLAB installs some highly optimized ones, IIRC even detecting your CPU and loading either Intel's or AMD's math library appropriately.
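The same question applies to the NumPy side of the comparison. As a sketch, numpy.show_config() prints the build configuration, which includes the BLAS/LAPACK libraries NumPy was linked against. The wrapper function report_numpy_blas is made up for illustration, and the exact output format varies between NumPy versions.

```python
def report_numpy_blas():
    """Print NumPy's build configuration (includes the BLAS/LAPACK it links against).

    Returns True if NumPy could be imported, False otherwise.
    """
    try:
        import numpy as np
    except ImportError:
        print("NumPy is not installed")
        return False
    np.show_config()  # prints build and link information to stdout
    return True

numpy_available = report_numpy_blas()
```

If the output names an optimized library such as OpenBLAS or MKL, np.dot is probably being dispatched to it. If not, a generic fallback may be in use, which could account for a large part of a timing gap.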