I have two 3-dimensional arrays, the first two dimensions of which represent matrices and the last one counts through a parameterspace, as a simple example take
A = repmat([1,2; 3,4], [1 1 4]);
(but assume A(:,:,j)
is different for each j
). How can one easily perform a per-j
matrix multiplication 开发者_C百科of two such matrix-arrays A
and B
?
C = A; % pre-allocate, nan(size(A,1), size(B,2)) would be better but slower
for jj = 1:size(A, 3)
C(:,:,jj) = A(:,:,jj) * B(:,:,jj);
end
certainly does the job, but if the third dimension is more like 1e3 elements this is very slow since it doesn't use MATLAB's vectorization. So, is there a faster way?
I did some timing tests now, the fastest way for 2x2xN turns out to be calculating the matrix elements:
C = A;
C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
In the general case it turns out the for loop is actually the fastest (don't forget to pre-allocate C though!).
Should one already have the result as cell-array of matrices though, using cellfun is the fastest choice, it is also faster than looping over the cell elements:
C = cellfun(@mtimes, A, B, 'UniformOutput', false);
However, having to call num2cell first (Ac = num2cell(A, [1 2])
) and cell2mat
for the 3d-array case wastes too much time.
Here's some timing I did for a random set of 2 x 2 x 1e4:
array-for: 0.057112
arrayfun : 0.14206
num2cell : 0.079468
cell-for : 0.033173
cellfun : 0.025223
cell2mat : 0.010213
explicit : 0.0021338
Explicit refers to using direct calculation of the 2 x 2 matrix elements, see bellow.
The result is similar for new random arrays, cellfun
is the fastest if no num2cell
is required before and there is no restriction to 2x2xN. For general 3d-arrays looping over the third dimension is indeed the fastest choice already. Here's the timing code:
n = 2;
m = 2;
l = 1e4;
A = rand(n,m,l);
B = rand(m,n,l);
% naive for-loop:
tic
%Cf = nan(n,n,l);
Cf = A;
for jl = 1:l
Cf(:,:,jl) = A(:,:,jl) * B(:,:,jl);
end;
disp([' array-for: ' num2str(toc)]);
% using arrayfun:
tic
Ca = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:size(A,3), 'UniformOutput',false);
Ca = cat(3,Ca{:});
disp([' arrayfun : ' num2str(toc)]);
tic
Ac = num2cell(A, [1 2]);
Bc = num2cell(B, [1 2]);
disp([' num2cell : ' num2str(toc)]);
% cell for-loop:
tic
Cfc = Ac;
for jl = 1:l
Cfc{jl} = Ac{jl} * Bc{jl};
end;
disp([' cell-for : ' num2str(toc)]);
% using cellfun:
tic
Cc = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
disp([' cellfun : ' num2str(toc)]);
tic
Cc = cell2mat(Cc);
disp([' cell2mat : ' num2str(toc)]);
tic
Cm = A;
Cm(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
Cm(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
Cm(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
Cm(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
disp([' explicit : ' num2str(toc)]);
disp(' ');
Here is my benchmark test comparing the methods mentioned in @TobiasKienzler answer. I am using the TIMEIT function to get more accurate timings.
function [t,v] = matrixMultTest()
n = 2; m = 2; p = 1e5;
A = rand(n,m,p);
B = rand(m,n,p);
%# time functions
t = zeros(5,1);
t(1) = timeit( @() func1(A,B,n,m,p) );
t(2) = timeit( @() func2(A,B,n,m,p) );
t(3) = timeit( @() func3(A,B,n,m,p) );
t(4) = timeit( @() func4(A,B,n,m,p) );
t(5) = timeit( @() func5(A,B,n,m,p) );
%# check the results
v = cell(5,1);
v{1} = func1(A,B,n,m,p);
v{2} = func2(A,B,n,m,p);
v{3} = func3(A,B,n,m,p);
v{4} = func4(A,B,n,m,p);
v{5} = func5(A,B,n,m,p);
assert( isequal(v{:}) )
end
%# simple FOR-loop
function C = func1(A,B,n,m,p)
C = zeros(n,n,p);
for k=1:p
C(:,:,k) = A(:,:,k) * B(:,:,k);
end
end
%# ARRAYFUN
function C = func2(A,B,n,m,p)
C = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:p, 'UniformOutput',false);
C = cat(3, C{:});
end
%# NUM2CELL/FOR-loop/CELL2MAT
function C = func3(A,B,n,m,p)
Ac = num2cell(A, [1 2]);
Bc = num2cell(B, [1 2]);
C = cell(1,1,p);
for k=1:p
C{k} = Ac{k} * Bc{k};
end;
C = cell2mat(C);
end
%# NUM2CELL/CELLFUN/CELL2MAT
function C = func4(A,B,n,m,p)
Ac = num2cell(A, [1 2]);
Bc = num2cell(B, [1 2]);
C = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
C = cell2mat(C);
end
%# Loop Unrolling
function C = func5(A,B,n,m,p)
C = zeros(n,n,p);
C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
end
The results:
>> [t,v] = matrixMultTest();
>> t
t =
0.63633 # FOR-loop
1.5902 # ARRAYFUN
1.1257 # NUM2CELL/FOR-loop/CELL2MAT
1.0759 # NUM2CELL/CELLFUN/CELL2MAT
0.05712 # Loop Unrolling
As I explained in the comments, a simple FOR-loop is the best solution (short of loop unwinding in the last case, which is only feasible for these small 2-by-2 matrices).
I highly recommend you use the MMX toolbox of matlab. It can multiply n-dimensional matrices as fast as possible.
The advantages of MMX are:
- It is easy to use.
- Multiply n-dimensional matrices (actually it can multiply arrays of 2-D matrices)
- It performs other matrix operations (transpose, Quadratic Multiply, Chol decomposition and more)
- It uses C compiler and multi-thread computation for speed up.
For this problem, you just need to write this command:
C=mmx('mul',A,B);
I added the following function to @Amro's answer
%# mmx toolbox
function C=func6(A,B,n,m,p)
C=mmx('mul',A,B);
end
I got this result for n=2,m=2,p=1e5
:
1.6571 # FOR-loop
4.3110 # ARRAYFUN
3.3731 # NUM2CELL/FOR-loop/CELL2MAT
2.9820 # NUM2CELL/CELLFUN/CELL2MAT
0.0244 # Loop Unrolling
0.0221 # MMX toolbox <===================
I used @Amro's code to run the benchmark.
One technique would be to create a 2Nx2N sparse matrix and embed on the diagonal the 2x2 matrices, for both A and B. Do the product with sparse matrices and take the result with slightly clever indexing and reshape it to 2x2xN.
But I doubt this will be faster than simple looping.
An even faster method, in my experience, is to use dot multiplication and summation over the three-dimensional matrix. The following function, function z_matmultiply(A,B) multiplies two three dimensional matrices which have the same depth. Dot multiplication is done in a manner that is as parallel as possible, thus you might want to check the speed of this function and compare it to others over a large number of repetitions.
function C = z_matmultiply(A,B)
[ma,na,oa] = size(A);
[mb,nb,ob] = size(B);
%preallocate the output as we will do a loop soon
C = zeros(ma,nb,oa);
%error message if the dimensions are not appropriate
if na ~= mb || oa ~= ob
fprintf('\n z_matmultiply warning: Matrix Dimmensions Inconsistent \n')
else
% if statement minimizes for loops by looping the smallest matrix dimension
if ma > nb
for j = 1:nb
Bp(j,:,:) = B(:,j,:);
C(:,j,:) = sum(A.*repmat(Bp(j,:,:),[ma,1]),2);
end
else
for i = 1:ma
Ap(:,i,:) = A(i,:,:);
C(i,:,:) = sum(repmat(Ap(:,i,:),[1,nb]).*B,1);
end
end
end
精彩评论