Verbeia opened up a rather interesting discussion on the performance of the functional programming style in Mathematica. It can be found here: What is the most efficient way to construct large block matrices in Mathematica?
I'm working on a problem, and after timing my code I found that one particularly time-consuming portion is where I calculate the entries of a matrix through a recurrence relation:
c = Table[0, {2L+1}, {2L+1}];
c[[1, 1]] = 1;
Do[c[[i, i]] = e[[i - 1]] c[[i - 1, i - 1]], {i, 2, 2 L + 1}];
Do[c[[i, 1]] = (1 - e[[i - 1]]) c[[i - 1, 1]], {i, 2, 2 L + 1}];
Do[c[[i, j]] = (1 - e[[i - 1]]) c[[i - 1, j]] +
e[[i - 1]] c[[i - 1, j - 1]], {i, 2, 2 L + 1}, {j, 2, i - 1}];
where e is some externally defined list. Is there any way I could write this in a more efficient manner? I can't seem to find any obvious way of using the built-in functions to accomplish this more idiomatically and efficiently.
I realize I can only do so much, since this code is O(n^2), but I have a series of matrix multiplications (about 6 in all) that, combined, take less time to run than this statement. Anything I can do to speed this up even slightly would make an appreciable difference in run times for me.
Update:
In line with what acl recommended, I tried using Compile to speed up my expressions. For a relatively small L = 600, I get 3.81 seconds for the naive Do[...], 1.54 seconds for plain old Compile, and 0.033 seconds for Compile[..., CompilationTarget -> "C"].
For a more realistic size of L = 1200, the timings become 16.68, 0.605, and 0.132 seconds for Do, Compile, and Compile[..., CompilationTarget -> "C"] respectively. I'm able to achieve the same two orders of magnitude speedup that acl mentioned in his post.
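For reference, the numbers above came from a simple harness along these lines (a sketch; f, fc and fcc are acl's definitions from the answer below, and e is a random list of the right length):

(* timing sketch using acl's f, fc, fcc defined in the answer below *)
L = 600;
e = RandomInteger[{0, 1}, 2 L];
First@AbsoluteTiming[f[L, e]]    (* naive Do loops *)
First@AbsoluteTiming[fc[L, e]]   (* bytecode Compile *)
First@AbsoluteTiming[fcc[L, e]]  (* CompilationTarget -> "C" *)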
Try Compile. Here I define three functions: f as you defined it, fc compiled (to some sort of bytecode), and fcc compiled to C (look up the documentation as to how to examine the generated code).
First, make mma tell us if something can't be compiled:
SetSystemOptions["CompileOptions"->"CompileReportExternal"->True]
then the functions:
ClearAll[f];
f=Function[{ell,e},
Module[{c=Table[0,{2ell+1},{2ell+1}]},
c[[1,1]]=1;
Do[c[[i,i]]=e[[i-1]] c[[i-1,i-1]],{i,2,2 ell+1}];
Do[c[[i,1]]=(1-e[[i-1]]) c[[i-1,1]],{i,2,2 ell+1}];
Do[c[[i,j]]=(1-e[[i-1]]) c[[i-1,j]]+e[[i-1]] c[[i-1,j-1]],
{i,2,2 ell+1},{j,2,i-1}];
c
]
];
ClearAll[fc];
fc=Compile[{{ell,_Integer},{e,_Integer,1}},
Module[{c},
c=Table[0,{2ell+1},{2ell+1}];
c[[1,1]]=1;
Do[c[[i,i]]=e[[i-1]] c[[i-1,i-1]],{i,2,2 ell+1}];
Do[c[[i,1]]=(1-e[[i-1]]) c[[i-1,1]],{i,2,2 ell+1}];
Do[c[[i,j]]=(1-e[[i-1]]) c[[i-1,j]]+e[[i-1]] c[[i-1,j-1]],
{i,2,2 ell+1},{j,2,i-1}];
c
]
];
ClearAll[fcc];
fcc=Compile[{{ell,_Integer},{e,_Integer,1}},
Module[{c},
c=Table[0,{2ell+1},{2ell+1}];
c[[1,1]]=1;
Do[c[[i,i]]=e[[i-1]] c[[i-1,i-1]],{i,2,2 ell+1}];
Do[c[[i,1]]=(1-e[[i-1]]) c[[i-1,1]],{i,2,2 ell+1}];
Do[c[[i,j]]=(1-e[[i-1]]) c[[i-1,j]]+e[[i-1]] c[[i-1,j-1]],
{i,2,2 ell+1},{j,2,i-1}];
c
],
CompilationTarget->"C",
RuntimeOptions->"Speed"
];
No errors, so it's OK. Now test (these timings are on a MacBook with a 2.4 GHz Core 2 Duo running on battery):
ell=400;
e=RandomInteger[{0,1},2*ell];
f[ell,e];//Timing
fc[ell,e];//Timing
fcc[ell,e];//Timing
giving
{2.60925, Null}
{0.092022, Null}
{0.022709, Null}
so the version compiled to C is two orders of magnitude faster here.
If you change the types and get compilation errors, ask.
EDIT: If e contains reals, try
ClearAll[fc];
fc=Compile[{{ell,_Integer},{e,_Real,1}},
Module[{c},
c=Table[0.,{2ell+1},{2ell+1}];
c[[1,1]]=1;
Do[c[[i,i]]=e[[i-1]] c[[i-1,i-1]],{i,2,2 ell+1}];
Do[c[[i,1]]=(1-e[[i-1]]) c[[i-1,1]],{i,2,2 ell+1}];
Do[c[[i,j]]=(1-e[[i-1]]) c[[i-1,j]]+e[[i-1]] c[[i-1,j-1]],
{i,2,2 ell+1},{j,2,i-1}];
c
]
];
ClearAll[fcc];
fcc=Compile[{{ell,_Integer},{e,_Real,1}},
Module[{c},
c=Table[0.,{2ell+1},{2ell+1}];
c[[1,1]]=1;
Do[c[[i,i]]=e[[i-1]] c[[i-1,i-1]],{i,2,2 ell+1}];
Do[c[[i,1]]=(1-e[[i-1]]) c[[i-1,1]],{i,2,2 ell+1}];
Do[c[[i,j]]=(1-e[[i-1]]) c[[i-1,j]]+e[[i-1]] c[[i-1,j-1]],
{i,2,2 ell+1},{j,2,i-1}];
c
],
CompilationTarget->"C",
RuntimeOptions->"Speed"
];
instead.
One can get a feel for how this works by saying
Needs["CompiledFunctionTools`"]
CompilePrint[fc]
and obtaining
2 arguments
18 Integer registers
6 Real registers
3 Tensor registers
Underflow checking off
Overflow checking off
Integer overflow checking on
RuntimeAttributes -> {}
I0 = A1
T(R1)0 = A2
I12 = 0
I1 = 2
I3 = 1
I14 = -1
R0 = 0.
Result = T(R2)2
1 I11 = I1 * I0
2 I11 = I11 + I3
3 I7 = I1 * I0
4 I7 = I7 + I3
5 I17 = I12
6 T(R2)2 = Table[ I11, I7]
7 I15 = I12
8 goto 13
9 I16 = I12
10 goto 12
11 Element[ T(R2)2, I17] = R0
12 if[ ++ I16 < I7] goto 11
13 if[ ++ I15 < I11] goto 9
14 R1 = I3
15 Part[ T(R2)2, I3, I3] = R1
16 I4 = I1 * I0
17 I4 = I4 + I3
18 I5 = I3
19 goto 26
20 I10 = I5 + I14
21 I8 = I10
22 R4 = Part[ T(R1)0, I8]
23 R5 = Part[ T(R2)2, I8, I8]
24 R4 = R4 * R5
25 Part[ T(R2)2, I5, I5] = R4
26 if[ ++ I5 < I4] goto 20
27 I4 = I1 * I0
28 I4 = I4 + I3
29 I5 = I3
30 goto 40
31 I10 = I5 + I14
32 I13 = I10
33 R4 = Part[ T(R1)0, I13]
34 R5 = - R4
35 R4 = I3
36 R4 = R4 + R5
37 R5 = Part[ T(R2)2, I13, I3]
38 R4 = R4 * R5
39 Part[ T(R2)2, I5, I3] = R4
40 if[ ++ I5 < I4] goto 31
41 I4 = I1 * I0
42 I4 = I4 + I3
43 I5 = I3
44 goto 63
45 I6 = I5 + I14
46 I17 = I3
47 goto 62
48 I16 = I5 + I14
49 I9 = I16
50 R4 = Part[ T(R1)0, I9]
51 R2 = R4
52 R4 = - R2
53 R5 = I3
54 R5 = R5 + R4
55 R4 = Part[ T(R2)2, I9, I17]
56 R5 = R5 * R4
57 I16 = I17 + I14
58 R4 = Part[ T(R2)2, I9, I16]
59 R3 = R2 * R4
60 R5 = R5 + R3
61 Part[ T(R2)2, I5, I17] = R5
62 if[ ++ I17 < I6] goto 48
63 if[ ++ I5 < I4] goto 45
64 Return
with the names of the registers indicating their type etc. You can also look at the generated C code if you use the "C" option (but that is a bit harder to read).
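To look at the generated C source itself, the CCodeGenerator` package can print it for a function compiled with CompilationTarget -> "C" (a sketch; the exported function name "fun" is arbitrary):

(* print the C source generated for fcc; "fun" is just a label *)
Needs["CCodeGenerator`"]
CCodeStringGenerate[fcc, "fun"]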
Given that e is already defined (as mentioned in your comment), you can get the first two steps like this:
firsttwosteps = DiagonalMatrix[FoldList[#1 * #2&, 1, e]] +
ArrayFlatten[{{ {{0}} , {Table[0,{2L}]} },
{Transpose[{Rest@ FoldList[(1-#2)#1 &,1,e]}], Table[0,{2L},{2L}]}}]
That is, you set up the diagonal and the first column as required in two matrices and add them. This works because the only dependency of the second step on the first step is the element at position {1,1}.
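As a quick sanity check of that claim (a sketch, assuming L, e and firsttwosteps are as defined above, and using the uncompiled Do-loop function f from the Compile answer), the diagonal and first column of firsttwosteps should match the Do-loop result exactly, since both are built from the same sequence of multiplications:

(* assumes L, e, firsttwosteps from above and f from the Compile answer *)
ref = f[L, e];
Diagonal[firsttwosteps] == Diagonal[ref]   (* step 1: the diagonal *)
firsttwosteps[[All, 1]] == ref[[All, 1]]   (* step 2: the first column *)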
The third step is a little trickier. If you want a purely functional solution, then it's a case of two FoldLists. You might find it easier to build firsttwosteps in transposed form, and then transpose the upper-triangular result into a lower-triangular final result.
firsttwostepsT = DiagonalMatrix[FoldList[#1 * #2&, 1, e]] +
ArrayFlatten[{{ {{0}} ,{Rest@ FoldList[(1-#2)#1 &,1,e]} },
{ Table[0,{2L}, {1}], Table[0,{2L},{2L}]}}]
And then:
Transpose @ FoldList[With[{cim1 = #1},
Most@FoldList[(1 - #2[[1]]) #1 + #2[[1]] #2[[2]] &, 0.,
Transpose[{Join[e, {0}], cim1}]]] &, First@firsttwostepsT, Join[e, {0}]]
In the check I did, this preserved the diagonal of firsttwostepsT and produced an upper-triangular matrix before the transpose.
The reason your code is slow is that you are repeatedly redefining the same entire matrix by assigning individual parts. As a general principle, you almost never need Do loops and ReplacePart-type constructs in Mathematica. There is almost always a better way.
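As one illustration of that principle (this is not from the answers above, so treat it as a sketch): each row of c depends only on the row before it, so all three Do loops collapse into a single FoldList over whole rows. RotateRight plays the role of the j-1 shift in the recurrence; it is safe here because the last entry of every intermediate row is 0, so nothing nonzero wraps around to the front:

(* whole-row version of the recurrence: row_i = (1-e_i) row_(i-1) + e_i shift(row_(i-1)) *)
c = FoldList[(1 - #2) #1 + #2 RotateRight[#1] &,
    Join[{1}, Table[0, {2 L}]], e];

Element by element this applies exactly the same arithmetic as the original loops (including the diagonal and first-column special cases, which fall out of the zeros in each row), and the body is simple enough to feed straight into Compile as well.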