Verbeia opened up a rather interesting discussion on the performance of the functional programming style in Mathematica. It can be found here: What is the most efficient way to construct large block matrices in Mathematica?
I'm working on a problem, and after timing my code I found that one particularly time-consuming portion is where I calculate the entries of a matrix through a recurrence relation:
c = Table[0, {2L+1}, {2L+1}];
c[[1, 1]] = 1;
Do[c[[i, i]] = e[[i - 1]] c[[i - 1, i - 1]], {i, 2, 2 L + 1}];
Do[c[[i, 1]] = (1 - e[[i - 1]]) c[[i - 1, 1]], {i, 2, 2 L + 1}];
Do[c[[i, j]] = (1 - e[[i - 1]]) c[[i - 1, j]] +
e[[i - 1]] c[[i - 1, j - 1]], {i, 2, 2 L + 1}, {j, 2, i - 1}];
where e is some externally defined list. Is there any way I could write this in a more efficient manner? I can't seem to find any obvious way of using the built-in functions to accomplish this more idiomatically and efficiently.
I realize I can only do so much, since this code is O(n^2), but I have a series of matrix multiplications (about 6 in all) that, combined, take less time to run than this statement. Anything I can do to speed this up even slightly would make an appreciable difference in run times for me.
Update:
In line with what acl recommended, I tried using Compile to speed up my expressions. For a relatively small L = 600, I get 3.81 seconds for the naive Do[...], 1.54 seconds for plain old Compile, and 0.033 seconds for Compile[..., CompilationTarget -> "C"].
For a more realistic size of L = 1200, the timings become 16.68, 0.605, and 0.132 seconds for Do, Compile, and Compile[..., CompilationTarget -> "C"] respectively. I'm able to achieve the same two orders of magnitude speedup that acl mentioned in his post.
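For reference, the numbers above came from a simple harness along these lines (a sketch; f, fc and fcc are acl's definitions from the answer below, and e is a random list of the right length):

(* timing sketch using acl's f, fc, fcc defined in the answer below *)
L = 600;
e = RandomInteger[{0, 1}, 2 L];
First@AbsoluteTiming[f[L, e]]    (* naive Do loops *)
First@AbsoluteTiming[fc[L, e]]   (* bytecode Compile *)
First@AbsoluteTiming[fcc[L, e]]  (* CompilationTarget -> "C" *)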
Try Compile. Here I define three functions: f as you defined it, fc compiled (to some sort of bytecode), and fcc compiled to C (look up the documentation as to how to examine the generated code).
First, make mma tell us if something can't be compiled:
SetSystemOptions["CompileOptions"->"CompileReportExternal"->True]
then the functions:
ClearAll[f];
f=Function[{ell,e},
Module[{c=Table[0,{2ell+1},{2ell+1}]},
c[[1,1]]=1;
Do[c[[i,i]]=e[[i-1]] c[[i-1,i-1]],{i,2,2 ell+1}];
Do[c[[i,1]]=(1-e[[i-1]]) c[[i-1,1]],{i,2,2 ell+1}];
Do[c[[i,j]]=(1-e[[i-1]]) c[[i-1,j]]+e[[i-1]] c[[i-1,j-1]],
{i,2,2 ell+1},{j,2,i-1}];
c
]
];
ClearAll[fc];
fc=Compile[{{ell,_Integer},{e,_Integer,1}},
Module[{c},
c=Table[0,{2ell+1},{2ell+1}];
c[[1,1]]=1;
Do[c[[i,i]]=e[[i-1]] c[[i-1,i-1]],{i,2,2 ell+1}];
Do[c[[i,1]]=(1-e[[i-1]]) c[[i-1,1]],{i,2,2 ell+1}];
Do[c[[i,j]]=(1-e[[i-1]]) c[[i-1,j]]+e[[i-1]] c[[i-1,j-1]],
{i,2,2 ell+1},{j,2,i-1}];
c
]
];
ClearAll[fcc];
fcc=Compile[{{ell,_Integer},{e,_Integer,1}},
Module[{c},
c=Table[0,{2ell+1},{2ell+1}];
c[[1,1]]=1;
Do[c[[i,i]]=e[[i-1]] c[[i-1,i-1]],{i,2,2 ell+1}];
Do[c[[i,1]]=(1-e[[i-1]]) c[[i-1,1]],{i,2,2 ell+1}];
Do[c[[i,j]]=(1-e[[i-1]]) c[[i-1,j]]+e[[i-1]] c[[i-1,j-1]],
{i,2,2 ell+1},{j,2,i-1}];
c
],
CompilationTarget->"C",
RuntimeOptions->"Speed"
];
No errors, so it's OK. Now test (these timings are on a MacBook with a 2.4 GHz Core 2 Duo running on battery):
ell=400;
e=RandomInteger[{0,1},2*ell];
f[ell,e];//Timing
fc[ell,e];//Timing
fcc[ell,e];//Timing
giving
{2.60925, Null}
{0.092022, Null}
{0.022709, Null}
so the version compiled to C is two orders of magnitude faster here.
If you change the types and get compilation errors, ask.
EDIT: If e contains reals, try
ClearAll[fc];
fc=Compile[{{ell,_Integer},{e,_Real,1}},
Module[{c},
c=Table[0.,{2ell+1},{2ell+1}];
c[[1,1]]=1;
Do[c[[i,i]]=e[[i-1]] c[[i-1,i-1]],{i,2,2 ell+1}];
Do[c[[i,1]]=(1-e[[i-1]]) c[[i-1,1]],{i,2,2 ell+1}];
Do[c[[i,j]]=(1-e[[i-1]]) c[[i-1,j]]+e[[i-1]] c[[i-1,j-1]],
{i,2,2 ell+1},{j,2,i-1}];
c
]
];
ClearAll[fcc];
fcc=Compile[{{ell,_Integer},{e,_Real,1}},
Module[{c},
c=Table[0.,{2ell+1},{2ell+1}];
c[[1,1]]=1;
Do[c[[i,i]]=e[[i-1]] c[[i-1,i-1]],{i,2,2 ell+1}];
Do[c[[i,1]]=(1-e[[i-1]]) c[[i-1,1]],{i,2,2 ell+1}];
Do[c[[i,j]]=(1-e[[i-1]]) c[[i-1,j]]+e[[i-1]] c[[i-1,j-1]],
{i,2,2 ell+1},{j,2,i-1}];
c
],
CompilationTarget->"C",
RuntimeOptions->"Speed"
];
instead.
One can get a feel for how this works by saying
Needs["CompiledFunctionTools`"]
CompilePrint[fc]
and obtaining
2 arguments
18 Integer registers
6 Real registers
3 Tensor registers
Underflow checking off
Overflow checking off
Integer overflow checking on
RuntimeAttributes -> {}
I0 = A1
T(R1)0 = A2
I12 = 0
I1 = 2
I3 = 1
I14 = -1
R0 = 0.
Result = T(R2)2
1 I11 = I1 * I0
2 I11 = I11 + I3
3 I7 = I1 * I0
4 I7 = I7 + I3
5 I17 = I12
6 T(R2)2 = Table[ I11, I7]
7 I15 = I12
8 goto 13
9 I16 = I12
10 goto 12
11 Element[ T(R2)2, I17] = R0
12 if[ ++ I16 < I7] goto 11
13 if[ ++ I15 < I11] goto 9
14 R1 = I3
15 Part[ T(R2)2, I3, I3] = R1
16 I4 = I1 * I0
17 I4 = I4 + I3
18 I5 = I3
19 goto 26
20 I10 = I5 + I14
21 I8 = I10
22 R4 = Part[ T(R1)0, I8]
23 R5 = Part[ T(R2)2, I8, I8]
24 R4 = R4 * R5
25 Part[ T(R2)2, I5, I5] = R4
26 if[ ++ I5 < I4] goto 20
27 I4 = I1 * I0
28 I4 = I4 + I3
29 I5 = I3
30 goto 40
31 I10 = I5 + I14
32 I13 = I10
33 R4 = Part[ T(R1)0, I13]
34 R5 = - R4
35 R4 = I3
36 R4 = R4 + R5
37 R5 = Part[ T(R2)2, I13, I3]
38 R4 = R4 * R5
39 Part[ T(R2)2, I5, I3] = R4
40 if[ ++ I5 < I4] goto 31
41 I4 = I1 * I0
42 I4 = I4 + I3
43 I5 = I3
44 goto 63
45 I6 = I5 + I14
46 I17 = I3
47 goto 62
48 I16 = I5 + I14
49 I9 = I16
50 R4 = Part[ T(R1)0, I9]
51 R2 = R4
52 R4 = - R2
53 R5 = I3
54 R5 = R5 + R4
55 R4 = Part[ T(R2)2, I9, I17]
56 R5 = R5 * R4
57 I16 = I17 + I14
58 R4 = Part[ T(R2)2, I9, I16]
59 R3 = R2 * R4
60 R5 = R5 + R3
61 Part[ T(R2)2, I5, I17] = R5
62 if[ ++ I17 < I6] goto 48
63 if[ ++ I5 < I4] goto 45
64 Return
with the names of the registers indicating their type etc. You can also look at the generated C code if you use the "C" option (but that is a bit harder to read).
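To look at the generated C source itself, the CCodeGenerator` package can print it for a function compiled with CompilationTarget -> "C" (a sketch; the exported function name "fun" is arbitrary):

(* print the C source generated for fcc; "fun" is just a label *)
Needs["CCodeGenerator`"]
CCodeStringGenerate[fcc, "fun"]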
Given that e is already defined (as mentioned in your comment), you can get the first two steps like this:
firsttwosteps = DiagonalMatrix[FoldList[#1 * #2&, 1, e]] +
ArrayFlatten[{{ {{0}} , {Table[0,{2L}]} },
{Transpose[{Rest@ FoldList[(1-#2)#1 &,1,e]}], Table[0,{2L},{2L}]}}]
That is, you set up the diagonal and the first column as required in two matrices and add them. This works because the only dependency of the second step on the first step is the element at position {1,1}.
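As a quick sanity check of that claim (a sketch, assuming L, e and firsttwosteps are as defined above, and using the uncompiled Do-loop function f from the Compile answer), the diagonal and first column of firsttwosteps should match the Do-loop result exactly, since both are built from the same sequence of multiplications:

(* assumes L, e, firsttwosteps from above and f from the Compile answer *)
ref = f[L, e];
Diagonal[firsttwosteps] == Diagonal[ref]   (* step 1: the diagonal *)
firsttwosteps[[All, 1]] == ref[[All, 1]]   (* step 2: the first column *)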
The third step is a little trickier. If you want a purely functional solution, then it's a case of two FoldLists. You might find it easier to build firsttwosteps in transposed form, and then transpose the upper-triangular result into a lower-triangular final result.
firsttwostepsT = DiagonalMatrix[FoldList[#1 * #2&, 1, e]] +
ArrayFlatten[{{ {{0}} ,{Rest@ FoldList[(1-#2)#1 &,1,e]} },
{ Table[0,{2L}, {1}], Table[0,{2L},{2L}]}}]
And then:
Transpose @ FoldList[With[{cim1 = #1},
Most@FoldList[(1 - #2[[1]]) #1 + #2[[1]] #2[[2]] &, 0.,
Transpose[{Join[e, {0}], cim1}]]] &, First@firsttwostepsT, Join[e, {0}]]
In the check I did, this preserved the diagonal of firsttwostepsT and produced an upper-triangular matrix before the transpose.
The reason your code is slow is that you are repeatedly redefining the same entire matrix by assigning individual parts. As a general principle, you almost never need Do loops and ReplacePart-type constructs in Mathematica. There is almost always a better way.
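As one illustration of that principle (this is not from the answers above, so treat it as a sketch): each row of c depends only on the row before it, so all three Do loops collapse into a single FoldList over whole rows. RotateRight plays the role of the j-1 shift in the recurrence; it is safe here because the last entry of every intermediate row is 0, so nothing nonzero wraps around to the front:

(* whole-row version of the recurrence: row_i = (1-e_i) row_(i-1) + e_i shift(row_(i-1)) *)
c = FoldList[(1 - #2) #1 + #2 RotateRight[#1] &,
    Join[{1}, Table[0, {2 L}]], e];

Element by element this applies exactly the same arithmetic as the original loops (including the diagonal and first-column special cases, which fall out of the zeros in each row), and the body is simple enough to feed straight into Compile as well.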