开发者

Searching from the end of a list in Mathematica

开发者 https://www.devze.com 2023-04-05 21:50 出处:网络
Many algorithms (like the algorithm for finding the next permutation of a list in lexic开发者_开发问答ographical order) involve finding the index of the last element in a list. However, I haven\'t bee

Many algorithms (like the algorithm for finding the next permutation of a list in lexic开发者_开发问答ographical order) involve finding the index of the last element in a list. However, I haven't been able to find a way to do this in Mathematica that isn't awkward. The most straightforward approach uses LengthWhile, but it means reversing the whole list, which is likely to be inefficient in cases where you know the element you want is near the end of the list and reversing the sense of the predicate:

findLastLengthWhile[list_, predicate_] :=
 (Length@list - LengthWhile[Reverse@list, ! predicate@# &]) /. (0 -> $Failed)

We could do an explicit, imperative loop with Do, but that winds up being a bit clunky, too. It would help if Return would actually return from a function instead of the Do block, but it doesn't, so you might as well use Break:

findLastDo[list_, pred_] :=
 Module[{k, result = $Failed},
  Do[
   If[pred@list[[k]], result = k; Break[]],
   {k, Length@list, 1, -1}];
  result]

Ultimately, I decided to iterate using tail-recursion, which means early termination is a little easier. Using the weird but useful #0 notation that lets anonymous functions call themselves, this becomes:

findLastRecursive[list_, pred_] :=
 With[{
   step =
    Which[
      #1 == 0, $Failed,
      pred@list[[#1]], #1,
      True, #0[#1 - 1]] &},
  step[Length@list]]

All of this seems too hard, though. Does anyone see a better way?

EDIT to add: Of course, my preferred solution has a bug which means it's broken on long lists because of $IterationLimit.

In[107]:= findLastRecursive[Range[10000], # > 10000 &]
$IterationLimit::itlim: Iteration limit of 4096 exceeded. 
Out[107]= (* gack omitted *)

You can fix this with Block:

findLastRecursive[list_, pred_] :=
 Block[{$IterationLimit = Infinity},
  With[{
    step =
     Which[
       #1 == 0, $Failed,
       pred@list[[#1]], #1,
       True, #0[#1 - 1]] &},
   step[Length@list]]]

$IterationLimit is not my favorite Mathematica feature.


Not really an answer, just a couple of variants on findLastDo.

(1) Actually Return can take an undocumented second argument telling what to return from.

In[74]:= findLastDo2[list_, pred_] := 
 Module[{k, result = $Failed}, 
  Do[If[pred@list[[k]], Return[k, Module]], {k, Length@list, 1, -1}];
  result]

In[75]:= findLastDo2[Range[25], # <= 22 &]
Out[75]= 22

(2) Better is to use Catch[...Throw...]

In[76]:= findLastDo3[list_, pred_] := 
 Catch[Module[{k, result = $Failed}, 
   Do[If[pred@list[[k]], Throw[k]], {k, Length@list, 1, -1}];
   result]]

In[77]:= findLastDo3[Range[25], # <= 22 &]
Out[77]= 22

Daniel Lichtblau


For the adventurous...

The following definitions define a wrapper expression reversed[...] that masquerades as a list object whose contents appear to be a reversed version of the wrapped list:

reversed[list_][[i_]] ^:= list[[-i]]
Take[reversed[list_], i_] ^:= Take[list, -i]
Length[reversed[list_]] ^:= Length[list]
Head[reversed[list_]] ^:= List

Sample use:

$list = Range[1000000];
Timing[LengthWhile[reversed[$list], # > 499500 &]]
(* {1.248, 500500} *)

Note that this method is slower than actually reversing the list...

Timing[LengthWhile[Reverse[$list], # > 499500 &]]
(* 0.468, 500500 *)

... but of course it uses much less memory.

I would not recommend this technique for general use as flaws in the masquerade can manifest themselves as subtle bugs. Consider: what other functions need to implemented to make the simulation perfect? The exhibited wrapper definitions are apparently good enough to fool LengthWhile and TakeWhile for simple cases, but other functions (particularly kernel built-ins) may not be so easily fooled. Overriding Head seems particularly fraught with peril.

Notwithstanding these drawbacks, this impersonation technique can sometimes be useful in controlled circumstances.


Personally, I don't see anything wrong with LengthWhile-based solution. Also, if we want to reuse mma built-in list-traversing functions (as opposed to explicit loops or recursion), I don't see a way to avoid reverting the list. Here is a version that does that, but does not reverse the predicate:

Clear[findLastLengthWhile];
findLastLengthWhile[{}, _] = 0;
findLastLengthWhile[list_, predicate_] /; predicate[Last[list]] := Length[list];
findLastLengthWhile[list_, predicate_] :=
   Module[{l = Length[list]}, 
     Scan[If[predicate[#], Return[], l--] &, Reverse[list]]; l];

Whether or not it is simpler I don't know. It is certainly less efficient than the one based on LengthWhile, particularly for packed arrays. Also, I use the convention of returning 0 when no element satisfying a condition is found, rather than $Failed, but this is just a personal preference.

EDIT

Here is a recursive version based on linked lists, which is somewhat more efficient:

ClearAll[linkedList, toLinkedList];
SetAttributes[linkedList, HoldAllComplete];
toLinkedList[data_List] := Fold[linkedList, linkedList[], data];

Clear[findLastRec];
findLastRec[list_, pred_] :=
  Block[{$IterationLimit = Infinity},
     Module[{ll = toLinkedList[list], findLR},
       findLR[linkedList[]] := 0;
       findLR[linkedList[_, el_?pred], n_] := n;
       findLR[linkedList[ll_, _], n_] := findLR[ll, n - 1];
       findLR[ll, Length[list]]]]

Some benchmarks:

In[48]:= findLastRecursive[Range[300000],#<9000&]//Timing
Out[48]= {0.734,8999}

In[49]:= findLastRec[Range[300000],#<9000&]//Timing
Out[49]= {0.547,8999}

EDIT 2

If your list can be made a packed array (of whatever dimensions), then you can exploit compilation to C for loop-based solutions. To avoid the compilation overhead, you can memoize the compiled function, like so:

Clear[findLastLW];
findLastLW[predicate_, signature_] := findLastLW[predicate, Verbatim[signature]] = 
   Block[{list},
       With[{sig = List@Prepend[signature, list]},
      Compile @@ Hold[
        sig,
        Module[{k, result = 0},
          Do[
            If[predicate@list[[k]], result = k; Break[]], 
            {k, Length@list, 1, -1}
          ];
          result], 
        CompilationTarget -> "C"]]]

The Verbatim part is necessary since in typical signatures like {_Integer,1}, _Integer will otherwise be interpreted as a pattern and the memoized definition won't match. Here is an example:

In[60]:= 
fn = findLastLW[#<9000&,{_Integer,1}];
fn[Range[300000]]//Timing

Out[61]= {0.016,8999}

EDIT 3

Here is a much more compact and faster version of recursive solution based on linked lists:

Clear[findLastRecAlt];
findLastRecAlt[{}, _] = 0;
findLastRecAlt[list_, pred_] :=
  Module[{lls, tag},
    Block[{$IterationLimit = Infinity, linkedList},
       SetAttributes[linkedList, HoldAllComplete];
       lls = Fold[linkedList, linkedList[], list];
       ll : linkedList[_, el_?pred] := Throw[Depth[Unevaluated[ll]] - 2, tag];
       linkedList[ll_, _] := ll;
       Catch[lls, tag]/. linkedList[] :> 0]]

It is as fast as versions based on Do - loops, and twice faster than the original findLastRecursive (the relevant benchmark to be added soon - I can not do consistent (with previous) benchmarks being on a different machine at the moment). I think this is a good illustration of the fact that tail-recursive solutions in mma can be as efficient as procedural (uncompiled) ones.


Here are some alternatives, two of which don't reverse the list:

findLastLengthWhile2[list_, predicate_] := 
 Length[list]-(Position[list//Reverse, _?(!predicate[#] &),1,1]/.{}->{{0}})[[1, 1]]+1

findLastLengthWhile3[list_, predicate_] := 
    Module[{lw = 0}, 
      Scan[If[predicate[#], lw++, lw = 0] &, list]; 
      Length[list] - lw
    ]

findLastLengthWhile4[list_, predicate_] := 
   Module[{a}, a = Split[list, predicate]; 
         Length[list] - If[predicate[a[[-1, 1]]], Length[a[[-1]]], 0]
   ]

Some timings (number 1 is Pillsy's first one) of finding the last run of 1's in an array of 100,000 1's in which a single zero is placed on various positions. Timings are the mean of 10 repeated meusurements:

Searching from the end of a list in Mathematica

Code used for timings:

Monitor[
 timings = Table[
   ri = ConstantArray[1, {100000}];
   ri[[daZero]] = 0;
   t1 = (a1 = findLastLengthWhile[ri, # == 1 &];) // Timing // First;
   t2 = (a2 = findLastLengthWhile2[ri, # == 1 &];) // Timing // First;
   t3 = (a3 = findLastLengthWhile3[ri, # == 1 &];) // Timing // First;
   t4 = (a4 = findLastLengthWhile4[ri, # == 1 &];) // Timing // First;
   {t1, t2, t3, t4},
   {daZero, {1000, 10000, 20000, 50000, 80000, 90000, 99000}}, {10}
   ], {daZero}
 ]

ListLinePlot[
   Transpose[{{1000, 10000, 20000, 50000, 80000, 90000,99000}, #}] & /@ 
     (Mean /@ timings // Transpose), 
   Mesh -> All, Frame -> True, FrameLabel -> {"Zero position", "Time (s)", "", ""}, 
   BaseStyle -> {FontFamily -> "Arial", FontWeight -> Bold, 
   FontSize -> 14}, ImageSize -> 500
]


Timing Reverse for Strings and Reals

a = DictionaryLookup[__];
b = RandomReal[1, 10^6];
Timing[Short@Reverse@#] & /@ {a, b}

(*
 ->
{{0.016,         {Zyuganov,Zyrtec,zymurgy,zygotic,zygotes,...}},
 {3.40006*10^-15,{0.693684,0.327367,<<999997>>,0.414146}}}
*)


An elegant solution would be:

findLastPatternMatching[{Longest[start___], f_, ___}, f_] := Length[{start}]+1

(* match this pattern if item not in list *)
findLastPatternMatching[_, _] := -1

but as it's based on pattern matching, it's way slower than the other solutions suggested.

0

精彩评论

暂无评论...
验证码 换一张
取 消