Projecting a list of lists efficiently in F#_问答_开发者

I have to do projection of a list of lists which returns all combinations with each element from each list. For example:

projection([[1]; [2; 3]]) = [[1; 2]; [1; 3]].
projection([[1]; [2; 3]; [4; 5]]) = [[1; 2; 4]; [1; 2; 5]; [1; 3; 4]; [1; 3; 5]].

I come up with a function:

let projection lss0 =
    let rec projectionUtil lss accs =
        match lss with
        | []        ->  accs
        | ls::lss'  ->  projectionUtil lss' (List.fold (fun accs' l -> 
                                                        accs' @ List.map (fun acc -> acc @ [l]) accs) 
                                                        [] ls)
match lss0 with
| [] -> []
| ls::lss' ->         
    projectionUtil lss' (List.map (fun l -> [l]) ls)

and a testcase:

#time "on";;
let N = 10
let fss0 = List.init N (fun i -> List.init (i+1) (fun j ->开发者_开发知识库; j+i*i+i));;
let fss1 = projection fss0;;

The function is quite slow now, with N = 10 it takes more than 10 seconds to complete. Moreover, I think the solution is unnatural because I have to breakdown the same list in two different ways. Any suggestion how I can improve performance and readability of the function?

First of all, try to avoid list concatenation (@) whenever possible, since it's O(N) instead of O(1) prepend.

I'd start with a (relatively) easy to follow plan of how to compute the cartesian outer product of lists.

Prepend each element of the first list to each sublist in the cartesian product of the remaining lists.
Take care of the base case.

First version:

let rec cartesian = function
  | [] -> [[]]
  | L::Ls -> [for C in cartesian Ls do yield! [for x in L do yield x::C]]

This is the direct translation of the sentences above to code.

Now speed this up: instead of list comprehensions, use list concatenations and maps:

let rec cartesian2 = function
  | [] -> [[]]
  | L::Ls -> cartesian2 Ls |> List.collect (fun C -> L |> List.map (fun x->x::C))

This can be made faster still by computing the lists on demand via a sequence:

let rec cartesian3 = function
  | [] -> Seq.singleton []
  | L::Ls -> cartesian3 Ls |> Seq.collect (fun C -> L |> Seq.map (fun x->x::C))

This last form is what I use myself, since I most often just need to iterate over the results instead of having them all at once.

Some benchmarks on my machine: Test code:

let test f N = 
  let fss0 = List.init N (fun i -> List.init (i+1) (fun j -> j+i*i+i))
  f fss0 |> Seq.length

Results in FSI:

> test projection 10;;
Real: 00:00:18.066, CPU: 00:00:18.062, GC gen0: 168, gen1: 157, gen2: 7
val it : int = 3628800
> test cartesian 10;;
Real: 00:00:19.822, CPU: 00:00:19.828, GC gen0: 244, gen1: 121, gen2: 3
val it : int = 3628800
> test cartesian2 10;;
Real: 00:00:09.247, CPU: 00:00:09.250, GC gen0: 94, gen1: 52, gen2: 2
val it : int = 3628800
> test cartesian3 10;;
Real: 00:00:04.254, CPU: 00:00:04.250, GC gen0: 359, gen1: 1, gen2: 0
val it : int = 3628800

This function is Haskell's sequence (although sequence is more generic). Translating to F#:

let sequence lss =
    let k l ls = [ for x in l do for xs in ls -> x::xs ]
    List.foldBack k lss [[]]

in interactive:

> test projection 10;;
Real: 00:00:12.240, CPU: 00:00:12.807, GC gen0: 163, gen1: 155, gen2: 4
val it : int = 3628800
> test sequence 10;;
Real: 00:00:06.038, CPU: 00:00:06.021, GC gen0: 75, gen1: 74, gen2: 0
val it : int = 3628800

General idea: avoid explicit recursion in favor to standard combinators (fold, map etc.)

Here's a tail-recursive version. It's not as fast as some of the other solutions (only 25% faster than your original function), but memory usage is constant, so it works for extremely large result sets.

let cartesian l = 
  let rec aux f = function
    | [] -> f (Seq.singleton [])
    | h::t -> aux (fun acc -> f (Seq.collect (fun x -> (Seq.map (fun y -> y::x) h)) acc)) t
  aux id l

You implementation is slow because of the @ (i.e List concat) operation, which is a slow operation and it is being done many a times in recursive way. The reason for @ being slow is that List are Linked list in functional programming and to concat 2 list you have to first go till the end of the list (one by one traversing through elements) and then append another list .

Please look at the suggested references in comments. I hope those will help you out.

The following version is even faster than cartesian3, and uses basic features of functional programming (no fancy List.collect, Seq.collect...)

let cartesian xss =
    let rec add x yss s =
        match yss with
        | [] -> s
        | ys :: yss' -> add x yss' ((x :: ys) :: s)

    let rec mul xs yss p =
        match xs with
        | [] -> p
        | x :: xs' -> mul xs' yss (add x yss p)

    let rec cartesian xss c =
        match xss with
        | [] -> c
        | xs :: xss' -> cartesian xss' (mul xs c [])

    cartesian xss [ [] ]

Results

> test cartesian3 10;;
Real: 00:00:04.132, CPU: 00:00:04.109, GC Gen0: 482, Gen1: 2, Gen2: 1
val it: int = 3628800

> test cartesian 10;;
Real: 00:00:01.414, CPU: 00:00:01.406, GC Gen0: 27, Gen1: 16, Gen2: 2
val it: int = 3628800

> test cartesian3 11;;
Real: 00:00:45.652, CPU: 00:00:45.281, GC Gen0: 5299, Gen1: 5, Gen2: 1
val it: int = 39916800

> test cartesian 11;;
Real: 00:00:17.242, CPU: 00:00:16.812, GC Gen0: 260, Gen1: 174, Gen2: 6
val it: int = 39916800

The partition strategy used here is naive: the input list xss is separated into head and tail, I believe that a smarter strategy can give much better performance.

Edit: Another solution is of Christopher Strachey, which is explained in [1] (the observation is that the recursion on list can be expressed by folding):

let cartesianf xss =
    let f xs yss =
        let h x ys uss = (x :: ys) :: uss
        let g yss x zss = List.foldBack (h x) yss zss
        List.foldBack (g yss) xs []

    List.foldBack f xss [ [] ]

[1] Mike Spivey. Strachey's function pearl, forty years on.

let crossProduct listA listB listC listD listE = 
  listA |> Seq.collect (fun a -> 
  listB |> Seq.collect (fun b -> 
  listC |> Seq.collect (fun c -> 
  listD |> Seq.collect (fun d -> 
  listE |> Seq.map (fun e -> a,b,c,d,e))