Fastest way to merge lists that have a common field?

开发者 (Developer) https://www.devze.com 2023-03-16 12:37 Source: web
I am learning F# and I'm building an odds comparison service (à la www.bestbetting.com) to put theory into practice. So far I have the following data structures:

open System

type price = {
    Bookie : string;
    Odds : float32;
    }

type selection = {
    Prices : list<price>;
    Name : string;
    }

type event = {
    Name : string;
    Hour : DateTime;
    Sport : string;
    Selections : list<selection>;
    }

I have several of these events coming from several sources. I need a really fast way of merging events with the same Name and Hour and, once that is done, merging the prices of their selections that share the same Name.

I've thought about taking the first list, searching the other lists one by one and, whenever the specified field matches, returning a new list containing both lists merged.

I'd like to know if there's a faster way of doing this, as performance will be important. I have already seen "Merge multiple lists of data together by common ID in F#". Although that was helpful, I am asking for the best solution performance-wise. Maybe that means using a structure other than a list, or another way of merging them, so any advice would be greatly appreciated.

Thanks!


As Daniel mentioned in the comment, the key question is how much better the performance needs to be compared to a solution based on the standard Seq.groupBy function. If you have a lot of data to process, it may actually be easier to use a database for this purpose.
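For reference, a sequential baseline built on Seq.groupBy might look like the sketch below. It is only an illustration of that baseline, not a measured or tuned implementation. The PascalCase types Price, Selection and Event are renamed copies of the question's records (an assumption made here so the snippet stands alone and avoids the reserved identifier 'event'), and the function names mergeSelections and mergeEventsSequential are hypothetical.

```fsharp
open System

// Renamed copies of the question's records (assumption: same fields, PascalCase names).
type Price = { Bookie : string; Odds : float32 }
type Selection = { Name : string; Prices : Price list }
type Event = { Name : string; Hour : DateTime; Sport : string; Selections : Selection list }

// Merge the selections of one group of events:
// prices of selections that share a name are concatenated.
let mergeSelections (events : seq<Event>) : Selection list =
    events
    |> Seq.collect (fun evt -> evt.Selections)
    |> Seq.groupBy (fun (sel : Selection) -> sel.Name)
    |> Seq.map (fun (name, sels) ->
        let merged : Selection =
            { Name = name
              Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq }
        merged)
    |> List.ofSeq

// Group events by (Name, Hour) and build one merged event per group.
let mergeEventsSequential (events : seq<Event>) : Event list =
    events
    |> Seq.groupBy (fun evt -> evt.Name, evt.Hour)
    |> Seq.map (fun ((name, hour), group) ->
        { Name = name
          Hour = hour
          Sport = (Seq.head group).Sport // assumption: the first Sport in the group wins
          Selections = mergeSelections group })
    |> List.ofSeq
```

Timing this version on realistic data gives a number against which any parallel or hand-rolled variant can be compared.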

If you only need something ~1.7 times faster (or possibly more, depending on the number of cores :-)), then you can try replacing Seq.groupBy with a parallel version based on Parallel LINQ, which is available in the F# PowerPack. Using PSeq.groupBy (and other PSeq functions), you can write something like this:

#r "FSharp.PowerPack.Parallel.Seq.dll"
open Microsoft.FSharp.Collections

// Takes a collection of events and merges prices of events with the same name/hour
let mergeEvents (events:seq<event>) = 
  events 
  |> PSeq.groupBy (fun evt -> evt.Name, evt.Hour)
  |> PSeq.map (fun ((name, hour), events) ->
      // Merge prices of all events in the group with the same Selections.Name
      let selections = 
        events 
        |> PSeq.collect (fun evt -> evt.Selections)
        |> PSeq.groupBy (fun sel -> sel.Name)
        |> PSeq.map (fun (name, sels) ->
            { Name = name
              Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq } )
        |> PSeq.toList
      // Build new Event as the result - since we're grouping just using 
      // name & hour, I'm using the first available 'Sport' value 
      // (which may not make sense)
      { Name = name
        Hour = hour
        Sport = (Seq.head events).Sport
        Selections = selections })   
  |> PSeq.toList

I didn't test the performance of this version, but I believe it should be faster. You also don't need to reference the entire assembly: you can just copy the source for the few relevant functions from the PowerPack source code. Last time I checked, the performance was better when the functions were marked as inline, which is not the case in the current source code, so you may want to check that too.


I haven't tested it, but I think this would work.

let events : event list = [] // TODO: populate with real events (null placeholders would throw on e.Name)

for ((name, hour), evts) in (events |> Seq.groupBy (fun e -> e.Name, e.Hour)) do
  printfn "Name: %s, Hour: %A" name hour
  let prices = 
    seq {
      for e in evts do
        for s in e.Selections do
          for p in s.Prices do
            yield s.Name, p 
    }
    |> Seq.groupBy fst

  for (selectionName, p) in prices do
    printfn "  Selection Name: %s" selectionName
    for (_, price) in p do
      printfn "    %A" price
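Since the question also asks about structures other than a list, here is a hedged sketch of a different technique: a single pass that accumulates into mutable dictionaries keyed by (Name, Hour) and by selection name. This is not what either answer above proposes, and whether it beats Seq.groupBy would have to be measured. The types are renamed copies of the question's records and the function name mergeEventsWithDictionary is hypothetical.

```fsharp
open System
open System.Collections.Generic

// Renamed copies of the question's records (assumption: same fields, PascalCase names).
type Price = { Bookie : string; Odds : float32 }
type Selection = { Name : string; Prices : Price list }
type Event = { Name : string; Hour : DateTime; Sport : string; Selections : Selection list }

let mergeEventsWithDictionary (events : seq<Event>) : Event list =
    // (event name, hour) -> (first Sport seen, selection name -> accumulated prices)
    let table = Dictionary<string * DateTime, string * Dictionary<string, ResizeArray<Price>>>()
    for evt in events do
        let key = (evt.Name, evt.Hour)
        let selections =
            match table.TryGetValue(key) with
            | true, (_, existing) -> existing
            | _ ->
                let fresh = Dictionary<string, ResizeArray<Price>>()
                table.[key] <- (evt.Sport, fresh) // assumption: the first Sport seen wins
                fresh
        for sel in evt.Selections do
            let prices =
                match selections.TryGetValue(sel.Name) with
                | true, existing -> existing
                | _ ->
                    let fresh = ResizeArray<Price>()
                    selections.[sel.Name] <- fresh
                    fresh
            prices.AddRange(sel.Prices)
    // Convert back to immutable records; Dictionary enumeration order is not guaranteed.
    [ for KeyValue ((name, hour), (sport, selections)) in table ->
        { Name = name
          Hour = hour
          Sport = sport
          Selections =
            [ for KeyValue (selName, prices) in selections ->
                ({ Name = selName; Prices = List.ofSeq prices } : Selection) ] } ]
```

The mutation stays local to the function, so callers still see ordinary immutable lists of records.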