I am learning F# and I'm building an odds comparison service (à la www.bestbetting.com) to put theory into practice. So far I have the following data structures:
open System

type price = { Bookie : string; Odds : float32; }
type selection = {
    Prices : list<price>;
    Name : string;
}
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection>; }
So, I have several of these "Events" coming from several sources. I need a really fast way of merging events with the same Name and Hour and, once that is done, merging the prices of their selections that have the same Name.
I've thought about taking the first list and then doing a one-by-one search on the other lists, and when the specified field matches, returning a new list containing both lists merged.
I'd like to know if there's a faster way of doing this, as performance would be important. I have already seen this: Merge multiple lists of data together by common ID in F# ... Although that was helpful, I am asking for the best solution performance-wise. Maybe using another structure that is not a list, or another way of merging them... so any advice would be greatly appreciated.
Thanks!
As Daniel mentioned in the comment, the key question is how much better the performance needs to be compared to a solution based on the standard Seq.groupBy function. If you have a lot of data to process, then it may actually be easier to use a database for this purpose.
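For reference, a minimal sketch of what such a sequential Seq.groupBy baseline might look like. The function name mergeEventsSeq is made up for illustration, the types are repeated from the question so that the snippet runs on its own, and taking the Sport of the first event in each group is an assumption:

```fsharp
open System

// Types repeated from the question so the sketch runs on its own.
type price = { Bookie : string; Odds : float32 }
type selection = { Name : string; Prices : list<price> }
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }

// Sequential baseline (hypothetical name): group events by (Name, Hour),
// then group the selections of each group by Name and concatenate prices.
let mergeEventsSeq (events : seq<event>) : list<event> =
    events
    |> Seq.groupBy (fun evt -> evt.Name, evt.Hour)
    |> Seq.map (fun ((name, hour), group) ->
        let selections =
            group
            |> Seq.collect (fun evt -> evt.Selections)
            |> Seq.groupBy (fun (sel : selection) -> sel.Name)
            |> Seq.map (fun (selName, sels) ->
                let merged : selection =
                    { Name = selName
                      Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq }
                merged)
            |> List.ofSeq
        let result : event =
            { Name = name
              Hour = hour
              Sport = (Seq.head group).Sport   // assumption: first Sport wins
              Selections = selections }
        result)
    |> List.ofSeq
```

The explicit type annotations on the records are only there because selection and event both have a Name field. Measuring this version first gives a number to compare any faster variant against.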
If you only need something ~1.7 times faster (or possibly more, depending on the number of cores :-)), then you can try replacing Seq.groupBy with a parallel version based on Parallel LINQ that is available in the F# PowerPack. Using PSeq.groupBy (and other PSeq functions), you can write something like this:
#r "FSharp.PowerPack.Parallel.Seq.dll"
open Microsoft.FSharp.Collections

// Takes a collection of events and merges prices of events with the same name/hour
let mergeEvents (events:seq<event>) =
  events
  |> PSeq.groupBy (fun evt -> evt.Name, evt.Hour)
  |> PSeq.map (fun ((name, hour), events) ->
      // Merge prices of all events in the group with the same Selections.Name
      let selections =
        events
        |> PSeq.collect (fun evt -> evt.Selections)
        |> PSeq.groupBy (fun sel -> sel.Name)
        |> PSeq.map (fun (name, sels) ->
            { Name = name
              Prices = sels |> Seq.collect (fun s -> s.Prices) |> List.ofSeq } )
        |> PSeq.toList
      // Build new Event as the result - since we're grouping just using
      // name & hour, I'm using the first available 'Sport' value
      // (which may not make sense)
      { Name = name
        Hour = hour
        Sport = (Seq.head events).Sport
        Selections = selections })
  |> PSeq.toList
I didn't test the performance of this version, but I believe it should be faster. You also don't need to reference the entire assembly - you can just copy the source for the few relevant functions from the PowerPack source code. Last time I checked, the performance was better when the functions were marked as inline, which is not the case in the current source code, so you may want to check that too.
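Since the performance claim above is untested, one rough way to check it is a small Stopwatch-based timing helper. This is only a sketch: timeIt is a hypothetical helper (not part of PowerPack or the core library), and a single warm-up run followed by an average is a crude measurement, not a proper benchmark.

```fsharp
open System.Diagnostics

// Hypothetical helper: runs `f` once to warm up the JIT, then returns the
// last result together with the average milliseconds per run over `runs`
// timed runs.
let timeIt (runs : int) (f : unit -> 'a) : 'a * float =
    let mutable result = f ()          // warm-up run, not timed
    let sw = Stopwatch.StartNew()
    for _ in 1 .. runs do
        result <- f ()
    sw.Stop()
    result, sw.Elapsed.TotalMilliseconds / float runs

// Usage with a stand-in workload.
let total, avgMs = timeIt 5 (fun () -> List.sum [ 1 .. 1000 ])
printfn "result = %d, average = %.3f ms" total avgMs
```

To compare the two approaches you would pass something like (fun () -> mergeEvents events) and the equivalent Seq-based call, using the same input for both.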
I haven't tested it, but I think this would work.
let events = List.init 10 (fun _ -> Unchecked.defaultof<event>) //TODO: initialize to something meaningful
for ((name, hour), evts) in (events |> Seq.groupBy (fun e -> e.Name, e.Hour)) do
  printfn "Name: %s, Hour: %A" name hour
  let prices =
    seq {
      for e in evts do
        for s in e.Selections do
          for p in s.Prices do
            yield s.Name, p
    }
    |> Seq.groupBy fst
  for (selectionName, p) in prices do
    printfn " Selection Name: %s" selectionName
    for (_, price) in p do
      printfn " %A" price
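The question also asks whether a structure other than a list might help. As a sketch only, here is a single-pass variant that accumulates prices in mutable dictionaries instead of printing them. The name mergeWithDictionary is made up, the types are repeated from the question, keeping the Sport of the first event seen is an assumption, and I have not measured whether this is actually faster than Seq.groupBy on real data:

```fsharp
open System
open System.Collections.Generic

// Types repeated from the question so the sketch runs on its own.
type price = { Bookie : string; Odds : float32 }
type selection = { Name : string; Prices : list<price> }
type event = { Name : string; Hour : DateTime; Sport : string; Selections : list<selection> }

let mergeWithDictionary (events : seq<event>) : list<event> =
    // (event name, hour) -> selection name -> accumulated prices
    let groups = Dictionary<string * DateTime, Dictionary<string, ResizeArray<price>>>()
    // (event name, hour) -> Sport of the first event seen (an assumption)
    let sports = Dictionary<string * DateTime, string>()
    for evt in events do
        let key = (evt.Name, evt.Hour)
        if not (groups.ContainsKey key) then
            groups.[key] <- Dictionary<string, ResizeArray<price>>()
            sports.[key] <- evt.Sport
        let sels = groups.[key]
        for sel in evt.Selections do
            if not (sels.ContainsKey sel.Name) then
                sels.[sel.Name] <- ResizeArray<price>()
            sels.[sel.Name].AddRange(sel.Prices)
    // Convert the mutable accumulators back into immutable records.
    [ for KeyValue (key, sels) in groups do
        let name, hour = key
        let selections =
            [ for KeyValue (selName, prices) in sels do
                let s : selection = { Name = selName; Prices = List.ofSeq prices }
                yield s ]
        let merged : event =
            { Name = name; Hour = hour; Sport = sports.[key]; Selections = selections }
        yield merged ]
```

Note that the order of the resulting events and selections follows the dictionary's enumeration order, which should not be relied on; sort the result if the order matters.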