What's a succinct, useful and efficient way to store large time-series in F#?

Source: https://www.devze.com, 2023-01-29 16:26 (reposted from the web)
I'm currently learning F# and I'm exploring using it to analyse financial time-series. Can anyone recommend a good data structure to store time-series data in?

F# offers a rich selection of native types and I'm looking for some simple combination that would provide an elegant, succinct and efficient solution.

I'm looking to store tick data, which consists of millions of records, each with a timestamp and several (~5-20) fields of numerical and textual data, with possible missing values.

My first thoughts are perhaps a sequence of tuples or records, but I was wondering if someone could kindly suggest something that has worked well in the real world.
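To make the "sequence of records" idea concrete, here is a minimal sketch. The field names and sample values are assumptions for illustration (the question does not list the actual fields), and `option` is just one way to represent missing values.

```fsharp
open System

// Hypothetical tick layout: the field names are illustrative only.
type Tick =
    { Time   : DateTime
      Symbol : string
      Price  : float option    // None marks a missing value
      Size   : int option
      Venue  : string option }

let t0 = DateTime(2010, 1, 4, 9, 30, 0)

// Made-up sample data, ordered by time.
let ticks : Tick[] =
    [| { Time = t0; Symbol = "ABC"; Price = Some 10.0; Size = Some 100; Venue = Some "X" }
       { Time = t0.AddSeconds(1.0); Symbol = "ABC"; Price = None; Size = Some 200; Venue = None }
       { Time = t0.AddSeconds(2.0); Symbol = "ABC"; Price = Some 11.0; Size = None; Venue = Some "X" } |]

// Average over the prices that are present, skipping the missing ones.
let meanPrice =
    ticks |> Array.choose (fun t -> t.Price) |> Array.average
```

Records keep the code short and readable in F# Interactive, at the cost of one heap object per tick plus one per `Some` value.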

EDIT:

A few extra points for clarification:

The common operations that I'm likely to require are:

  • Time-based lookup, i.e. find the most recent data point at a given time
  • Time-based joins
  • Appends (updates and deletes are going to be rare)
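As a rough illustration of the first two operations, the sketch below does an "as-of" lookup by binary search over a time-sorted array and builds a simple as-of join on top of it. The function names `asOfIndex` and `asOfJoin` and the sample data are made up for this example, and it assumes the timestamps are already sorted in ascending order.

```fsharp
open System

// Index of the most recent element whose time is <= t in an array sorted
// by time, or None if every element is later than t. Runs in O(log n).
let asOfIndex (times : DateTime[]) (t : DateTime) : int option =
    let mutable lo = 0
    let mutable hi = times.Length - 1
    let mutable found = -1
    while lo <= hi do
        let mid = lo + (hi - lo) / 2
        if times.[mid] <= t then
            found <- mid
            lo <- mid + 1
        else
            hi <- mid - 1
    if found >= 0 then Some found else None

// As-of join: pair each left timestamp with the latest right value
// observed at or before it (None if there is no such value yet).
let asOfJoin (leftTimes : DateTime[]) (rightTimes : DateTime[]) (rightValues : 'a[]) =
    leftTimes
    |> Array.map (fun t ->
        t, (asOfIndex rightTimes t |> Option.map (fun i -> rightValues.[i])))

// Made-up sample data: three quotes and three trades.
let t0 = DateTime(2010, 1, 4, 9, 30, 0)
let quoteTimes  = [| t0; t0.AddSeconds(2.0); t0.AddSeconds(5.0) |]
let quotePrices = [| 10.0; 10.1; 10.2 |]
let tradeTimes  = [| t0.AddSeconds(-1.0); t0.AddSeconds(2.0); t0.AddSeconds(4.0) |]

let joined = asOfJoin tradeTimes quoteTimes quotePrices
```

For appends, a `ResizeArray` (the F# alias for `System.Collections.Generic.List`) offers amortized constant-time `Add` and keeps the data indexable, so the same binary search idea still applies as long as new ticks arrive in time order.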

I should make it clear I'm exploring using F# primarily as an interactive tool for research, with the ability to compile as a (really big) added bonus.

ANOTHER EDIT:

I should also have mentioned that my use of F# and this data is purely for research, not development. The intention is that once we understand the data (and what we want to do with it) better, we can specify tools, such as data warehouses, for our developers to build, at which point we'd start using their data structures.

That said, I am concerned that our models are computationally intensive, use a lot of memory and can't always be coded in a recursive manner, so we may end up having to query out large chunks anyway.

I should also say that I've always used Matlab or R for these sorts of tasks before, but I'm now interested in F# because it offers the same interactive, high-level flexibility for research while letting the same code be used in production.

My apologies for not giving this context at the start (it's my first question); I can see now that it helps people form their answers.

My thanks again to everyone that's taken the time to help me.


It really sounds like your data should be stored and queried in a relational database. Where is it currently stored? Loading millions of records with several fields into memory must be an expensive operation, and could leave you with stale data and difficulty persisting changes. You could then use the F# LINQ to SQL implementation (which I believe you can find in the Power Pack) to have F# expressions translated to SQL expressions.

Here's a link from Don Syme about LINQ Support in F# Power Pack: http://blogs.msdn.com/b/dsyme/archive/2009/10/23/a-quick-refresh-on-query-support-in-the-f-power-pack.aspx


The best choice of data structure depends upon what operations you want to do on it.

The simplest would be an array of structs. This has the advantages of fast random lookup, good space efficiency for an uncompressed representation and good locality. If there is sharing between substructures (like the strings) then intern them to make sure they get shared.
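A minimal sketch of that layout follows. The field names, the helper `mkTick` and the use of `nan` for missing numbers are assumptions made for illustration, and the `[<Struct>]` attribute on a record needs F# 4.1 or later.

```fsharp
open System

// Hypothetical struct layout: the field names are illustrative only.
[<Struct>]
type Tick =
    { Time   : DateTime
      Symbol : string     // reference to a shared, interned string
      Price  : float      // nan marks a missing value (an assumption of this sketch)
      Size   : int }

// Intern the symbol so that equal strings share a single instance.
let mkTick time (symbol : string) price size =
    { Time = time; Symbol = String.Intern symbol; Price = price; Size = size }

let t0 = DateTime(2010, 1, 4, 9, 30, 0)

// Build the symbols at run time so they start out as distinct objects,
// as they would when parsed from a file or a feed.
let sym1 = "abc".ToUpperInvariant()
let sym2 = "abc".ToUpperInvariant()

let ticks =
    [| mkTick t0 sym1 10.5 100
       mkTick (t0.AddSeconds(1.0)) sym2 nan 200 |]

// After interning, both ticks point at the same string object.
let shared = Object.ReferenceEquals(ticks.[0].Symbol, ticks.[1].Symbol)
```

The struct fields are stored inline in the array, so a scan over `ticks` touches contiguous memory, while the textual fields remain references into a small pool of shared strings.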

Alternatives might be a seq that is loaded from disk on demand, a singly-linked list that allows you to prepend elements quickly, or a balanced binary tree that allows operations like insertion at random locations efficiently.
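Each of these alternatives can be sketched in a few lines. The "timestamp,price" line format, the function names and the sample values are assumptions made for this example. The built-in F# `Map` stands in for the balanced binary tree.

```fsharp
open System
open System.Globalization
open System.IO

// Parse hypothetical "timestamp,price" lines lazily: nothing is parsed
// until the resulting sequence is enumerated.
let parseTicks (lines : seq<string>) : seq<DateTime * float> =
    lines
    |> Seq.map (fun line ->
        let parts = line.Split(',')
        let time = DateTime.Parse(parts.[0], CultureInfo.InvariantCulture)
        let price = Double.Parse(parts.[1], CultureInfo.InvariantCulture)
        time, price)

// File.ReadLines streams the file line by line instead of loading it whole.
let readTicks (path : string) = File.ReadLines(path) |> parseTicks

let t0 = DateTime(2010, 1, 4, 9, 30, 0)

// Singly-linked list: prepending with (::) is O(1), so keep the newest tick first.
let older = [ t0, 10.0 ]
let newestFirst = (t0.AddSeconds(2.0), 10.2) :: older

// Map is an immutable balanced binary tree: inserting a late-arriving tick
// anywhere in the time order is O(log n).
let byTime = Map.ofList newestFirst
let withLateTick = byTime |> Map.add (t0.AddSeconds(1.0)) 10.1
```

The trade-off is that the lazy seq and the list give up fast random lookup, and the tree gives up the array's memory locality, in exchange for cheaper loading or insertion.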
