I need to write a program that performs arithmetic (+, -, *, /) on multiple time series with different date ranges (mostly 2007-2009) and frequencies (weekly, monthly, yearly, ...).
I came up with:
- Find the series with the highest frequency, fill in the other series with zeros so they all have the same number of elements, then perform the operation.
How can I present the data in the most meaningful way?
Trying to think of all the possibilities
If zero can be a meaningful value for this time series (e.g. temperature in degrees Celsius), it might not be a good idea to fill all gaps with zeros, because you will not be able to distinguish the real values from the stub values afterwards. You might want to interpolate your time series instead. A basic data structure for this can be an array or a doubly linked list.
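As a rough sketch of that idea, the snippet below marks gaps with `None` instead of zero and fills only the gaps that sit between two known observations, using plain linear interpolation. The function name `interpolate_gaps` and the sample temperatures are made up for illustration, and linear interpolation is just one possible choice.

```python
from typing import List, Optional


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly fill None gaps that lie between two known observations.

    Leading and trailing gaps are left as None, since there is nothing
    to interpolate between.
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


# Hypothetical readings in degrees Celsius: 0.0 is a real value, None is a gap.
monthly_temps = [0.0, None, None, 6.0, None, 4.0]
filled = interpolate_gaps(monthly_temps)
print(filled)
```

Because gaps are `None` rather than `0.0`, the real zero reading at the start stays distinguishable from missing data.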
You can take several approaches:
- use the finest-grained time series data (for instance, seconds) and interpolate/fill data when needed
- use the coarsest-grained (for instance, years) and summarize data when needed
- any middle step between the two extremes
You should always know your data, because:
- in case of interpolating you have to choose the best algorithm (linear or quadratic interpolation, splines, exponential...)
- in case of summarizing you have to choose an appropriate aggregation function (sum, maximum, mean...)
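To make the summarizing option concrete, here is a minimal sketch that rolls monthly observations up to yearly values with a caller-supplied aggregation function. The helper `summarize_by_year` and the numbers are assumptions for illustration only; which function is appropriate (sum, mean, maximum, ...) depends on what the data measures.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple


def summarize_by_year(
    observations: Iterable[Tuple[date, float]],
    aggregate: Callable[[List[float]], float],
) -> Dict[int, float]:
    """Group observations by calendar year and reduce each group."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for day, value in observations:
        buckets[day.year].append(value)
    return {year: aggregate(vals) for year, vals in sorted(buckets.items())}


# Hypothetical monthly data spanning two years.
monthly = [
    (date(2007, 11, 1), 10.0),
    (date(2007, 12, 1), 20.0),
    (date(2008, 1, 1), 30.0),
    (date(2008, 2, 1), 50.0),
]

yearly_total = summarize_by_year(monthly, sum)   # suits flows, e.g. sales
yearly_mean = summarize_by_year(monthly, mean)   # suits levels, e.g. temperature
print(yearly_total, yearly_mean)
```

The same data gives quite different yearly figures depending on the function, which is why knowing what the series represents matters.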
Once you have the same time scales for all the time series you can perform your arithmetical magick, but be aware that interpolation generates extra information, and summarization removes available information.
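Once the series share a time scale, the arithmetic itself is simple. The sketch below assumes each series is a dict keyed by period (here, a year) and returns `None` wherever either series has no value or the operation is undefined. The name `combine` and the sample series are hypothetical.

```python
import operator
from typing import Callable, Dict, Optional


def combine(
    a: Dict[int, float],
    b: Dict[int, float],
    op: Callable[[float, float], float],
) -> Dict[int, Optional[float]]:
    """Apply op period by period; None marks periods that cannot be computed."""
    result: Dict[int, Optional[float]] = {}
    for period in sorted(set(a) | set(b)):
        if period in a and period in b:
            try:
                result[period] = op(a[period], b[period])
            except ZeroDivisionError:
                result[period] = None  # division by a zero observation
        else:
            result[period] = None  # one series has no data for this period
    return result


# Hypothetical yearly series with different date ranges.
sales = {2007: 30.0, 2008: 80.0, 2009: 50.0}
costs = {2008: 20.0, 2009: 0.0}

difference = combine(sales, costs, operator.sub)
ratio = combine(sales, costs, operator.truediv)
print(difference, ratio)
```

Keeping `None` for missing periods, rather than a zero, avoids silently treating "no data" as a real value in the result.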
I've studied this problem fairly extensively. The danger of interpolation methods is that you bias various measures - especially volatility - and introduce spurious correlation. I found that Fourier interpolation mitigated this to some extent but the better approach is to go the other way: aggregate your more frequent observations to match the periodicity of your less frequent series, then compare these.
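A toy illustration of the volatility bias, using made-up numbers: take a fine-grained series, keep every second point as if only the coarse series had been observed, then linearly interpolate back to the fine grid. In this contrived case the interpolated series shows no variation in its period-to-period changes, even though the original series is quite jumpy. Real data will be less extreme, so treat this only as a sketch of the effect.

```python
from statistics import pstdev
from typing import List, Sequence


def changes(series: Sequence[float]) -> List[float]:
    """Period-to-period differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical fine-grained observations (e.g. weekly).
fine = [0, 3, 1, 4, 2, 5, 3, 6, 4]

# Pretend only every second observation was available (e.g. fortnightly).
coarse = fine[::2]

# Linearly interpolate the coarse series back onto the fine grid.
interpolated = [float(coarse[0])]
for a, b in zip(coarse, coarse[1:]):
    interpolated += [(a + b) / 2, float(b)]

true_volatility = pstdev(changes(fine))
interpolated_volatility = pstdev(changes(interpolated))
print(true_volatility, interpolated_volatility)
```

Going the other way, aggregating the fine series down to the coarse frequency, does not invent intermediate points, so it does not introduce this kind of artificial smoothness.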