
C# Calculation of moving median of time series SortedList<DateTime, double> - improve performance?

Developer (开发者) https://www.devze.com 2023-02-16 04:58 Source: Internet

I have a method that calculates the moving median value of a time series. Like a moving average, it uses a fixed window or period (sometimes referred to as the look-back period). If the period is 10, it will create an array of the first 10 values (0-9), then find the median value of them. It will repeat this, advancing the window by 1 step (values 1-10 now) and so on, hence the "moving" part. This process is exactly the same as for a moving average.

The median value is found by:

  1. Sorting the values of an array
  2. If there is an odd number of values in the array, take the mid value. The median of a sorted array of 5 values would be the 3rd value.
  3. If there is an even number of values in the array, take the two values on each side of the mid and average them. The median of a sorted array of 6 values would be the (3rd + 4th) / 2.
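
As a small illustration of the three steps above, here is a minimal sketch. The class name MedianSketch and the helper Median are hypothetical names chosen for this example, and the input is assumed to be non-empty and free of NaN values.

```csharp
using System;

static class MedianSketch
{
    // Hypothetical helper: median of a non-empty array, following the three steps above.
    public static double Median(double[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("values must not be empty");

        var sorted = (double[])values.Clone(); // copy so the caller's array is not reordered
        Array.Sort(sorted);                    // step 1: sort the values

        int mid = sorted.Length / 2;
        if (sorted.Length % 2 == 1)
            return sorted[mid];                // step 2: odd count, take the middle value

        // step 3: even count, average the two values on each side of the middle
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

For example, MedianSketch.Median(new double[] { 50, 10, 30, 20, 40 }) takes the middle of the five sorted values, and a six-value input averages the two middle ones.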

I have created a function that calculates this by populating a List<double>, calling List<>.Sort(), and then finding the appropriate values.

Computationally it is correct, but I was wondering if there was a way to improve the performance of this calculation, perhaps by hand-rolling a sort on a double[] rather than using a list.

My implementation is as follows:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Moving_Median_TimeSeries
{
    class Program
    {
        static void Main(string[] args)
        {
            // create a sample test time series of 10 days
            DateTime Today = DateTime.Now;
            var TimeSeries = new SortedList<DateTime, double>();
            for (int i = 0; i < 10; i++)
                TimeSeries.Add(Today.AddDays(i), i * 10);

            // write out the time series
            Console.WriteLine("Our time series contains...");
            foreach (var item in TimeSeries) 
                Console.WriteLine("   {0}, {1}", item.Key.ToShortDateString(), item.Value);

            // calculate an even period moving median 
            int period = 6;
            var TimeSeries_MovingMedian = MovingMedian(TimeSeries, period);

            // write out the result of the calculation
            Console.WriteLine("\nThe moving median time series of {0} periods contains...", period);
            foreach (var item in TimeSeries_MovingMedian)
                Console.WriteLine("   {0}, {1}", item.Key.ToShortDateString(), item.Value);

            // calculate an odd period moving median 
            int period2 = 5;
            var TimeSeries_MovingMedian2 = MovingMedian(TimeSeries, period2);

            // write out the result of the calculation
            Console.WriteLine("\nThe moving median time series of {0} periods contains...", period2);
            foreach (var item in TimeSeries_MovingMedian2)
                Console.WriteLine("   {0}, {1}", item.Key.ToShortDateString(), item.Value);
        }

        public static SortedList<DateTime, double> MovingMedian(SortedList<DateTime, double> TimeSeries, int period)
        {
            var result = new SortedList<DateTime, double>();

            for (int i = 0; i < TimeSeries.Count(); i++)
            {
                if (i >= period - 1)
                {
                    // add all of the values used in the calc to a list... 
                    var values = new List<double>();
                    for (int x = i; x > i - period; x--)
                        values.Add(TimeSeries.Values[x]);

                    // ... and then sort the list <- there might be a better way than this
                    values.Sort();

                    // If there is an even number of values in the array (example 10 values), take the two mid values
                    // and average them.  i.e. 10 values = (5th value + 6th value) / 2. 
                    double median;
                    if (period % 2 == 0) // even period
                        median = (values[period / 2] + values[period / 2 - 1]) / 2;
                    else // odd period
                    // Median equals the middle value of the sorted array; integer division gives its index
                        median = values[period / 2];

                    result.Add(TimeSeries.Keys[i], median);
                }
            }
            return result;
        }

    }
}


"there might be a better way than this"

You are right about this: you don't need to sort the whole list if all you want is the median. Follow links from this Wikipedia page for more.
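
One well-known family of techniques that avoids a full sort is selection algorithms such as quickselect. The following is only a sketch under assumptions: the names SelectionSketch, QuickSelect and Median are hypothetical, the window is non-empty and contains no NaN values, and it is not necessarily what the linked page describes.

```csharp
using System;

static class SelectionSketch
{
    // Returns the k-th smallest element (0-based) of a; reorders a in place.
    public static double QuickSelect(double[] a, int k)
    {
        int lo = 0, hi = a.Length - 1;
        while (lo < hi)
        {
            double pivot = a[lo + (hi - lo) / 2];
            int i = lo, j = hi;
            // Hoare-style partition of a[lo..hi] around the pivot value
            while (i <= j)
            {
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j)
                {
                    (a[i], a[j]) = (a[j], a[i]);
                    i++;
                    j--;
                }
            }
            if (k <= j) hi = j;        // k-th element lies in the left part
            else if (k >= i) lo = i;   // k-th element lies in the right part
            else break;                // a[k] equals the pivot and is in its final place
        }
        return a[k];
    }

    // Median of a non-empty window without fully sorting it.
    public static double Median(double[] window)
    {
        var a = (double[])window.Clone(); // keep the caller's data untouched
        int n = a.Length;
        double upper = QuickSelect(a, n / 2);
        if (n % 2 == 1)
            return upper;

        // After selecting index n/2, every element to its left is <= upper,
        // so the lower middle value is the maximum of that left part.
        double lower = a[0];
        for (int i = 1; i < n / 2; i++)
            lower = Math.Max(lower, a[i]);
        return (lower + upper) / 2.0;
    }
}
```

Quickselect runs in linear time on average per window, compared with P lg P for a full sort, but it still reprocesses the whole window at every step.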


For a list of N items and a period P, your algorithm, which re-sorts the window for every item, is O(N * P lg P). There is an O(N * lg P) algorithm, which uses 2 heaps.

It uses a min-heap which contains P/2 items above the median, and a max-heap with the P-P/2 items less than or equal to it. Whenever you get a new data item, replace the oldest item with the new one, then do a sift-up or sift-down to move it to the correct place. If the new item reaches the root of either heap, compare it to the root of the other and swap and sift-down when needed. For odd P, the median is at the root of the max-heap. For even P, it is the average of both roots.

There is a C implementation in this question. One tricky part in implementing it is tracking the oldest item efficiently. The overhead in that part may make the speed gains insignificant for small P.
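
To make the two-heap idea concrete, here is a hedged sketch of a simpler variant. It does not track the position of the oldest item inside the heaps as described above. Instead it uses lazy deletion: outgoing values are only counted as stale and are discarded when they reach the root of a heap. It assumes .NET 6 or later for PriorityQueue<TElement, TPriority> and no NaN values in the series. The names TwoHeapSketch, low, high and stale are hypothetical. Because stale entries linger, the heaps can temporarily hold more than P items.

```csharp
using System;
using System.Collections.Generic;

static class TwoHeapSketch
{
    public static SortedList<DateTime, double> MovingMedian(
        SortedList<DateTime, double> series, int period)
    {
        if (period < 1) throw new ArgumentOutOfRangeException(nameof(period));

        var result = new SortedList<DateTime, double>();
        var low = new PriorityQueue<double, double>();  // max-heap (priority = -value): lower half
        var high = new PriorityQueue<double, double>(); // min-heap: upper half
        var stale = new Dictionary<double, int>();      // values that left the window but are still in a heap
        int lowCount = 0, highCount = 0;                // number of live (non-stale) items per heap

        // Discard stale values sitting at the root of a heap.
        void Prune(PriorityQueue<double, double> heap)
        {
            while (heap.Count > 0 && stale.TryGetValue(heap.Peek(), out int n) && n > 0)
            {
                stale[heap.Peek()] = n - 1;
                heap.Dequeue();
            }
        }

        // Keep lowCount == highCount or lowCount == highCount + 1.
        void Rebalance()
        {
            if (lowCount > highCount + 1)
            {
                double v = low.Dequeue();
                high.Enqueue(v, v);
                lowCount--; highCount++;
                Prune(low);
            }
            else if (lowCount < highCount)
            {
                double v = high.Dequeue();
                low.Enqueue(v, -v);
                highCount--; lowCount++;
                Prune(high);
            }
        }

        var values = series.Values;
        for (int i = 0; i < values.Count; i++)
        {
            // Insert the newest value into the matching half.
            double x = values[i];
            if (lowCount == 0 || x <= low.Peek()) { low.Enqueue(x, -x); lowCount++; }
            else { high.Enqueue(x, x); highCount++; }
            Rebalance();

            // Lazily remove the value that just fell out of the window.
            if (i >= period)
            {
                double old = values[i - period];
                stale.TryGetValue(old, out int pending);
                stale[old] = pending + 1;
                if (old <= low.Peek()) { lowCount--; if (old == low.Peek()) Prune(low); }
                else { highCount--; if (old == high.Peek()) Prune(high); }
                Rebalance();
            }

            // Odd period: root of the max-heap. Even period: average of both roots.
            if (i >= period - 1)
            {
                double median = period % 2 == 1
                    ? low.Peek()
                    : (low.Peek() + high.Peek()) / 2.0;
                result.Add(series.Keys[i], median);
            }
        }
        return result;
    }
}
```

The signature mirrors the MovingMedian method in the question, so the sketch could be called the same way, for example TwoHeapSketch.MovingMedian(TimeSeries, period). Whether it is actually faster for small periods would need to be measured.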
