I have a streaming set of values that I would like to analyze for abrupt changes, while possibly ignoring spikes/noise in the data. I've looked at moving averages, winsorised means and several other possible solutions, including PID controllers in control systems, the Colt library and NumPy, for clues as to how to solve this.
A sample dataset is below.
22.0, 22.0, 22.0, 22.0, 20.8806130178211, 20.8806130178211, 20.8806130178211, 20.8806130178211, 20.8806130178211, 20.8806130178211, 21.840329667841555, 21.840329667841555, 20.8806130178211, 20.8806130178211, 20.8806130178211, 20.8806130178211, 20.8806130178211, 20.8806130178211, 21.840329667841555, 21.840329667841555, 21.840329667841555, 21.840329667841555, 22.80350850198276
Ideally I would like to detect that the values change in the 1st, 3rd and 4th sections in bold. The second section can be treated like a spike.
I'm looking for an elegant mathematical/algorithmic solution that works like a moving average: if the data does not change for a long time (over a dynamic window), it ignores old data. In the above data, the initial values of 22 are ignored when considering the next window of data, which is 20.8806130178211.
The solution (program/class) should be able to accept a new data input value (e.g. 22.0232) and return true if it computes that the value is within the acceptable range, i.e. it hasn't changed considerably, and false otherwise.
Thanks
sfk

Perhaps a better approach than looking at the moving average of your data is to look at the moving average of the change in your data. You could take the first difference of your dataset and flag values whose magnitude exceeds some threshold.
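To make the first-difference idea concrete, here is a minimal sketch in Python. The class name `FirstDifferenceDetector`, the helper `change_indices` and the threshold of 0.5 are all assumptions chosen for illustration; the threshold in particular would need tuning for real data. It only compares each value with the one before it, so it does not apply the moving average of the differences or any spike filtering.

```python
class FirstDifferenceDetector:
    """Hypothetical streaming detector based on the first difference."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed tolerance; tune for your data
        self.previous = None        # last value seen, None before any input

    def accept(self, value):
        """Return True if `value` is within `threshold` of the previous
        value (no considerable change), False otherwise."""
        previous, self.previous = self.previous, value
        if previous is None:
            return True  # nothing to compare the first sample against
        return abs(value - previous) <= self.threshold


def change_indices(values, threshold):
    """Indices at which the first difference exceeds `threshold`."""
    detector = FirstDifferenceDetector(threshold)
    return [i for i, v in enumerate(values) if not detector.accept(v)]


if __name__ == "__main__":
    a, b, c = 20.8806130178211, 21.840329667841555, 22.80350850198276
    sample = [22.0] * 4 + [a] * 6 + [b] * 2 + [a] * 6 + [b] * 4 + [c]
    print(change_indices(sample, threshold=0.5))  # [4, 10, 12, 18, 22]
```

On the sample data, the short excursion shows up as two jumps in opposite directions close together (indices 10 and 12), while the other jumps are not immediately reversed. Telling a spike from a sustained change would need an extra rule on top of this, for example requiring the new level to persist for a few samples.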