开发者

Remove redundant points for line plot

开发者 https://www.devze.com 2023-02-05 12:11 出处:网络
I am trying to plot large amounts of points using some library. The points are ordered by time and their values can be considered unpredictable.

I am trying to plot large amounts of points using some library. The points are ordered by time and their values can be considered unpredictable.

My problem at the moment is that the sheer number of points makes the library take too long to render. Many of the points are redundant (th开发者_Go百科at is - they are "on" the same line as defined by a function y = ax + b). Is there a way to detect and remove redundant points in order to speed rendering ?

Thank you for your time.


The following is a variation on the Ramer-Douglas-Peucker algorithm for 1.5d graphs:

  1. Compute the line equation between first and last point
  2. Check all other points to find what is the most distant from the line
  3. If the worst point is below the tolerance you want then output a single segment
  4. Otherwise call recursively considering two sub-arrays, using the worst point as splitter

In python this could be

def simplify(pts, eps):
    if len(pts) < 3:
        return pts
    x0, y0 = pts[0]
    x1, y1 = pts[-1]
    m = float(y1 - y0) / float(x1 - x0)
    q = y0 - m*x0
    worst_err = -1
    worst_index = -1
    for i in xrange(1, len(pts) - 1):
        x, y = pts[i]
        err = abs(m*x + q - y)
        if err > worst_err:
            worst_err = err
            worst_index = i
    if worst_err < eps:
        return [(x0, y0), (x1, y1)]
    else:
        first = simplify(pts[:worst_index+1], eps)
        second = simplify(pts[worst_index:], eps)
        return first + second[1:]

print simplify([(0,0), (10,10), (20,20), (30,30), (50,0)], 0.1)

The output is [(0, 0), (30, 30), (50, 0)].

About python syntax for arrays that may be non obvious:

  • x[a:b] is the part of array from index a up to index b (excluded)
  • x[n:] is the array made using elements of x from index n to the end
  • x[:n] is the array made using first n elements of x
  • a+b when a and b are arrays means concatenation
  • x[-1] is the last element of an array

An example of the results of running this implementation on a graph with 100,000 points with increasing values of eps can be seen here.


I came across this question after I had this very idea. Skip redundant points on plots. I believe I came up with a far better and simpler solution and I'm happy to share as my first proposed solution on SO. I've coded it and it works well for me. It also takes into account the screen scale. There may be 100 points in value between those plot points, but if the user has a chart sized small, they won't see them.

So, iterating through your data/plot loop, before you draw/add your next data point, look at the next value ahead and calculate the change in screen scale (or value, but I think screen scale for the above-mentioned reason is better). Now do the same for the next value ahead (getting these values is just a matter of peeking ahead in your array/collection/list/etc adding the for next step increment (probably 1/2) to the current for value whilst in the loop). If the 2 values are the same (or perhaps very minor change, per your own preference), you can skip this one point in your chart by simply adding 'continue' in the loop, skipping adding the data point as the point lies exactly on the slope between the point before and after it.

Using this method, I reduce a chart from 963 points to 427 for example, with absolutely zero visual change.

I think you might need to perhaps read this a couple of times to understand, but it's far simpler than the other best solution mentioned here, much lighter weight, and has zero visual effect on your plot.


I would probably apply a "least squares" algorithm to obtain a line of best fit. You can then go through your points and downfilter consecutive points that lie close to the line. You only need to plot the outliers, and the points that take the curve back to the line of best fit.

Edit: You may not need to employ "least squares"; if your input is expected to hover around "y=ax+b" as you say, then that's already your line of best fit and you can just use that. :)

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号