how to compare two curves (arrays of points)_问答_开发者

i have 2 arrays of points (x,y), with those points I can draw 2 cur开发者_Python百科ves.

Anyone have ideas how to calculate how those curves are similar?

You can always calculate the area between those two curves. (This is a bit easier if the endpoints match.) The curves are similar if the area is small, not so similar if the area is not small.

Note that I did not define 'small'. That was intentional. Then again, you didn't define 'similar'.

Edit
Sometimes area isn't the best metric. For example consider the function f(x)=0 and f(x)=1e6*sin(x). If the range of x is some integral multiple of 2*pi, the area between these curves is zero. A function that oscillates between plus and minus one million is not a good approximation of f(x)=0.

A better metric is needed. Here are a couple. Note: I am assuming here that the x values are identical in the two sets; the only things that differ are the y values.

Sum of squares. For each x value, compute delta_y_i = y_1,i - y_2,i and accumulate delta_y_i². This metric is the basis for a least square optimization, where the goal is to minimize the sum of the squares of the errors. This is a widely used approach because oftentimes it is fairly easy to implement.
Maximum deviation. Find the abs_delta_y_i = |y_1,i - y_2,i| that maximizes the |y_1,i - y_2,i| for all x values. This metric is the basis for a lot of the implementations of the functions in the math library, where the goal is to minimize the maximum error. These math library implementations are approximations of the true function. As a consumer of such an approximation, I typically care more about the worst thing that the approximation is going to do to my application than I care about how that approximation is going to behave on average.

You might want to consider using Dynamic Time Warping (DTW) or Frechet distance.

Dynamic Time Warping

Dynamic Time Warping sums the difference throughout the entire curve. It can handle two arrays of different sizes. Here is a snippet from Wikipedia on how the code might look. This solution uses a two-dimensional array. The cost would be the distance between two points. The final value of the array DTW[n, m] contains the cumulative distance.

int DTWDistance(s: array [1..n], t: array [1..m]) {
    DTW := array [0..n, 0..m]

    for i := 1 to n
        DTW[i, 0] := infinity
    for i := 1 to m
        DTW[0, i] := infinity
    DTW[0, 0] := 0

    for i := 1 to n
        for j := 1 to m
            cost:= d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match

    return DTW[n, m]
}

DTW is similar to Jacopson's answer.

Frechet Distance

Frechet distance calculates the farthest that the curves separate. This means that all other points on the curve are closer together than this distance. This approach is typically represented with a dog and owner as shown here: Frechet Distance Example.

Depending on your arrays, you can compare the distance of the points and use the maximum.

I assume a Curve is an array of 2D points over the real numbers, the size of the array is N, so I call p[i] the i-th point of the curve; i goes from 0 to N-1.

I also assume that the two curves have the same size and that it is meaningful to "compare" the i-th point of the first curve with the i-th point of the second curve.

I call Delta, a real number, the result of the comparison of the two curves.

Delta can be computed as follow:

Delta = 0;
for( i = 0; i < N; i++ ) {
   Delta = Delta + distance(p[i],q[i]);
}

where p are points from the first curve and q are points from the second curve.

Now you have to choose a suitable distance function depending on your problem: the function has two points as arguments and returns a real number.

For example distance can be the usual distance of two point on the plane (Pythagorean theorem and http://en.wikipedia.org/wiki/Euclidean_distance).

An example of the method in C++:

#include <numeric>
#include <vector>
#include <cmath>
#include <iostream>
#include <functional>
#include <stdexcept>

typedef double Real_t;

class Point
{
public:
    Point(){}
    Point(std::initializer_list<Real_t> args):x(args.begin()[0]),y(args.begin()[1]){}
    Point( const Real_t& xx, const Real_t& yy ):x(xx),y(yy){}
    Real_t x,y;
};

typedef std::vector< Point > Curve;

Real_t point_distance( const Point& a, const Point& b )
{
    return hypot(a.x-b.x,a.y-b.y);
}

Real_t curve_distance( const Curve& c1, const Curve& c2 )
{
    if ( c1.size() != c2.size() ) throw std::invalid_argument("size mismatch");
    return std::inner_product( c1.begin(), c1.end(), c2.begin(), Real_t(0), std::plus< Real_t >(), point_distance );
}

int main(int,char**)
{
    Curve c1{{0,0},
             {1,1},
             {2,4},
             {3,9}};

    Curve c2{{0.1,-0.1},
             {1.1,0.9},
             {2.1,3.9},
             {3.1,8.9}};

    std::cout << curve_distance(c1,c2) << "\n";

    return 0;
}

If your two curves have different size then you have to think how to extend the previous method, for example you can reduce the size of the longest curve by means of a suitable algorithm (for example the Ramer–Douglas–Peucker algorithm can be a starting point) in order to match it to the size of the shortest curve.

I have just described a very simple method, you can also take different approaches; for example you can fit two curves to the two set of points and then work with the two curves expressed as mathematical function.

This can also be solved, thinking in terms of distributions.

Especially if the position of a value is interchangeable within an array.

Then you could calculate the mean and the std (and other distribution characteristics) for both arrays. And calculate the difference between those characteristics.