What is the most efficient algorithm to find a straight line that goes through most points?_问答_开发者

The problem:

N poin开发者_如何学Pythonts are given on a 2-dimensional plane. What is the maximum number of points on the same straight line?

The problem has O(N²) solution: go through each point and find the number of points which have the same dx / dy with relation to the current point. Store dx / dy relations in a hash map for efficiency.

Is there a better solution to this problem than O(N²)?

There is likely no solution to this problem that is significantly better than O(n^2) in a standard model of computation.

The problem of finding three collinear points reduces to the problem of finding the line that goes through the most points, and finding three collinear points is 3SUM-hard, meaning that solving it in less than O(n^2) time would be a major theoretical result.

See the previous question on finding three collinear points.

For your reference (using the known proof), suppose we want to answer a 3SUM problem such as finding x, y, z in list X such that x + y + z = 0. If we had a fast algorithm for the collinear point problem, we could use that algorithm to solve the 3SUM problem as follows.

For each x in X, create the point (x, x^3) (for now we assume the elements of X are distinct). Next, check whether there exists three collinear points from among the created points.

To see that this works, note that if x + y + z = 0 then the slope of the line from x to y is

(y^3 - x^3) / (y - x) = y^2 + yx + x^2

and the slope of the line from x to z is

(z^3 - x^3) / (z - x) = z^2 + zx + x^2 = (-(x + y))^2 - (x + y)x + x^2 = x^2 + 2xy + y^2 - x^2 - xy + x^2 = y^2 + yx + x^2

Conversely, if the slope from x to y equals the slope from x to z then

y^2 + yx + x^2 = z^2 + zx + x^2,

which implies that

(y - z) (x + y + z) = 0,

so either y = z or z = -x - y as suffices to prove that the reduction is valid.

If there are duplicates in X, you first check whether x + 2y = 0 for any x and duplicate element y (in linear time using hashing or O(n lg n) time using sorting), and then remove the duplicates before reducing to the collinear point-finding problem.

If you limit the problem to lines passing through the origin, you can convert the points to polar coordinates (angle, distance from origin) and sort them by angle. All points with the same angle lie on the same line. O(n logn)

I don't think there is a faster solution in the general case.

The Hough Transform can give you an approximate solution. It is approximate because the binning technique has a limited resolution in parameter space, so the maximum bin will give you some limited range of possible lines.

Again an O(n^2) solution with pseudo code. Idea is create a hash table with line itself as the key. Line is defined by slope between the two points, point where line cuts x-axis and point where line cuts y-axis.

Solution assumes languages like Java, C# where equals method and hashcode methods of the object are used for hashing function.

Create an Object (call SlopeObject) with 3 fields

Slope // Can be Infinity
Point of intercept with x-axis -- poix // Will be (Infinity, some y value) or (x value, 0)
Count

poix will be a point (x, y) pair. If line crosses x-axis the poix will (some number, 0). If line is parallel to x axis then poix = (Infinity, some number) where y value is where line crosses y axis. Override equals method where 2 objects are equal if Slope and poix are equal.

Hashcode is overridden with a function which provides hashcode based on combination of values of Slope and poix. Some pseudo code below

Hashmap map;
foreach(point in the array a) {
    foeach(every other point b) {
        slope = calculateSlope(a, b);
        poix = calculateXInterception(a, b);
        SlopeObject so = new SlopeObject(slope, poix, 1); // Slope, poix and intial count 1.
        SlopeObject inMapSlopeObj = map.get(so);
        if(inMapSlopeObj == null) {
            inMapSlopeObj.put(so);
        } else {
            inMapSlopeObj.setCount(inMapSlopeObj.getCount() + 1);
        }
    }
}
SlopeObject maxCounted = getObjectWithMaxCount(map);
print("line is through " + maxCounted.poix + " with slope " + maxCounted.slope);

Move to the dual plane using the point-line duality transform for p=(a,b) p*:y=a*x + b. Now using a line sweep algorithm find all intersection points in NlogN time. (If you have points which are one above the other just rotate the points to some small angle). The intersection points corresponds in the dual plane to lines in the primer plane.

Whoever said that since 3SUM have a reduction to this problem and thus the complexity is O(n^2). Please note that the complexity of 3SUM is less than that. Please check https://en.wikipedia.org/wiki/3SUM and also read https://tmc.web.engr.illinois.edu/reduce3sum_sosa.pdf

As already mentioned, there probably isn't a way to solve the general case of this problem better than O(n^2). However, if you assume a large number of points lie on the same line (say the probability that a random point in the set of points lie on the line with the maximum number of points is p) and don't need an exact algorithm, a randomized algorithm is more efficient.

maxPoints = 0
Repeat for k iterations:
    1. Pick 2 random, distinct points uniformly at random
    2. maxPoints = max(maxPoints, number of points that lies on the 
       line defined by the 2 points chosen in step 1)

Note that in the first step, if you picked 2 points which lies on the line with the maximum number of points, you'll get the optimal solution. Assuming n is very large (i.e. we can treat the probability of finding 2 desirable points as sampling with replacement), the probability of this happening is p^2. Therefore the probability of finding a suboptimal solution after k iterations is (1 - p^2)^k.

Suppose you can tolerate a false negative rate rate = err. Then this algorithm runs in O(nk) = O(n * log(err) / log(1 - p^2)). If both n and p are large enough, this is significantly more efficient than O(n^2). (i.e. Supposed n = 1,000,000 and you know there are at least 10,000 points that lie on the same line. Then n^2 would required on the magnitude of 10^12 operations, while randomized algorithm would require on the magnitude of 10^9 operations to get a error rate of less than 5*10^-5.)

It is unlikely for a $o(n^2)$ algorithm to exist, since the problem (of even checking if 3 points in R^2 are collinear) is 3Sum-hard (http://en.wikipedia.org/wiki/3SUM)

This is not a solution better than O(n^2), but you can do the following,

For each point convert first convert it as if it where in the (0,0) coordinate, and then do the equivalent translation for all the other points by moving them the same x,y distance you needed to move the original choosen point.

2.Translate this new set of translated points to the angle with respect to the new (0,0).

3.Keep stored the maximum number (MSN) of points that are in each angle.

4.Choose the maximum stored number (MSN), and that will be the solution