Can someone suggest an algorithm to find the closest pair among unsorted, collinear points?
I've got one solution that does this in O(n log n) by running the standard 2D closest-pair-of-points algorithm on the points of the line. However, can this be done more efficiently?
I'm afraid you have to sort the points, which takes at least O(n log n) time (unless you can use bucket sort), so I doubt you'll find a faster algorithm for this.
Notice that you can always reduce the 2D collinear case to the 1D case by applying a rotation, or equivalently by projecting each point onto the line's direction vector.
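The projection idea can be sketched like this (a minimal sketch; `project_to_line` is an illustrative name, not from the original answer, and it assumes the first two points are distinct):

```python
import math

def project_to_line(points):
    """Map collinear 2D points to 1D coordinates by projecting each
    point onto the line's unit direction vector."""
    (x0, y0), (x1, y1) = points[0], points[1]
    dx, dy = x1 - x0, y1 - y0              # direction of the line
    norm = math.hypot(dx, dy)
    ux, uy = dx / norm, dy / norm          # unit direction vector
    # signed 1D coordinate of each point along the line
    return [(px - x0) * ux + (py - y0) * uy for px, py in points]
```

After this step, the problem is a plain 1D closest-pair problem on the returned coordinates.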
Unfortunately, I don't think you can do better than O(n log n) in the general case. Your best option is to sort the points and then traverse the list once.
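The sort-then-traverse approach for the 1D case can be sketched as follows (the function name is mine; the key observation is that after sorting, the closest pair must be adjacent):

```python
def closest_pair_1d(xs):
    """Minimal 1D closest-pair sketch: sort, then scan adjacent gaps."""
    xs = sorted(xs)                                  # O(n log n)
    return min(b - a for a, b in zip(xs, xs[1:]))    # single O(n) pass

print(closest_pair_1d([7.0, 1.5, 9.0, 2.0, 4.0]))   # 0.5
```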
There is an expected O(n)-time algorithm for the closest pair of points in the plane. It's from the Algorithm Design book by Kleinberg and Tardos.
Here it is in Python-like pseudo-code:
    import collections
    import math
    import random

    def distance(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def Bucket(point, buck_size):
        return (int(point[0] / buck_size), int(point[1] / buck_size))

    def InsertPoint(grid, point, buck_size):
        bucket = Bucket(point, buck_size)
        grid[bucket].append(point)

    def Rehash(points, limit, buck_size):
        grid = collections.defaultdict(list)
        for point in points[:limit]:       # re-insert the first `limit` points
            InsertPoint(grid, point, buck_size)
        return grid

    # Return the new closest distance if `point` is closer than buck_size
    # to any point already in the grid, otherwise return infinity.
    def Probe(grid, point, buck_size):
        orig_bucket = Bucket(point, buck_size)
        best = float('inf')
        for delta_x in [-1, 0, 1]:
            for delta_y in [-1, 0, 1]:
                next_bucket = (orig_bucket[0] + delta_x, orig_bucket[1] + delta_y)
                for cand_point in grid[next_bucket]:
                    # There are at most a few points in any cell; otherwise
                    # we would already have rehashed into smaller cells.
                    d = distance(cand_point, point)
                    if d < buck_size:
                        best = min(best, d)
        return best

    def ClosestPair(points):
        points = list(points)
        random.shuffle(points)
        min_dist = distance(points[0], points[1])
        grid = Rehash(points, 2, min_dist)
        for i in range(2, len(points)):
            new_dist = Probe(grid, points[i], min_dist)
            if new_dist != float('inf'):
                # The key to the algorithm: a rehash happens rarely when i
                # is large, and it's cheap when i is small.
                min_dist = new_dist
                grid = Rehash(points, i + 1, min_dist)
            else:
                InsertPoint(grid, points[i], min_dist)
        return min_dist
Each neighbor-candidate search is O(1), done with a handful of hash lookups. The algorithm does an expected O(log n) rehashes, but the rehash at step i takes time proportional to i. The probability of needing to rehash at step i is at most 2/i: the chance that the point just examined belongs to the closest pair among the first i points. Thus, the expected cost is
sum_i=2^n ( Prob[rehash at step i] * Cost(rehash at step i) + O(1) )
= sum_i=2^n ( (2/i) * O(i) + O(1) )
= sum_i=2^n O(1)
= O(n)
I'll just suggest a solution for the case where you have a list of integers in a limited range.
If all the numbers lie in a range [a, b] where b - a = O(n log n), you can do it in O(max(b - a, n)).
Create an array of size b - a + 1. You can initialize it all to 0 (there's even a lazy-initialization trick to do this in O(1)).
Go over your list and, for each value x, put a 1 in the cell at index x - a.
Then go over the new array and compute the distances between consecutive marked cells.
The minimum such distance is the answer, and the indices in the new array give you back the actual values.
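The marker-array idea above can be sketched like this (a minimal sketch under this answer's assumptions: distinct integer values in [a, b]; the function name is mine):

```python
def min_gap_limited_range(nums, a, b):
    """Minimal distance between distinct integers in [a, b], in O(b - a + n)."""
    present = [False] * (b - a + 1)
    for x in nums:
        present[x - a] = True              # mark cell x - a
    best, prev = float('inf'), None
    for i, seen in enumerate(present):     # scan marked cells left to right
        if seen:
            if prev is not None:
                best = min(best, i - prev)
            prev = i
    return best

print(min_gap_limited_range([3, 10, 7, 2], a=0, b=15))  # 1
```

If duplicates are possible, you'd count occurrences per cell instead of storing a flag, and return 0 whenever a cell holds more than one value.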