The situation is as follows:
- There are N arrays.
- Each array (0..N-1) stores (x,y) tuples (cartesian coordinates).
- The arrays can have different lengths.
I want to extract the subset of coordinate combinations which make up a complete rectangle of size N. In other words, all the cartesian coordinates are adjacent to each other.
Example:
findRectangles({
{*(1,1), (3,5), (6,9)},
{(9,4), *(2,2), (5,5)},
{(5,1)},
{*(1,2), (3,6)},
{*(2,1), (3,3)}
})
yields the following:
[(1,1),(1,2),(2,1),(2,2)],
...,
...(other solutions)...
No two points can come from the same set.
I first just calculated the cartesian product, but this quickly becomes infeasible (my use-case at the moment has 18 arrays of points, with each array containing roughly 10 different coordinates).
You can use hashing to great effect:
hash each point (keeping track of which list it is in)
for each pair of points (a,b) and (c,d):
    if (a,d) exists in another list, and (c,b) exists in yet another list:
        yield rectangle(...)
When I say "exists", I mean do something like:
def findRectangles(arrays):
    # A Python dict keyed by the points themselves already buckets by hash,
    # so we store each point's bin (source-array index) directly.
    pointToBin = {}
    for binIndex, arr in enumerate(arrays):
        for p in arr:
            pointToBin[p] = binIndex
    def mixCoordinates(p1, p2):
        # The two corners completing the axis-aligned rectangle with diagonal p1-p2.
        return (p1[0], p2[1]), (p2[0], p1[1])
    for p1 in pointToBin:
        for p2 in pointToBin:
            if pointToBin[p1] == pointToBin[p2]:
                continue  # no two corners may come from the same array
            p3, p4 = mixCoordinates(p1, p2)
            if p3 in pointToBin and pointToBin[p3] not in (pointToBin[p1], pointToBin[p2]):
                if p4 in pointToBin and pointToBin[p4] not in (pointToBin[p1], pointToBin[p2], pointToBin[p3]):
                    yield (p1, p3, p2, p4)
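For example, calling it on the arrays from the question (the variable layout here is my own; each rectangle is reported once per corner ordering):

arrays = [
    [(1, 1), (3, 5), (6, 9)],
    [(9, 4), (2, 2), (5, 5)],
    [(5, 1)],
    [(1, 2), (3, 6)],
    [(2, 1), (3, 3)],
]
for rect in findRectangles(arrays):
    print(rect)  # e.g. ((1, 1), (1, 2), (2, 2), (2, 1))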
This is O(#bins^2 * items_per_bin^2) ≈ 30,000 operations, which is downright speedy in your case of 18 arrays and 10 items per bin -- much better than the cartesian-product approach, which is O(items_per_bin^#bins) ≈ 10^18. =)
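Spelled out, as a quick sanity check of those figures:

bins, per_bin = 18, 10
pairs = (bins * per_bin) ** 2  # hashing approach: 32,400 point pairs to examine
product = per_bin ** bins      # full cartesian product: 10**18 combinations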
minor sidenote:
You can reduce both the base and exponent in your computation by making multiple passes of "pruning", e.g.:
remove each point that has no other point sharing its x-coordinate or no other point sharing its y-coordinate
then maybe remove each point that does not line up with 2 other points in both the X and Y directions
You can do this by sorting on the X-coordinate and then repeating for the Y-coordinate, in O(P log P) time in terms of the number of points P. You may be able to do this at the same time as the hashing too, as sketched below. If a bad guy is arranging your input, he can make this optimization not work at all. But depending on your distribution you may see significant speedup.
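A minimal sketch of one such pruning pass, assuming hash-based tallies (collections.Counter) instead of the sorting variant; prune is just an illustrative name:

from collections import Counter

def prune(points):
    # Drop points that cannot be rectangle corners: a corner needs at least
    # one other point sharing its x and one other sharing its y. Repeat,
    # since each removal can expose further removable points.
    points = set(points)
    while True:
        xs = Counter(p[0] for p in points)
        ys = Counter(p[1] for p in points)
        survivors = {p for p in points if xs[p[0]] >= 2 and ys[p[1]] >= 2}
        if len(survivors) == len(points):
            return survivors
        points = survivors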
Let XY be your set of arrays. Construct two new sets X and Y, where X equals XY with all arrays sorted by x-coordinate and Y equals XY with all arrays sorted by y-coordinate.
- For each point (x0,y0) in any of the arrays in X: find every point (x0,y1) with the same x-coordinate and a different y-coordinate in the remaining arrays from X
- For each such pair of points (if it exists): search Y for points (x1,y0) and (x1,y1)
Let C be the size of the largest array. Then sorting all sets takes time O(N*C*log(C)). In step 1, finding a single matching point takes time O(N*log(C)) since all arrays in X are sorted. Finding all such points is in O(C*N), since there are at most C*N points overall. Step 2 takes time O(N*log(C)) since Y is sorted.
Hence, the overall asymptotic runtime is in O(C * N^2 * log(C)^2).
For C==10 and N==18, you'll get roughly 10,000 operations. Multiply that by 2, since I dropped that factor due to Big-O notation.
The solution has the further benefit of being extremely simple to implement. All you need are arrays, sorting, and binary search; the first two are very likely built into the language already, and binary search is easy to write.
Also note that this is the worst-case runtime, which occurs when all rectangles start at the same x-coordinate. In the average case, you'll probably do much better than this.
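A sketch of those two steps (runWithFirst, hasPoint and findRectanglesSorted are my names; this version rediscovers each rectangle under several corner orderings and skips the deduplication the analysis glosses over):

import bisect

def runWithFirst(arr, v):
    # Binary search a sorted list of pairs for the run whose first item equals v.
    i = bisect.bisect_left(arr, (v, float('-inf')))
    while i < len(arr) and arr[i][0] == v:
        yield arr[i]
        i += 1

def hasPoint(arr, p):
    # Binary-search membership test in a sorted list of points.
    i = bisect.bisect_left(arr, p)
    return i < len(arr) and arr[i] == p

def findRectanglesSorted(arrays):
    X = [sorted(a) for a in arrays]                       # points sorted by (x, y)
    Y = [sorted((y, x) for (x, y) in a) for a in arrays]  # swapped pairs, sorted by (y, x)
    N = len(arrays)
    for i in range(N):
        for (x0, y0) in X[i]:
            for j in range(N):
                if j == i:
                    continue
                # Step 1: points (x0, y1) sharing the x-coordinate, in another array.
                for (_, y1) in runWithFirst(X[j], x0):
                    if y1 == y0:
                        continue
                    # Step 2: points (x1, y0) in a third array, then a binary
                    # search for the final corner (x1, y1) in a fourth.
                    for k in range(N):
                        if k in (i, j):
                            continue
                        for (_, x1) in runWithFirst(Y[k], y0):
                            if x1 == x0:
                                continue
                            for l in range(N):
                                if l not in (i, j, k) and hasPoint(X[l], (x1, y1)):
                                    yield ((x0, y0), (x0, y1), (x1, y0), (x1, y1))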