Finding the maximum area in given binary data_问答_开发者

I have a problem with describing algorithm for finding maximum rectangular area of binary data, where 1 occurs k-times more often than 0. Data is always n^2 bits like this:

For example data for n = 4 looks like:

1 0 1 0

0 0 1 1

0 1 1 1

1 1 0 1

Value of k can be 1 .. j (k = 1 means, that number of 0 and 1 is equa开发者_如何学运维l).

For above example of data and for k = 1 solution is:

1 0 1 0 <- 4 x '0' and 4 x '1'

0 0 1 1

0 1 1 1

1 1 0 1

But in this example:

1 1 1 0

0 1 0 0

0 0 0 0

0 1 1 1

Solution would be:

1 1 1 0

0 1 0 0

0 0 0 0

0 1 1 1

I tried with few brute force algorithms but for n > 20 it is getting too slow. Can you advise me how I should solve this problem?

As RBerteig proposed - the problem can be also described like that: "In a given square bitmap with cells set to 1 or 0 by some arbitrary process, find the largest rectangular area where the 1's and 0's occur in a specified ratio, k."

Bruteforce should do just fine here for n < 100, if properly implemented: solution below has O(n^4) time and O(n^2) memory complexity. 10^8 operations should be well under 1 second on modern PC (especially considering that each operation is very cheap: few additions and subtractions).

Some observations

There're O(n^4) sub-rectangles to consider and each of them can be a solution.
If we can find number of 1's and 0's in each sub-rectangle in O(1) (constant time), we'll solve problem in O(n^4) time.
If we know number of 1's in some sub-rectangle, we can find number of zeroes (through area).

So, the problem is reduced to following: create data structure allowing to find number of 1's in each sub-rectangle in constant time.

Now, imagine we have sub-rectangle [i0..i1]x[j0..j1]. I.e., it occupies rows between i0 and i1 and columns between j0 and j1. And let count_ones be the function to count number of 1's in subrectangle.

This is the main observation:

count_ones([i0..i1]x[j0..j1]) = count_ones([0..i1]x[0..j1]) - count_ones([0..i0 - 1]x[0..j1]) - count_ones([0..i1]x[0..j0 - 1]) + count_ones([0..i0 - 1]x[0..j0 - 1])

Same observation with practical example:

AAAABBB
AAAABBB
CCCCDDD
CCCCDDD
CCCCDDD
CCCCDDD

If we need to find number of 1's in D sub-rectangle (3x4), we can do it by taking number of 1's in the whole rectangle (A + B + C + D), subtracting number of 1's in (A + B) rectangle, subtracting number of 1's in (A + C) rectangle, and adding number of 1's in (A) rectangle. (A + B + C + D) - (A + B) - (A + C) + (A) = D

Thus, we need table sums, for each i and j containing number of 1's in sub-rectangle [0..i][0..j].
You can create this table in O(n^2), but even the direct way to fill it (for each i and j iterate all elements of [0..i][0..j] area) will be O(n^4).

Having this table,

count_ones([i0..i1]x[j0..j1]) = sums[i1][j1] - sums[i0 - 1][j1] - sums[i1][j0 - 1] + sums[i0 - 1][j0 - 1]

Therefore, time complexity O(n^4) reached.

This is still brute force, but something you should note is that you don't have to recompute everything from scratch for a new i*j rectangle. Instead, for each possible rectangle size, you can move the rectangle across the n*n grid one step at a time, decrementing the counts for the bits no longer within the rectangle and incrementing the counts for the bits that newly entered the rectangle. You could potentially combine this with varying the rectangle size, and try to find an optimal pattern for moving and resizing the rectangle.

Just some hints..

You could impose better restrictions on the values. The requirement leads to condition

N1*(k+1) == S*k, where N1 is number of ones in an area, and S=dx*dy is its surface. It can be rewritten in better form:

N1/k == S/(k+1).

Because the greatest common divisor of numbers n and n+1 is always 1, then N1 have to be multiple of k and dx*dy to be multiple of k+1. It reduces greatly the possible space of solutions, the larger is k, the better (for dx*dy case you'll need to play with prime divisors of k+1).

Now, because you need just the surface of the largest area with such property, it would be wise to start from largest areas and move to smaller ones. By trying dx*dy from n^2 downto k+1 that would satisfy the divisor and the bounding conditions, you'll find quite fast the solution, muuuch faster than O(n^4), because of a special reason: except cases when the array was specially constructed, if we assume a random input, the probability that there are N1 ones out of S values in the (n-dx+1)*(n-dy+1) areas that have the surface S will constantly grow with decrease of S. (large values of k will make the probability smaller, but in the same time they will make the filter for dx and dy pairs stronger).

Also, this problem: http://ioinformatics.org/locations/ioi99/contest/land/land.shtml , looks somehow similar, maybe you'll find some ideas in their solution.