How to process a integer matrix to find the average for the elements of any sub-rectangle in O(1)?_问答_开发者

This is an interview question: How to process a integer matrix to find the average for the elements of any sub-rectangle in O(1)? As the comments say, we should calculate the prefix sums.

Well, it is embarrassing but even having asked a similar quest开发者_如何转开发ion I did not get it.

Though you have a couple of reasonable answers, this seems like a question for which a picture is worth several thousand words.

How to process a integer matrix to find the average for the elements of any sub-rectangle in O(1)?

Okay -- what we want is the area of rectangle 4. Our pre-computed area, however, only tells use the area of the whole rectangle starting from the origin at the top left, and extending to the bottom-right corer of 4. To compute only the area of 4, we have to subtract the areas above and to the left (1, 2, & 3). The areas of 2 & 3 are the same way though: we can only get their areas starting from the top left and extending to their bottom right. That means we can get the sum of their areas, but the area of rectangle 1 is included in both of those.

So, to get the area of rectangle 4, we find the area at its bottom right corner. We find the area at the bottom-right corners of 2 and 3 and add them together. We then subtract the area at the bottom-right corner of rectangle 1 since it was included in our total twice. That gives us the total area of rectangles 1, 2 and 3. We subtract that from the area for the bottom-right corner of rectangle 4, and that gives us the area of rectangle 4 by itself.

Just for example, let's assume I managed to get all those rectangles the same size, so we'll call the area for each '1'. So, the area given at the bottom-right corner of each rectangle will be:

To keep things simple, let's define each of these as pc_area(x) (i.e. precomputed-area we'd get at the bottom-right corner of rectangle x).

So, we take: pc_area(4) - (pc_area(2) + pc_area(3) - pc_area(1)) Substituting the numbers, we get: 4 - (2 + 2 -1), or 4-3, which obviously gives 1, the answer we originally defined as correct.

See @Sven Marnach'answer to the post @Michael linked to. The prefix sum algorithm he outlines is based on the following idea: if you have precomputed the sums of the following element sequences: [0, 0], [0, 1], [0, 2], ..., [0, n-1], you can compute the sum of the sequence [a, b] in constant time as [0, b] - [0, a-1]. The same idea can be applied to two-dimensional arrays. Denote the subarray with its upper left corner at (a, b) and its lower right corner at (c, d) as [(a, b), (c, d)]. We will treat such a subarray similarly to how we treated sequences in the 1-D case. Precompute the sums of all subarrays having their upper left corner at (0, 0): [(0, 0), (0, 0)], [(0, 0), (0, 1)], [(0, 0), (0, 2)], ..., [(0, 0), (0, m-1)], [[(0, 0), (1, 0)], [(0, 0), (1, 1)], ..., [(0, 0), (1, m-1)], ..., [(0, 0), (n-1, m-1)]. There are n*m such subarrays, and the sum of each can be computed in constant time based on the sums of smaller subarrays. If we are now asked to produce the sum of the subarray [(a, b), (c, d)], we can find it as [(0, 0), (c, d)] - [(0, 0), (a-1, d)] - [(0, 0), (c, b-1)] + [(0, 0), (a-1, b-1)]. Draw it on paper, and you'll see how the subarrays overlap and why we need to add the last subarray.

I think you are misleaded by the assumption that you just get a matrix / list of nubmers.

In this case the requirement is not possible. The fastest you could do is an heuristic algorithm that is based on a sorting algorithm and you could achieve O(Log(n)).

This however will only be useful if there is an error tolerance and the distribution of values is not distored.

But you can very well solve your problem if you have control over the data structure. The common teqnique to do that is called prefix sums.

read up on it on wikipedia, they explain it better than i ever could.

But let me provide a very basic trivial analogy that may solve confusion.

You cannot search an element in a list in O(1). Log(n) is the best you can do, if its sorted or you sort it before.

However if you have control over the data structure you can build a hash index where you can directly pick it and have O(1).

Prefix sums is something similar. Its a datastructure that solves your problem, not an algorithm (well not of course its an algorithm but more on the other side of the line).