Most efficient way to search a sorted matrix?_问答_开发者

I have an assignment to write an algorithm (not in any particular language, just pseudo-code) that receives a matrix [size: M x N] that is sorted in a way that all of it's rows are sorted and all of it's columns are sorted individually, and finds a certain value within this matrix. I need to write the most time-efficient algorithm I can think of.

The matrix looks something like:

1  3  5
4  6  8
7  9 10

My idea is to start at the first row and last column and simply ch开发者_如何学Goeck the value, if it's bigger go down and if it's smaller than go left and keep doing so until the value is found or until the indexes are out of bounds (in case the value does not exist). This algorithm works at linear complexity O(m+n). I've been told that it's possible to do so with a logarithmic complexity. Is it possible? and if so, how?

Your matrix looks like this:

a ..... b ..... c
. .     . .     .
.   1   .   2   . 
.     . .     . .
d ..... e ..... f
. .     . .     .
.   3   .   4   .
.     . .     . .
g ..... h ..... i

and has following properties:

a,c,g < i  
a,b,d < e
b,c,e < f
d,e,g < h
e,f,h < i

So value in lowest-rigth most corner (eg. i) is always the biggest in whole matrix and this property is recursive if you divide matrix into 4 equal pieces.

So we could try to use binary search:

probe for value,
divide into pieces,
choose correct piece (somehow),
goto 1 with new piece.

Hence algorithm could look like this:

input: X - value to be searched
until found
 divide matrix into 4 equal pieces
 get e,f,h,i as shown on picture
 if (e or f or h or i) equals X then 
   return found
 if X < e then quarter := 1
 if X < f then quarter := 2
 if X < h then quarter := 3
 if X < i then quarter := 4
 if no quarter assigned then 
    return not_found
 make smaller matrix from chosen quarter

This looks for me like a O(log n) where n is number of elements in matrix. It is kind of binary search but in two dimensions. I cannot prove it formally but resembles typical binary search.

and that's how the sample input looks? Sorted by diagonals? That's an interesting sort, to be sure.

Since the following row may have a value that's lower than any value on this row, you can't assume anything in particular about a given row of data.

I would (if asked to do this over a large input) read the matrix into a list-struct that took the data as one pair of a tuple, and the mxn coord as the part of the tuple, and then quicksort the matrix once, then find it by value.

Alternately, if the value of each individual location is unique, toss the MxN data into a dictionary keyed on the value, then jump to the dictionary entry of the MxN based on the key of the input (or the hash of the key of the input).

EDIT:

Notice that the answer I give above is valid if you're going to look through the matrix more than once. If you only need to parse it once, then this is as fast as you can do it:

for (int i = 0; i<M; i++)
 for (int j=0; j<N; j++)
  if (mat[i][j] == value) return tuple(i,j);

Apparently my comment on the question should go down here too :|

@sagar but that's not the example given by the professor. otherwise he had the fastest method above (check the end of the row first, then proceed) additionally, checking the end of the middlest row first would be faster, a bit of a binary search.

Checking the end of each row (and starting on the end of the middle row) to find a number higher than the checked for number on an in memory array would be fastest, then doing a binary search on each matching row till you find it.

in log M you can get a range of rows able to contain the target (binary search on the first value of rows, binary search on last value of rows, keep only those rows whose first <= target and last >= target) two binary searches is still O(log M)
then in O(log N) you can explore each of these rows, with again, a binary search!

that makes it O(logM x logN)
tadaaaa

public static boolean find(int a[][],int rows,int cols,int x){
    int m=0;
    int n=cols-1;
while(m<rows&&n>=0){
    if(a[m][n]==x)
        return1;
    else if(a[m][n]>x)
        n--;
    else m++;
}

}

what about getting the diagonal out, then binary search over the diagonal, start bottom right check if it is above, if yes take the diagonal array position as the column it is in, if not then check if it is below. each time running a binary search on the column once you have a hit on the diagonal (using the array position of the diagonal as the column index). I think this is what was stated by @user942640

you could get the running time of the above and when required (at some point) swap the algo to do a binary search on the initial diagonal array (this is taking into consideration its n * n elements and getting x or y length is O(1) as x.length = y.length. even on a million * million binary search the diagonal if it is less then half step back up the diagonal, if it is not less then binary search back towards where you where (this is a slight change to the algo when doing a binary search along the diagonal). I think the diagonal is better than the binary search down the rows, Im just to tired at the moment to look at the maths :)

by the way I believe running time is slightly different to analysis which you would describe in terms of best/worst/avg case, and time against memory size etc. so the question would be better stated as in 'what is the best running time in worst case analysis', because in best case you could do a brute linear scan and the item could be in the first position and this would be a better 'running time' than binary search...

Here is a lower bound of n. Start with an unsorted array A of length n. Construct a new matrix M according to the following rule: the secondary diagonal contains the array A, everything above it is minus infinity, everything below it is plus infinity. The rows and columns are sorted, and looking for an entry in M is the same as looking for an entry in A.

This is in the vein of Michal's answer (from which I will steal the nice graphic).

Matrix:

  min ..... b ..... c
    .       .       .
    .  II   .   I   . 
    .       .       .
    d .... mid .... f
    .       .       .
    .  III  .   IV  .
    .       .       .
    g ..... h ..... max

Min and max are the smallest and largest values, respectively. "mid" is not necessarily the average/median/whatever value.

We know that the value at mid is >= all values in quadrant II, and <= all values in quadrant IV. We cannot make such claims for quadrants I and III. If we recurse, we can eliminate one quadrant at each level.

Thus, if the target value is less than mid, we must search quadrants I, II, and III. If the target value is greater than mid, we must search quadrants I, III, and IV.

The space reduces to 3/4 its previous at each step:

n * (3/4)^x = 1

n = (4/3)^x

x = log_4/3(n)

Logarithms differ by a constant factor, so this is O(log(n)).

find(min, max, target)
    if min is max
        if target == min
            return min
        else
            return not found
    else if target < min or target > max
        return not found
    else
        set mid to average of min and max 
        if target == mid
            return mid
        else
            find(b, f, target), return if found
            find(d, h, target), return if found

            if target < mid
                return find(min, mid, target)
            else
                return find(mid, max, target)

JavaScript solution:

//start from the top right corner
//if value = el, element is found
//if value < el, move to the next row, element can't be in that row since row is sorted
//if value > el, move to the previous column, element can't be in that column since column is sorted

function find(matrix, el) {

  //some error checking
  if (!matrix[0] || !matrix[0].length){
    return false;
  }
  if (!el || isNaN(el)){
    return false;
  }

  var row = 0; //first row
  var col = matrix[0].length - 1; //last column

  while (row < matrix.length && col >= 0) {
    if (matrix[row][col] === el) { //element is found
      return true;
    } else if (matrix[row][col] < el) {
      row++; //move to the next row
    } else {
      col--; //move to the previous column
    }
  }

  return false;

}

this is wrong answer

I am really not sure if any of the answers are the optimal answers. I am going at it.

binary search first row, and first column and find out the row and column where "x" could be. you will get 0,j and i,0. x will be on i row or j column if x is not found in this step.
binary search on the row i and the column j you found in step 1.

I think the time complexity is 2* (log m + log n).

You can reduce the constant, if the input array is a square (n * n), by binary searching along the diagonal.