First occurrence in a binary search_问答_开发者

I'm tinkering with some code and I realized something I never knew. A normal binary search will return a random index in a data set for a key that occurs more than once. How can I modify this code below to return the first occurrence? Is this something people do?

//ripped from the JDK
public static int binarySearchValu开发者_JAVA技巧e(InvertedContainer.InvertedIndex[] a, long key) {
    return bSearchVal(a, 0, a.length, key);
}

private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex,
                                 int toIndex, long key) {
    int low = fromIndex;
    int high = toIndex - 1;

    while (low <= high) {
        int mid = (low + high) >>> 1;
        long midVal = a[mid].val;

        if (midVal < key)
            low = mid + 1;
        else if (midVal > key)
            high = mid - 1;
        else
            return mid; // key found
    }
    return (low); // key not found. return insertion point
}

An addition to Jon Skeets post:

The potential faster implementation is actually not hard to implement and adds only 2 lines of code, here is how I'd do it:

    if (midVal < key)
        low = mid + 1;
    else if (midVal > key)
        high = mid - 1;
    else if (low != mid) //Equal but range is not fully scanned
        high = mid; //Set upper bound to current number and rescan
    else //Equal and full range is scanned
        return mid;

Having found a matching value, you basically need to walk up the collection until you find an entry which doesn't match.

You could potentially make it faster by fetching the index of a key immediately lower than the one you were looking for, then do a binary chop between the two - but I'd probably go for the simpler version, which is likely to be "efficient enough" unless you've got a really large number of equal entries.

If your data is all integral, then this hack can help. It uses a float array to store values.

float array[];    //contains all integral values
int searchValue;

int firstIndex = -(binarySearch(array, (float)searchValue - 0.5F) + 1);

Basically what it does is find the insertion index of a value in between your search value and the integer before it. Since all values are integral, it finds the first occurrence of the search value.

Also this runs is log(n) time.

Example:

import java.util.Arrays;

public class BinarySearch {
    // considering array elements are integers
    float ar[] = new float[] { 1, 2, 3, 3, 4, 4, 5, 9, 9, 12, 12 };

    public void returnFirstOccurrence(int key) {
        int firstIndex = -(Arrays.binarySearch(ar, key - 0.5F) + 1);
        if (ar[firstIndex] != key)
            System.out.println("Key doesn't exist");
        else
            System.out.println("First index of key is " + firstIndex);
    }

    public static void main(String Args[]) throws Exception {
        new BinarySearch().returnFirstOccurrence(9);
    }

}

OUTPUT: 7

p.s: I've used this trick on several coding contests and it nicely worked everytime.

You could implement "lower bound" algorithm instead of binary search. This algorithm is used e.g. in C++/STL and its transcript into Java is straightforward. The algorithmic complexity of lower bound is also O(log n) as the binary search. This is better than to use binary search first and than linearly search for the first matching element - this would have worst case behaviour O(n).

You can an adapt your existing search algorithm just by having a sharper definition of matching. You can tell that the highlighted 5 in the sequence 1,3,5,5,5,9 is the first one because the number before it (3) is smaller than 5. So if mid lands on array element equal to the the key you only treat it as a match if a[mid-1] is less than key, other equal array elements are treated like greater than elements. Now you algorithm becomes (after including Jon Skeet's suggestion to return negatives for insertion points):

public static int binarySearch(int[] a, int key) {
    int low=0,high=a.length-1;
    while (low<=high) {
        int mid=(low+high) >>> 1;
        int midVal=a[mid];
        if (midVal < key) 
            low=mid+1;
        else if (mid>0 && a[mid-1]>=key) //we already know midval>=key here
            high=mid-1;
        else if (midVal==key) //found the 1st key 
             return mid;
        else
            return ~mid;      //found insertion point
    }
    return ~(a.length);       //insertion point after everything
}

It uses more comparisons but went faster than Stev314's version in my benchmarks probably because of cache effects.

The following algorithm binary-searches for the first item with a key greater-than-or-equal-to your search key...

while (upperbound > lowerbound)
{
  testpos = lowerbound + ((upperbound-lowerbound) / 2);

  if (item[testpos] >= goal)
  {
    //  new best-so-far
    upperbound = testpos;
  }
  else
  {
    lowerbound = testpos + 1;
  }
}

This isn't written for Java, which I don't know that well, so it may need a minor tweak. Note that the bounds are half-open (lowerbound is inclusive and upperbound is exclusive) and that this is important for correctness.

This can be adapted to other similar searches - e.g. finding the last key <= the search value.

This is slightly modified from my earlier question-and-answer here.

here is the solution, i found for getting the lower index of key having multiple occurrences in a sorted array using binary search.

int lowerBound(int[] array,int fromIndex, int toIndex, int key)
{
    int low = fromIndex-1, high = toIndex;
    while (low+1 != high)
    {
        int mid = (low+high)>>>1;
        if (array[mid]< key) low=mid;
        else high=mid;
    }
    int p = high;
    if ( p >= toIndex || array[p] != key )
        p=-1;//no key found
    return p;
}

we have to change a little in this code to work for upper bound, using binary search, so here is the working copy of code.

 int upperBound(int[] array,int fromIndex, int toIndex, int key)
{
    int low = fromIndex-1, high = toIndex;
    while (low+1 != high)
    {
        int mid = (low+high)>>>1;
        if (array[mid]> key) high=mid;
        else low=mid;
    }
    int p = low;
    if ( p >= toIndex || array[p] != key )
        p=-1;//no key found
    return p;
}

One approach is to hold an invariant throughout the whole binary search. In your particular case, the invariant would be:

array[low] < key
key <= array[high]

Then you can minimize the gap between low and high using binary search. When low + 1 == high, high would be the answer. Example code in C++:

// check invariant on initial values.
if (array[low] >= key) return low;
if (array[high] < key) return high+1;
// low + 1 < high ensures high is at least low + 2, thus
// mid will always be different from low or high. It will
// stop when low + 1 == high.
while (low + 1 < high) {
  int mid = low + (high - low) / 2;
  if (array[mid] < key) {
    low = mid;   // invariant: array[low] < key
  } else {
    high = mid;  // invariant: array[high] >= key
  }
}
return high;

Key difference between this and your example code is to update low and high to only mid rather than mid+1 or mid-1, because we have checked the value of array[mid], we can guarantee the invariant still holds when updating the boundaries. You need to check the invariant on initial values before starting to search too.

In this thread, you can find a full example of the binary search (recursive version), and two other versions (based on the original one) which allow you to get the first index and last index of a given key.

For your convenience I added the relevant Junit tests.

Here's a variation of the solution in scala. Used pattern-matching and recursion instead of the while loop to get the first occurrence.

def binarySearch(arr:Array[Int],key:Int):Int = {
     def binSearchHelper(lo:Int,hi:Int,mid:Int):Int = {
        if(lo > hi) -1 else {
            if(arr(mid) == key) mid else if(arr(mid) > key){
                binSearchHelper(lo,mid-1,lo + (((mid-1) - lo)/2))
            }else{
                binSearchHelper(mid+1,hi,(mid+1) + ((hi - (mid+1))/2))
            }
        }
     }
    binSearchHelper(0,arr.size-1,(arr.size-1)/2)
}

def findFirstOccurrence(arr:Array[Int],key:Int):Int = {
    val startIdx = binarySearch(arr,key)
    startIdx match {
        case 0 => 0
        case -1 => -1
        case _ if startIdx > 0 => {
            if(arr(startIdx - 1) < key) startIdx else {
                    findFirstOccurrence(arr.slice(0,startIdx),key)
            }
        }
    }
}

This should do the trick

private static int bSearchVal(InvertedContainer.InvertedIndex[] a, int fromIndex,
                             int toIndex, long key) {
int low = fromIndex;
int high = toIndex - 1;
int result = low;
while (low <= high) {
    int mid = (low + high) >>> 1;
    long midVal = a[mid].val;

    if (midVal < key)
        low = mid + 1;
    else if (midVal > key)
        high = mid - 1;
    else
    {
        result = mid;
        high = mid -1; 
    }
}
return result;

}

For the last occurrence of the element:

static int elementExists(int input[], int element){
    int lo=0;
    int high = input.length-1;
    while(lo<high){
        int mid = (lo + high )/2;
        if(element >input[mid] ){
            lo = mid+1;
        }
        else if(element < input[mid]){
            high= mid-1;
        }
        else if (high != input.length-1) //Change for the Occurrence check
            lo = mid;
        else {
            return mid;
        }
    }
    return -1;
}

For the first occurrence:

else if (lo != mid){
        high = mid;
}

I think a simpler approach is storing the latest mid index where xs[mid] == key into a result variable and then keep running the binary search.

Here's the swift code:

func first<T: Comparable>(xs: [T], key: T) -> Int {
    var lo = xs.startIndex
    var hi = xs.endIndex - 1
    var res = -1
    while lo <= hi {
        let mid = lo + (hi - lo) >> 1
        if xs[mid] == key { hi = mid - 1; res = mid }
        else if xs[mid] < key { lo = mid + 1}
        else if xs[mid] > key { hi = mid - 1 }
    }

    return res
}

Also, this requires a really small change (just one line) if you were to find the last index of a key.

func last<T: Comparable>(xs: [T], key: T) -> Int {
    var lo = xs.startIndex
    var hi = xs.endIndex - 1
    var res = -1
    while lo <= hi {
        let mid = lo + (hi - lo) >> 1
        if xs[mid] == key { lo = mid + 1;  res = mid }
        else if xs[mid] < key { lo = mid + 1}
        else if xs[mid] > key { hi = mid - 1 }
    }

    return res
}

Try this javascript recursive solution. It's optimal in a sense that it's O(log(N))

function solve(A, e) {
  function solve (A, start, end, e, bestUntilNow) {
    if (start > end) {
      if (A[start] === e)
        return start
      return bestUntilNow
    } else {
      const mid = start + Math.floor((end - start) / 2)
      if (A[mid] === e) {
        return solve(A, start, mid - 1, e, mid)
      } else if (e < A[mid]) {
        return solve(A, start, mid - 1, e, bestUntilNow)
      } else {
        return solve(A, mid + 1, end, e, bestUntilNow)
      }
    }
  }
  return solve(A, 0, A.length, e, -1)
}

Given a sorted array with possibly duplicated elements like [1,1,2,2,2,2,3], the following code with time complexity O(logn) finds the indexes of the first and last occurrences of an element in the given array. This approach is based on a recursive JS Binary Search implementation by comparing with the immediately lower or immediately higher index/element after matching initially the element (as it is also suggested below).

// Find the first occurence of the value using binary search
function binarySearchFirstOccurence(arr, left, right, value) {
  let middle = Math.floor((left+right)/2);
  if (left > right) {
    return -1;
  } else if (arr[middle] === value && (arr[middle-1] < value || middle === 0)) {
    return middle;
  } else if (arr[middle] < value) {
    return binarySearchFirstOccurence(arr, middle + 1, right, value);
  } else {
    // Going lower
    return binarySearchFirstOccurence(arr, left, middle - 1, value);
  }
}

// Find the last occurence of the value using binary search
function binarySearchLastOccurence(arr, left, right, value) {
  let middle = Math.floor((left+right)/2);
  if (left > right) {
    return -1;
  } else if (arr[middle] === value && (arr[middle+1] > value || middle === arr.length - 1)) {
    return middle;
  } else if (arr[middle] > value) {
    return binarySearchLastOccurence(arr, left, middle - 1, value);
  } else {
    // Going higher
    return binarySearchLastOccurence(arr, middle + 1, right, value);
  }
}

function sortedFrequency(arr, value) {
  let left = 0;
  let right = arr.length -1;
  let first = binarySearchFirstOccurence(arr, left, right, value);
  let last = binarySearchLastOccurence(arr, left, right, value);
  if (first === -1 && last === -1) {
    return -1;
  } else if (first === -1 && last !== -1) {
    return 1;
  } else {
    return last-first+1;
  }
}

let arr = [1,1,2,2,2,2,3];
console.log(sortedFrequency(arr, 3));

Best George

The solution is doing some changes in the binary search so that time complexity remains equal to O(log2n). Keep in mind that for binary search the array must be in sorted form. Following are the steps

Calculate the mid index
If the element to be searched is greater than element present at the mid index then low becomes mid + 1
If the element to be searched is smaller than element present at the mid index then high will become mid - 1.
Now lets make some changes. Suppose if two elements are same in an array so our aim is to return the first occurrence of that element. Case would be if mid index == 0 or arr[mid - 1] != arr[mid], then mid index is returned. This means that same element is not repeated and if arr[mid - 1] == arr[mid], then one will make high = mid - 1. The reason being one need to return the previous element as it is the first occurrence.

Code

def first_occurence_binary_search(arr, search_element):

n = len(arr)
low = 0
high = n - 1

while low <= high:

    mid = (low + high) // 2
    if arr[mid] > search_element:
        high = mid - 1

    elif arr[mid] < search_element:
        low = mid + 1

    else:
        if mid == 0 or arr[mid - 1] != arr[mid]:
            return mid
        else:
            high = mid - 1

return -1


array = [1, 3, 3, 3]
searching = 3
print(first_occurence_binary_search(arr=array,
                                    search_element=searching))