binary search middle value calculation

Developer https://www.devze.com 2023-02-01 10:17 Source: web


The following is the pseudocode I got from a TopCoder tutorial about binary search:

binary_search(A, target):
   lo = 1, hi = size(A)
   while lo <= hi:
      mid = lo + (hi-lo)/2
      if A[mid] == target:
         return mid            
      else if A[mid] < target: 
         lo = mid+1
      else:
         hi = mid-1

   // target was not found
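A minimal runnable sketch of the tutorial's pseudocode in Go. Two details are assumptions rather than part of the tutorial: indices are 0-based, so `hi` starts at `len(a)-1` instead of `size(A)`, and `-1` is returned when the target is missing. The function name `binarySearch` is illustrative.

```go
package main

import "fmt"

// binarySearch returns the index of target in the sorted slice a, or -1 if
// target is absent. It follows the tutorial's pseudocode, adapted to Go's
// 0-based indexing (hypothetical name and not-found convention).
func binarySearch(a []int, target int) int {
	lo, hi := 0, len(a)-1
	for lo <= hi {
		// Inside the loop 0 <= lo <= hi, so hi-lo cannot overflow.
		mid := lo + (hi-lo)/2
		switch {
		case a[mid] == target:
			return mid
		case a[mid] < target:
			lo = mid + 1 // target can only be to the right of mid
		default:
			hi = mid - 1 // target can only be to the left of mid
		}
	}
	return -1 // target was not found
}

func main() {
	a := []int{2, 3, 5, 7, 11, 13}
	fmt.Println(binarySearch(a, 7)) // 3
	fmt.Println(binarySearch(a, 4)) // -1
}
```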

Why do we calculate the middle value as mid = lo + (hi - lo) / 2? What's wrong with (hi + lo) / 2?

I have a vague idea that it might be to prevent overflow, but I'm not sure. Perhaps someone can explain it to me, and whether there are other reasons behind this.

Although this question is five years old, there is a great article on the Google blog that explains the problem and its solution in detail and is worth sharing.

It's worth mentioning that the current Java implementation of binary search does not use the mid = lo + (hi - lo) / 2 calculation. Instead, it uses a faster and clearer alternative based on the zero-fill right shift operator (>>>):

int mid = (low + high) >>> 1;

Yes, (hi + lo) / 2 may overflow. This was an actual bug in Java's binary search implementation.

No, there are no other reasons for this.
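A sketch of that overflow using Go's `int32`, which, like Java's `int`, is 32 bits wide and wraps around on overflow. The function names are hypothetical; `shiftMid` mirrors Java's `(low + high) >>> 1` by reading the wrapped sum as a `uint32` before shifting, which is valid as long as `lo` and `hi` are non-negative.

```go
package main

import "fmt"

// naiveMid computes (lo + hi) / 2. The int32 sum wraps around on overflow,
// just as Java's int does.
func naiveMid(lo, hi int32) int32 {
	return (lo + hi) / 2
}

// safeMid computes lo + (hi - lo) / 2. For 0 <= lo <= hi the difference is
// at most hi, so no intermediate value overflows.
func safeMid(lo, hi int32) int32 {
	return lo + (hi-lo)/2
}

// shiftMid mirrors Java's (low + high) >>> 1. For non-negative lo and hi the
// wrapped sum, read as an unsigned 32-bit value, equals the true sum, and
// >> on a uint32 is a logical (zero-fill) shift.
func shiftMid(lo, hi int32) int32 {
	return int32(uint32(lo+hi) >> 1)
}

func main() {
	var lo, hi int32 = 2000000000, 2100000000
	fmt.Println(naiveMid(lo, hi)) // -97483648: the sum wrapped to a negative number
	fmt.Println(safeMid(lo, hi))  // 2050000000
	fmt.Println(shiftMid(lo, hi)) // 2050000000
}
```

A negative `mid` like the first result is what made the buggy implementations index outside the array.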

From later on in the same tutorial:

"You may also wonder as to why mid is calculated using mid = lo + (hi-lo)/2 instead of the usual mid = (lo+hi)/2. This is to avoid another potential rounding bug: in the first case, we want the division to always round down, towards the lower bound. But division truncates, so when lo+hi would be negative, it would start rounding towards the higher bound. Coding the calculation this way ensures that the number divided is always positive and hence always rounds as we want it to. Although the bug doesn't surface when the search space consists only of positive integers or real numbers, I've decided to code it this way throughout the article for consistency."
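A small Go sketch of the rounding issue described in the quote. Go's integer division, like Java's and C's, truncates toward zero. The function names are hypothetical.

```go
package main

import "fmt"

// sumThenHalve is (lo + hi) / 2. Division truncates toward zero, so a
// negative odd sum is rounded UP, toward hi.
func sumThenHalve(lo, hi int) int {
	return (lo + hi) / 2
}

// offsetThenHalve is lo + (hi - lo) / 2. For lo <= hi the dividend is
// non-negative, so truncation is the same as rounding down, toward lo.
func offsetThenHalve(lo, hi int) int {
	return lo + (hi-lo)/2
}

func main() {
	lo, hi := -3, -2 // a search space lying entirely below zero
	fmt.Println(sumThenHalve(lo, hi))    // -2: -5 / 2 truncates to -2, which equals hi
	fmt.Println(offsetThenHalve(lo, hi)) // -3: -3 + 1/2 = -3 + 0, which equals lo
}
```

This matters for loops that rely on `mid` being strictly below `hi`, for example a predicate-based variant of the form `if p(mid) { hi = mid } else { lo = mid + 1 }` (assumed here for illustration). If `mid == hi`, the `hi = mid` branch leaves the range unchanged and the loop never terminates.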

It is indeed possible for hi + lo to overflow. In the improved version, subtracting lo from hi and then adding it back may seem pointless, but there is a reason: hi - lo cannot overflow (given 0 <= lo <= hi), and it has the same parity as hi + lo, since the two differ by 2 * lo. So (hi - lo) / 2 discards the same remainder that (hi + lo) / 2 would, and lo can then be safely added after the division to reach the same result.
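The equivalence can be written out as a short derivation. It uses floor, which matches truncating integer division whenever the dividend is non-negative, as it is for 0 <= lo <= hi:

```latex
\begin{aligned}
\left\lfloor \frac{\mathit{hi} + \mathit{lo}}{2} \right\rfloor
  &= \left\lfloor \frac{(\mathit{hi} - \mathit{lo}) + 2\,\mathit{lo}}{2} \right\rfloor
   = \left\lfloor \frac{\mathit{hi} - \mathit{lo}}{2} + \mathit{lo} \right\rfloor \\
  &= \mathit{lo} + \left\lfloor \frac{\mathit{hi} - \mathit{lo}}{2} \right\rfloor
  \qquad \text{since } \mathit{lo} \in \mathbb{Z}
\end{aligned}
```

The first step is exactly the parity observation: hi + lo and hi - lo differ by the even number 2 lo.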

Let us assume that the array we're searching is of length INT_MAX. Hence, initially:

high = INT_MAX 
low = 0

In the first iteration, suppose the target element is greater than the middle element, so we move the start index just past mid:

low = mid + 1

In the next iteration, mid is calculated as (high + low) / 2, which here is (INT_MAX + low) / 2, where low is now (INT_MAX + 1) / 2, roughly half of INT_MAX.

The first part of this operation, high + low, is about 1.5 times INT_MAX, so it overflows before the division even happens.
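Plugging in numbers makes the overflow concrete. This worked sketch assumes a 32-bit int (so INT_MAX = 2^31 - 1) and Java-style two's-complement wraparound. In C and C++, signed overflow is undefined behavior rather than wrapping.

```latex
\begin{aligned}
\mathit{mid}_1 &= \left\lfloor \frac{(2^{31} - 1) + 0}{2} \right\rfloor = 2^{30} - 1,
  \qquad \mathit{low} = \mathit{mid}_1 + 1 = 2^{30} \\
\mathit{high} + \mathit{low} &= (2^{31} - 1) + 2^{30} = 3 \cdot 2^{30} - 1 = 3221225471 > 2^{31} - 1 \\
\text{wrapped sum} &= 3 \cdot 2^{30} - 1 - 2^{32} = -(2^{30} + 1) = -1073741825 \\
\text{wrapped sum} / 2 &= -536870912 \quad \text{(truncated; a negative index)} \\
\mathit{low} + \left\lfloor \frac{\mathit{high} - \mathit{low}}{2} \right\rfloor
  &= 2^{30} + \left\lfloor \frac{2^{30} - 1}{2} \right\rfloor = 2^{30} + 2^{29} - 1 = 1610612735
\end{aligned}
```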

Because Go has no unsigned right shift operator (Java's >>>), to avoid integer overflow while calculating the middle value in Go we can convert the sum to an unsigned type and shift that instead:

mid := int(uint(lo+hi) >> 1)

The "why" has been answered, but it is not easy to see why the solution works. So let's assume high is 10 and low is 5, and that 10 is the highest value an integer can hold (10 + 1 would overflow).

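So instead of computing (10 + 5) / 2, which would overflow because 10 + anything overflows,

we compute 5 + (10 - 5) / 2 = 5 + 2 = 7, since integer division truncates 5 / 2 to 2. That is the same result (10 + 5) / 2 would give if it could not overflow.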