In writing some code today, I have happened upon a circumstance that has caused me to write a binary search of a kind I have never seen before. Does this binary search have a name, and is it really a "binary" search?
Motivation
First of all, in order to make the search easier to understand, I will explain the use case that spawned its creation.
Say you have a list of ordered numbers. You are asked to find the index of the number in the list that is closest to x.
int findIndexClosestTo(int x);
The calls to findIndexClosestTo()
always follow this rule:
If the last result of
findIndexClosestTo()
wasi
, then indices closer toi
have greater probability of being the result of the current call tofindIndexClosestTo()
.
In other words, the index we need to find this time is more likely to be closer to the last one we found than further from it.
For an example, imagine a simulated boy that walks left and right on the screen. If we are often querying the index of the boy's location, it is likely he is somew开发者_如何学Pythonhere near the last place we found him.
Algorithm
Given the case above, we know the last result of findIndexClosestTo()
was i
(if this is actually the first time the function has been called, i
defaults to the middle index of the list, for simplicity, although a separate binary search to find the result of the first call would actually be faster), and the function has been called again. Given the new number x
, we follow this algorithm to find its index:
interval = 1;
- Is the number we're looking for,
x
, positioned ati
? If so, returni
; - If not, determine whether
x
is above or belowi
. (Remember, the list is sorted.) - Move
interval
indices in the direction ofx
. - If we have found
x
at our new location, return that location. - Double
interval
. (i.e.interval *= 2
) - If we have passed
x
, go backinterval
indices, setinterval = 1
, go to 4.
Given the probability rule stated above (under the Motivation header), this appears to me to be the most efficient way to find the correct index. Do you know of a faster way?
In the worst case, your algorithm is O((log n)^2).
Suppose you start at 0 (with interval = 1), and the value you seek actually resides at location 2^n - 1.
First you will check 1, 2, 4, 8, ..., 2^(n-1), 2^n. Whoops, that overshoots, so go back to 2^(n-1).
Next you check 2^(n-1)+1, 2^(n-1)+2, ..., 2^(n-1)+2^(n-2), 2^(n-1)+2^(n-1). That last term is 2^n, so whoops, that overshot again. Go back to 2^(n-1) + 2^(n-2).
And so on, until you finally reach 2^(n-1) + 2^(n-2) + ... + 1 == 2^n - 1.
The first overshoot took log n steps. The next took (log n)-1 steps. The next took (log n) - 2 steps. And so on.
So, worst case, you took 1 + 2 + 3 + ... + log n == O((log n)^2) steps.
A better idea, I think, is to switch to traditional binary search once you overshoot the first time. That will preserve the O(log n) worst case performance of the algorithm, while tending to be a little faster when the target really is nearby.
I do not know a name for this algorithm, but I do like it. (By a bizarre coincidence, I could have used it yesterday. Really.)
What you are doing is (IMHO) a version of Interpolation search
In a interpolation search you assume numbers are equally distributed, and then you try to guess the location of a number from first and last number and length of the array.
In your case, you are modifying the interpolation-algo such that you assume the Key is very close to the last number you searched.
Also note that your algo is similar to algo where TCP tries to find the optimal packet size. (dont remember the name :( )
- Start slow
- Double the interval
- if Packet fails restart from the last succeeded packet./ Restart from default packet size.. 3.
Your routine is typical of interpolation routines. You don't lose much if you call it with random numbers (~ standard binary search), but if you call it with slowly increasing numbers, it won't take long to find the correct index.
This is therefore a sensible default behavior for searching an ordered table for interpolation purposes.
This method is discussed with great length in Numerical Recipes 3rd edition, section 3.1.
This is talking off the top of my head, so I've nothing to back it up but gut feeling!
At step 7, if we've passed x, it may be faster to halve interval
, and head back towards x
- effectively, interval = -(interval / 2)
, rather than resetting interval
to 1.
I'll have to sketch out a few numbers on paper, though...
Edit: Apologies - I'm talking nonsense above: ignore me! (And I'll go away and have a proper think about it this time...)
精彩评论