开发者

Confused on claim in CLRS randomly built binary search tree proof

开发者 https://www.devze.com 2023-02-27 11:28 出处:网络
Not sure if I should put this on开发者_运维知识库 math stackexchange instead, but oh well. On page 300 of CLRS...

Not sure if I should put this on开发者_运维知识库 math stackexchange instead, but oh well.

On page 300 of CLRS...

Theorem 12.4
The expected height of a randomly built binary search tree on n distinct keys is O(lgn).

They define 3 random variables...

'Xn' is the height of a randomly built binary search on n keys.
'Yn' is the "exponential height", where Yn = 2^(Xn)
'Rn' is the position that the root key would occupy if the key's were sorted, its rank.

And indicator random variables Zn,1 Zn,2 Zn,3 ... Zn,n...

'Zn,i = I{Rn = i}'

So they go on to make the proof (see the text), but in the midst of it they make the following claim...

We claim that the indicator random variable Zn,i = I{Rn = i} is independent of the
values of Yi-1 and Yn-i. Having chosen Rn = i, the left subtree (whose exponential
height is Yi-1) is randomly built on the i-1 keys whose ranks are less than i. This
subtree is just like any other randomly built binary search tree on i-1 keys.
Other than the number of keys it contains, this subtree's structure is not affected
at all by the choice of Rn = i, and hence the random variables Yi-1 and Zn,i are
independent.

Likewise for Yn-i. My issue is that part, Other than the number of keys it contains... Yes, the structures of the subtrees are unaffected by Rn, but the fact that Rn affects the number of keys in the subtrees seems to imply a dependence due to how it limits the height of the subtrees.

I'm obviously missing some key relationship. Any help is appreciated, thanks.


For the expected time proof, you can think of every insertion to be an independent event. The inserter is not adversarial (i.e. doesn't try to break your binary tree). Now, if this is truly random, then you can consider every other (every odd or every even) value to be inserted as being inserted as a bad node. Bad nodes are ones that cause the tree to increase in height. Bad nodes have good nodes between them.

If you already have a tree of height 'h' (consider it has O(2^h) nodes), then it will have O(2^(h-1)) nodes as leaves (approximately half the total nodes are leaves). There is an equal probability of a new value going anywhere (i.e. as a child of any of those nodes). It is expected that half of them will become left child of a leaf (increasing the height of the leaf node by 1) and the other half will become the right child of that leaf. This gives an expected O(log n) height to the tree. Hence, every operation on such a tree costs O(log n).

0

精彩评论

暂无评论...
验证码 换一张
取 消