The order of b-trees_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2023-03-05 07:04 出处：网络

I\'m studying for an exam, and came up on B-trees. Wikipedia describes a B-tree as a tree where the nodes have at least d and at most 2d keys and therefore at most 2d+1 leafs. For example if d=1, it w

I'm studying for an exam, and came up on B-trees. Wikipedia describes a B-tree as a tree where the nodes have at least d and at most 2d keys and therefore at most 2d+1 leafs. For example if d=1, it would have a maximum of 2 keys, and 3 children, making it a 2-3 tree. However this wouldn't allow for example a 2-3-4 tree unless I'开发者_开发问答m mistaken.

However our material describes a b-tree as a tree where each node has at least t>=2 t-1 keys and at most 2t-1 keys. This would mean that the nodes have an odd number of keys and an even number of children. For example t=2 would have from 1 to 3 keys, and up to 4 children, making it a 2-3-4 tree. On the other hand there couldn't be a 2-3 tree with this notation.

On top of this, there is a notation by Knuth where the d would mean the maximum number of children in a node. This notation would allow both even and odd number of children, allowing both 2-3 trees and 2-3-4 trees.

I know both 2-3 trees and 2-3-4 trees exist.

What is the real notation? Is there a real notation? As an extra question; what is the maximun number of keys in a tree of size h?

If you read the wiki article a little more closely you'll see that there are two main variants of B-trees (excluding structural variants like B* and B+), one with t -> 2t keys, and one with t -> 2t+1 keys.

Translating those variants to #children we have one with t+1 -> 2t+1 children, and one with t+1 -> 2t+2 children.

So essentially to answer your question, both 2-3 and 2-3-4 trees are valid trees, each according to a different variant/definition:

2-3 is of the first kind (t+1 -> 2t+1 children where t=1)

2-3-4 is of the second kind (t+1 -> 2t+2 children where t=1)

The validity of both variants stems from the fact that both splits and merges (actions done on delete and insert from/into the ADT) are valid:

t -> 2t:

Split. Happens when we add a new element and a node has more than the max number of keys 2t So we have 2t+1 keys, we split the node into two nodes, and push one element to the parent, leaving 2t keys in the two children, and t keys in each child. This is acceptable because the minimum number of keys in a node is indeed t.

Merge. Happens when we delete a key and a node contains less than the minimum number, t, and it's sibling is also at the minimum. So we have t-1 + t keys in our new merged node, the resulting node must be valid: t-1 + t = 2t-1 <= 2t. Great.

So too with t -> 2t+1:

Split. 2t+2 becomes t and t+1 which is OK.

Merge. t-1 + t = 2t-1 <= 2t+1

Of course there is no difference in running times, it's just a slight implementation difference of little theoretical importance (you can slightly optimise some algorithms with the first variant, but not so much that it will change run-time complexities).

search in google scholar for b tree comer => Ubiquitous B-Tree, Comer, 1979

This is the most cited paper which you find in data structure papers. This paper describes the b tree in detail (how it works, complexity and it variants...). There you should find your answer.

I hope this helps

p.s. cite that paper in the exam if you use a different formula than the taught one :P