I understand how binary search trees are implemented, but I am not sure what are the advantages of using it over the hash tables that most p开发者_运维知识库rogramming languages have built into their standard libraries.
Could someone please provide examples of real-world problems solvable with binary search trees?
There are a few theoretical advantages of binary search trees over hash tables:
They store their elements in sorted order. This means that if you want to store the container in a way where you can easily visit the values in sorted order, a BST is probably a better choice than a hash table. For example, if you want to store a collection of students and then print out all the students in alphabetical order, a BST is a substantially better choice than a hash table.
They efficiently support range queries. Because BSTs are stored in sorted order, it's easy to answer questions of the form "what values are in the range [x, y]?" in a binary search tree. To do this, you do a lookup in the tree for the smallest element greater than x and the greatest element smaller than y, then iterate over the elements of the tree between them. Both of these queries run in O(lg n) time in a balanced tree, so the total runtime for this operation is O(lg n + k), where k is the number of elements matching the query.
They efficiently support nearest-neighbor queries. Hash tables are specifically designed so that even slightly different produce wildly different hash codes. This gives the hash values the dispersion they need to avoid clustering too many elements in one spot. However, it also means that you need to do a linear scan over the hash table to find elements that might be "close" to what you're looking for. With a BST, you can efficiently find the predecessor and successor of any value you'd like, even if it's not in the tree.
They can have better worst-case guarantees. Most hash table implementations have some sort of degenerate case in which an operation can degrade to O(n) in the worst-case. A linear probing hash table or a chained hash table can, with a bad set of elements, require O(n) time per lookup or require O(n) time on a rehash. Insertion into some types of balanced BSTs, like red/black trees, AVL trees, or AA trees, is always worst-case O(lg n).
If you're willing to generalize BSTs to more elaborate tree structures, then there are many applications in which a tree can be used to solve problems much more efficiently than in a hash table. Here's a few examples:
kd-trees allow you to store multidimensional data while supporting fast range queries in multidimensional space, as well as efficient nearest-neighbor lookups. You can use them for classification (lazy learning algorithms) or computational geometry.
Link/cut trees can be used to solve max-flow problems much more efficiently than most conventional algorithms would allow. Good push/relabel algorithms use this to speed up their implementations.
Disjoint-set forests can be used to maintain partitions of elements as asymptotically efficiently as possible (amortized α(n) per update, where α(n) is the Ackermann inverse function). They're used in many fast minimum-spanning tree algorithms, as well as some maximum-matching algorithms.
Binary heaps can be used to implement priority queues efficiently. More complex trees can be used to build binomial heaps and Fibonacci heaps, which are of great importance in theoretical computer science.
Decision trees can be used in machine learning for classification, and as a model in theoretical computer science to prove bounds on the runtimes of various algorithms.
Ternary search trees are an alternative to tries that are based on as slightly modified BST. They allow for very fast lookup and insertion of elements and for sparse data sets are quite concise.
B-trees are used by many database systems to efficiently look up elements where disk access is a limiting factor.
Binary space partitioning trees are a generalization of kd-trees that can be used to quickly render computer graphics (they were used to optimize rendering in the original game Doom) and do collision-detection.
BK-trees allow you to quickly determine all words that are within a certain edit distance of some other word, and more generally to find all points in a metric space within a certain distance of some other point.
Fusion trees are an alternative to hash tables for integer keys that have extremely fast support for lookups, insertions, and deletions.
van Emde Boas trees another alternative to hash tables for integer keys that support lookup, insertion, deletion, successor, and predecessor in O(lg lg n) time per element. Some database systems use vEB trees to optimize performance.
I'm not sure how on-topic this answer is, but it should give you a sense for how wonderful and powerful BSTs and more general tree structures can be.
One example of where a binary tree is required is binary space partitions in computer graphics
http://en.wikipedia.org/wiki/Binary_space_partitioning
A binary tree is needed because the algorithm requires the preservation of the relationships between the nodes in the binary tree. There are many other algorithms where the structure of the tree is important, and so a hash table isn't an appropriate structure.
Another good reason for using a binary tree instead of a hash table is when you can't easily generate a efficient hash for your data items, but you can generate a comparison function.
Often for simple storage and retrieval of data a hash table is more optimal, but more complex to implement.
One of the most overlooked is that many file systems use binary trees to manage directory listings. They rarely use a plain binary tree, but some variation such as a B-tree. This is because the question of on-disk storage of the tree is quite important to the details of implementation. The reason they use this sort of structure is for efficiency and speed. This lets them do things like support thousands of files in a directory. Comparisons for file creation and deletion times highlight efficiencies for this aspect of the filesystem.
Binary trees are also used on many games that render 3D objects. Again, the reason is speed. In fact, speed is so important that some game engines, such as the Quake engine actually have the binary tree pre-generated and pre-optimized as part of the map build process.
One thing to note is that Binary Search Tree are space efficient. For example, you have 10 integers to store and you have a hash function that maps from 0 - 99, then u need an array of 100 integers. If you used Binary Search Tree, then you would allocate only as much memory as required by 10 elements
This should probably be a comment but Self-balancing BST(s)(log(n)) are used widely rather than BSTs. Plain BSTs have worst-case O(N) insertion/removal time.
精彩评论