I was looking to change the database type in Berkeley DB Java Edition 4.1.7 from BTree to Hash. The core version has DatabaseType.HASH, DatabaseType.RECNO and DatabaseType.QUEUE. Are these not supported in the Java Edition? If so, is there a reason for dropping them?
David Segleau, Director of Product Management for Berkeley DB here. Generally, we recommend that people ask questions on the Berkeley DB forums. You'll find a large community of active Berkeley DB application developers there.
Yes, Berkeley DB (the original product in C) has B-Tree, Hash, Queue and Recno access methods. Berkeley DB Java Edition only supports B-Tree. The main reason for this is that about 99% of our users use B-Tree for storage and Hash is only used by a small subset of applications.
Some useful technical tidbits around this topic:
- Hash is particularly useful for people who have a huge data set and a very small amount of available memory cache. In this particular scenario, a B-Tree might require multiple I/Os in order to fetch the internal index pages (that don't fit in cache) and then fetch the record. Hash can typically access the data record with a single I/O.
- Hash is usually not helpful if you want to access your data sequentially or allow duplicates, since there is no implied ordering in a Hash index.
- Most applications have sufficient available memory cache to hold the internal nodes of a B-tree as well as the most frequently accessed data records. In this much more common scenario, B-tree and Hash will have almost identical performance.
- Over the last year the Berkeley DB Java Edition team has been working very closely with customers and application developers using very large data sets (from 250 GB to the low-TB range). In particular, they have been focusing on how to maximize cache efficiency, improve the cache eviction algorithms and minimize the impact of Java garbage collection. We've found that BDB JE 4.1 performs much better, in terms of cache management and efficiency, especially for data sets that exceed the available cache. For more information on this change, see the BDB JE 4.1.7 changelog on the Berkeley DB download page.
- For more information on Hash vs B-Tree access methods in Berkeley DB, see chapter 2 of the BDB Reference Manual (Selecting an Access Method).
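To make the I/O argument in the points above concrete, here is a rough back-of-the-envelope sketch in plain Java. It is not Berkeley DB code and does not use any Berkeley DB API. The class name, the fanout of 100 entries per node, the one-billion-record data set, and the "one read per uncached level" cost model are all assumptions chosen for illustration; real page sizes, fill factors, and hash overflow pages will change the numbers.

```java
public class AccessMethodIoSketch {

    // Levels in a B-tree that indexes `records` keys when every node holds
    // up to `fanout` entries (the leaf level is included in the count).
    static int btreeLevels(long records, int fanout) {
        int levels = 1;
        long capacity = fanout;
        while (capacity < records) {
            capacity *= fanout;
            levels++;
        }
        return levels;
    }

    // Disk reads for one lookup, assuming the top `cachedLevels` levels are
    // already in cache and every remaining level costs exactly one read.
    static int btreeReads(long records, int fanout, int cachedLevels) {
        return Math.max(0, btreeLevels(records, fanout) - cachedLevels);
    }

    // Idealized hash lookup: a single read for the bucket page holding the
    // record, assuming no overflow pages and cached hash metadata.
    static int hashReads() {
        return 1;
    }

    public static void main(String[] args) {
        long records = 1_000_000_000L; // hypothetical data set size
        int fanout = 100;              // hypothetical entries per node
        int levels = btreeLevels(records, fanout);

        System.out.println("B-tree levels: " + levels);
        System.out.println("B-tree reads, nothing cached: "
                + btreeReads(records, fanout, 0));
        System.out.println("B-tree reads, internal levels cached: "
                + btreeReads(records, fanout, levels - 1));
        System.out.println("Hash reads: " + hashReads());
    }
}
```

Under these assumptions, the B-tree only costs more than Hash while its internal levels do not fit in cache. Once they do, both come down to a single read for the record itself, which is the "almost identical performance" case described above.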
I was also trying to understand the same thing. I too would appreciate being able to use Hash in Berkeley DB JE, as I'm working in the first scenario described above, that is, with a particular ratio between memory size and data set size.
Are there any options for this? Are you planning to put this back in the future? The Berkeley DB JE site on oracle.com says that access time is constant regardless of the data set size. If you use B-trees, this claim is wrong.