What Mongo Index algorithm is using? Binary Tree?_问答_开发者

What Mongo Index algorithm is using? Binary Tree?

开发者 https://www.devze.com 2023-02-22 01:43 出处：网络

I would like to know what kind of internal indexing algorithm MongoDB is using. Because I have some data want to store, and each document (row) has a id, which is probably a unique hash value. (e.g. generated by md5() or other hash algorithm). so, I would to understand which hash me开发者_运维技巧thod I should use to create the id, so that it is fast for the MongoDB to index it. :)

Yes, mongoDB use b-tree, documentation:

An index is a data structure that collects information about the values of the specified fields in the documents of a collection. This data structure is used by Mongo's query optimizer to quickly sort through and order the documents in a collection. Formally speaking, these indexes are implemented as "B-Tree" indexes.

I suggest to use mongodb ObjectId for collection _id, and don't care about: "How to create _id?" at all. Because it probably task for mongodb, but not for developer. I suppose that better to care about schema, indexes, etc..

For Mongo 3.2+, the default storage engine is WiredTiger, and B+ tree is used to store data.

WiredTiger maintains a table's data in memory using a data structure called a B-Tree ( B+ Tree to be specific), referring to the nodes of a B-Tree as pages. Internal pages carry only keys. The leaf pages store both keys and values.

And also LSM Tree is used to store data

WiredTiger supports Log-Structured Merge Trees, where updates are buffered in small files that fit in cache for fast random updates, then automatically merged into larger files in the background so that read latency approaches that of traditional Btree files. LSM trees automatically create Bloom filters to avoid unnecessary reads from files that cannot containing matching keys.

The difference between them

If you don't require extreme write throughput btree is likely to be a better choice. Read throughput is better and high volumes of writes can be maintained.
If you have a workload that requires a high write throughput LSM is the best choice.