Sharding GridFS on MongoDB_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2023-02-18 00:56 出处：网络

I\'m documentin开发者_如何学Pythong about the GridFS and the possibility to shard it among different machines.

I'm documentin开发者_如何学Pythong about the GridFS and the possibility to shard it among different machines.

Reading the documentation here, the suggested shard key is chunks.files_id. This key will be linked to the _id of the files collection, thus this _id is incremental. Every new file i save in the Grid will have a new incremental _id.

In the O'Reilly "Scaling MongoDB" book the use of an incremental shard key is discouraged to avoid HotSpots (the last shard will receive all the write and read).

what is your suggestion for sharding the GridFS collection?

have anybody experienced the HotSpot problem?

thank you.

You should shard on files_id to keep file chunks together, but you are correct that that will create a hotspot. If you can, use something other than ObjectId for _ids in the fs.files collection (probably MD5s would be better than ObjectIds).

We'll be adding hashing for sharding, which will solve this, but not until at least 2.0.

You can shard gridfs data because gridfs it just two collecttions: chunks and files. And gridfs sharding it's very useful and great thing. About gridfs shard key it's always bad choose random or incremental shard key, because data not evenly distribute across shards. In case of incremental shard key all writes going to the last shard and it growth and once difference between become 10 or more chunks, balancer move data to another shards. Moving data to another shard always difficult task that should be avoided as it possible.
So when you choose shard key you should care about even distribution of data.
Also if you get luck mb author of 'Scaling MongoDB' kristina(great specialist in shard keys) will answer to your question.
Documentation says that in common cases you should choose default index fileId:1,n:1 as shard key:

There are different ways that GridFS can be sharded, depending on the need. One common way to shard, based on pre-existing indexes, is:

"files" collection is not sharded. All file records will live in 1 shard. It is highly recommended to make that shard very resilient (at least 3 node replica set) "chunks" collection gets sharded using the existing index "files_id: 1, n: 1". Some files at the end of ranges may have their chunks split across shards, but most files will be fully contained within the same shard.

Currently MongoDB as of version 1.8.1 supports only sharding on "file_id" field, because of using md5 to verify the upload, but it doesn't work across shards yet. So you cannot split single file across shards. Answer on google group7