I have a question about mongoimport. Here is my configuration:
5 physical machines. 5 shards, 3 config servers, 5 mongos. 1 shard and 1 mongos per machine; 3 of the 5 machines also host a config server.
I have a few hundred JSON-formatted text files that I'm importing with mongoimport. I issue one mongoimport against each mongos (so, five at a time) until every file has been imported. I am monitoring each import's records/second and each machine's CPU/memory usage. There is no significant difference between the machines' CPU/memory usage.
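For reference, the distribution scheme above can be sketched roughly like this (the hostnames, database/collection names, and file names are placeholders, not my actual setup):

```shell
# Round-robin a list of JSON files across the five mongos routers.
# All names below (mongos1..mongos5, mydb, docs, *.json) are placeholders.
hosts=(mongos1 mongos2 mongos3 mongos4 mongos5)
files=(a.json b.json c.json d.json e.json f.json)
cmds=()
for i in "${!files[@]}"; do
  # Pick the mongos for this file by index modulo the number of hosts.
  host=${hosts[$((i % ${#hosts[@]}))]}
  cmds+=("mongoimport --host $host --db mydb --collection docs --file ${files[$i]}")
done
# Print the commands; in practice each would be run (backgrounded) on its mongos.
printf '%s\n' "${cmds[@]}"
```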
However, the records/second rate varies from 4k to 16k per mongoimport process. This doesn't seem to be related to allocating new datafiles on a given shard. Instead, it seems tied to the imported file itself: the files are very similar in schema and differ only in record count, and the variation occurs across files of all sizes. Whatever rate a file starts importing at, it tends to hold for the whole import — if a file starts at 10k rec/sec it stays near 10k, and if it starts at 4k it stays near 4k.
Any thoughts as to why this occurs? How can I fix this?
First question: can you run a mongostat and an iostat during the import process on each of the machines?
When you're doing an import you're probably taxing the IO, so we want to see some IO numbers.
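A minimal way to capture both at once (a sketch only — the hostname/port and the 5-second sample interval are assumptions; adjust for your setup):

```shell
# Log mongostat and iostat in the background for the duration of an import,
# then stop both. localhost:27017 and the interval of 5s are assumptions.
mongostat --host localhost:27017 5 > mongostat.log &
mstat_pid=$!
iostat -x 5 > iostat.log &   # -x: extended per-device stats (sysstat package)
iostat_pid=$!

# ... run the mongoimport here ...

kill "$mstat_pid" "$iostat_pid"
```

Comparing the iostat `%util` and await columns across machines while the slow vs. fast imports run should show whether disk IO is the bottleneck.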
Second question: do you own these machines, or are they "rented" (VMs, cloud boxes)?
If you're running VMs you may not be getting consistent IO. Answering the first question will let you know if this is related.