I've been playing around with the samus MongoDB driver, particularly its benchmark tests. From the output, it appears that document size can have a drastic effect on how long operations on a collection take.
Is there any documentation that recommends what balance to strive for, or some more "real" numbers on how document size affects query times? Is this poor performance mostly a result of the driver and its serialization overhead? Has anyone else noticed this?
But is it a good benchmark? I don't think so. Read Mongodb performance on Windows.
I think the exception thrown when the index should have been created is still being swallowed: the FindOne() medium test reports 363 both with and without the "creation" of the index.
I cannot find a link right now, but the database format is such that it should not matter whether a document is large or small. For access via an index there is certainly no difference, and for a table scan, uninteresting documents (or uninteresting parts of documents) can be skipped quickly thanks to the BSON format. If anything, the overhead of the BSON format affects tiny documents more than large ones.
So I would assume that the performance drop you see is largely due to the serialization cost of loading those documents. (Of course it takes more time to write a large document to disk than a small one, but it should take about the same time as writing several small documents of the same aggregate size.)
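To illustrate why skipping is cheap: every BSON document starts with a little-endian int32 giving its total size in bytes, so a reader can jump over a document without decoding any of its fields. The sketch below is only an illustration of that layout from the BSON spec, assuming Node.js and its built-in Buffer; it is not how the MongoDB server actually implements a scan, and countBsonDocuments is a hypothetical helper.

```javascript
// Count the documents in a buffer of concatenated BSON documents by reading
// only each document's length prefix: a little-endian int32 that gives the
// document's total size in bytes, including the prefix itself.
function countBsonDocuments(buf) {
  let offset = 0;
  let count = 0;
  while (offset < buf.length) {
    if (offset + 4 > buf.length) {
      throw new Error("truncated length prefix at offset " + offset);
    }
    const docLength = buf.readInt32LE(offset);
    // 5 bytes is the smallest valid document: length prefix + terminator.
    if (docLength < 5 || offset + docLength > buf.length) {
      throw new Error("invalid document length at offset " + offset);
    }
    offset += docLength; // skip the whole document without decoding it
    count += 1;
  }
  return count;
}

// {a: 1}: length(4) + type 0x10 int32(1) + "a\0"(2) + value(4) + 0x00(1) = 12
const intDoc = Buffer.from([
  0x0c, 0x00, 0x00, 0x00, 0x10, 0x61, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00,
]);
// {s: "hello"}: length(4) + type 0x02 string(1) + "s\0"(2)
//   + string length 6 (4) + "hello\0"(6) + 0x00(1) = 18
const stringDoc = Buffer.from([
  0x12, 0x00, 0x00, 0x00, 0x02, 0x73, 0x00, 0x06, 0x00, 0x00, 0x00,
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x00, 0x00,
]);

console.log(countBsonDocuments(Buffer.concat([intDoc, stringDoc, intDoc])));
```

The cost of each skip is one 4-byte read regardless of how large the skipped document is, which is the property the argument relies on.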
In your benchmark, can you normalize the numbers to be based on the same amount of data (in bytes, not in document count)?
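A minimal sketch of that normalization, assuming each benchmark run can report its document count, average document size in bytes and total elapsed milliseconds. The function name, field names and input numbers are hypothetical placeholders for illustration, not output from the samus benchmarks.

```javascript
// Normalize benchmark timings to data volume instead of document count.
// Each result gives the number of documents processed, their average
// BSON size in bytes, and the total elapsed time in milliseconds.
function normalizeByBytes(results) {
  const MiB = 1024 * 1024;
  return results.map((r) => {
    const totalBytes = r.docCount * r.avgDocBytes;
    return {
      name: r.name,
      totalBytes,
      // Milliseconds per mebibyte processed, comparable across sizes.
      msPerMiB: r.totalMs / (totalBytes / MiB),
    };
  });
}

// Placeholder inputs: both runs move 1 MiB in total, so their msPerMiB
// values can be compared directly.
const normalized = normalizeByBytes([
  { name: "small", docCount: 1024, avgDocBytes: 1024, totalMs: 80 },
  { name: "large", docCount: 16, avgDocBytes: 65536, totalMs: 40 },
]);
console.log(normalized);
```

If the per-MiB figures come out roughly equal, time simply scales with the amount of data; if small documents cost noticeably more per MiB, per-document overhead dominates.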
You can turn on profiling with db.setProfilingLevel(2) and query db.system.profile for details on the executed queries.
Although this may distort the test results a little, it will give you insight into the query times on the server, eliminating any influence the driver or network may have on the results. If the server-side query times show the same pattern as your test, then document size does influence query times. If they are roughly the same regardless of document size, then you are looking at serialization overhead.
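A sketch of that comparison, assuming the profiler entries have been exported as plain objects (in the mongo shell, for example, via db.system.profile.find().toArray()). The field names ns, op, millis and responseLength assume a recent MongoDB profiler format and may differ in older versions; the namespaces, timings and the helpers summarizeProfile and splitOverhead are hypothetical.

```javascript
// Average server-side time per namespace from profiler entries.
// Field names (ns, op, millis, responseLength) assume a recent MongoDB
// profiler format and may differ in older versions.
function summarizeProfile(entries) {
  const byNs = new Map();
  for (const e of entries) {
    if (e.op !== "query") continue; // keep reads only
    const s = byNs.get(e.ns) || { count: 0, totalMillis: 0, totalBytes: 0 };
    s.count += 1;
    s.totalMillis += e.millis;
    s.totalBytes += e.responseLength || 0;
    byNs.set(e.ns, s);
  }
  const summary = {};
  for (const [ns, s] of byNs) {
    summary[ns] = {
      queries: s.count,
      avgMillis: s.totalMillis / s.count,
      avgResponseBytes: s.totalBytes / s.count,
    };
  }
  return summary;
}

// Whatever the client measured beyond the server time is attributed to
// the driver, serialization and the network.
function splitOverhead(clientMs, serverMs) {
  const overheadMs = clientMs - serverMs;
  return {
    serverMs,
    overheadMs,
    overheadShare: clientMs > 0 ? overheadMs / clientMs : 0,
  };
}

// Hypothetical profiler entries for two benchmark collections.
const profileEntries = [
  { op: "query", ns: "bench.small", millis: 1, responseLength: 200 },
  { op: "query", ns: "bench.small", millis: 3, responseLength: 400 },
  { op: "query", ns: "bench.large", millis: 2, responseLength: 60000 },
  { op: "insert", ns: "bench.large", millis: 9 },
];

const serverSummary = summarizeProfile(profileEntries);
// Hypothetical client-side average of 10 ms for the large-document query.
const largeSplit = splitOverhead(10, serverSummary["bench.large"].avgMillis);
console.log(serverSummary, largeSplit);
```

If avgMillis stays flat across document sizes while the client-side timings grow, the growth is happening outside the server, i.e. in the driver or serialization.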