We have a mongo database with around 1M documents, and we want to poll this database using a processed field to find documents which we havent seen before. To do this we are setting a new field called _processed
.
To query for documents which need to be processed, we query for documents which do not have this processed field:
db.stocktwits.find({ "_processed" : { "$exists" : false } })
However, this query takes around 30 sec开发者_运维百科onds to complete each time, which is rather slow. There is an index (asc) which sits on the _processed field:
db.stocktwits.ensureIndex({ "_processed" : -1 },{ "name" : "idx_processed" });
Adding this index does not change query performance. There are a few other indexes sitting on the collection (namely the ID idx & a unique index of a couple of fields in each document).
The _processed field is a long, perhaps this should be changed to a bool to make things quicker?
We have tried using a $where query (i.e. $where : this._processed==null
) to do the same thing as $exists : false
and the performance is about the same (few secs slower which makes sense)...
Any ideas on what would be casusing the slow performance (or is it normal)? Does anyone have any suggestions on how to improve the query speed?
Cheers!
Upgrading to 2.0 is going to do this for you:
From MongoDB.org:
Before v2.0, $exists is not able to use an index. Indexes on other fields are still used.
Its slow because checking for _processed -> not exists
doesnt offer much selectivity. Its like having an index on "Gender" - and since there are only two possible options male
or female
then if you have 1M rows and an index on Gender
it will have to scan 50% or 500K rows to find all males.
You need to make your index more selective.
精彩评论