I'm getting duplicate _ids when inserting documents into our MongoDB database. This is an intermittent problem that only happens under some load (it is reproducible with some test scripts).
Here's some test code so you don't think I'm trying to double-insert the same object (I know that the PHP mongo driver adds the _id field):
// Insert a job
$job = array(
'type' => 'cleanup',
'meta' => 'cleaning the data',
'user_id' => new MongoId($user_id),
'created' => time(),
'status' => 'pending'
);
$this->db->job->insert($job, array('safe' => true)); // <-- failz here
I went on a frenzy and installed the latest stable (1.1.4) mongo driver to no avail. This isn't under heavy load. We're doing maybe 5 req/s on one server, so the 16M rec/s limit for the inc value probably isn't the issue.
Any ideas would be greatly appreciated. I'm hoping someone somewhere has used mongo with PHP and inserted more than 5 docs/s and had this issue ;).
-EDIT-
On CentOS 5.4 x86_64, Linux 2.6.18-164.el5xen, Apache worker 2.2.15, PHP 5.2.13, MongoDB 1.8.1
-EDIT2-
As noted in the comments, I'm using the latest version of the PECL driver as of now (1.2.0) and the problem is still happening.
-EDIT3-
Forgot to post the exact error:
Uncaught exception 'MongoCursorException' with message 'E11000 duplicate key error index: hannibal.job.$_id_ dup key
There is a different solution for this (switching between the prefork and worker MPM didn't help in my case; we were running prefork, which is the default anyway).
The issue is that the insert array is passed by reference and modified by the PHP MongoDB library to include the ID. If you reuse the same array for another insert, you need to clear the ID first.
So imagine the following code:
$aToInsert = array('field'=>$val1);
$collection->insert($aToInsert); // This will have '_id' added
$aToInsert['field'] = $val2;
$collection->insert($aToInsert); // This will fail with the above error
Why? What happens with the library is:
$aToInsert = array('field'=>$val1);
$collection->insert($aToInsert);
// $aToInsert has '_id' added by PHP MongoDB library
// Therefore $aToInsert = array('field'=>$val1, '_id'=>MongoID() );
$aToInsert['field'] = $val2;
// Therefore $aToInsert = array('field'=>$val2, '_id'=>MongoID() );
$collection->insert($aToInsert);
// This will not add '_id' as it already exists. But will now fail.
The solution is to reinitialise the array:
$aToInsert = array('field'=>$val1);
$collection->insert($aToInsert);
$aToInsert = array('field'=>$val2);
$collection->insert($aToInsert);
or to unset the id
$aToInsert = array('field'=>$val1);
$collection->insert($aToInsert);
unset($aToInsert['_id']);
$aToInsert['field'] = $val2;
$collection->insert($aToInsert); // This will now work
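If the same array gets reused in many places, the unset can be wrapped in a small helper. This is only a sketch: `insertFresh` and `FakeCollection` are hypothetical names and not part of the driver. `FakeCollection` just imitates the by-reference '_id' behaviour described above, so the example runs without a MongoDB server.

```php
<?php
// Hypothetical stand-in for a collection. It imitates the behaviour described
// above: insert() receives the array by reference and adds '_id' if missing.
class FakeCollection
{
    public $docs = array();

    public function insert(array &$doc)
    {
        if (!isset($doc['_id'])) {
            $doc['_id'] = uniqid('', true); // placeholder, not a real MongoId
        }
        if (isset($this->docs[$doc['_id']])) {
            throw new Exception('E11000 duplicate key error');
        }
        $this->docs[$doc['_id']] = $doc;
    }
}

// Hypothetical helper: $doc is passed by value, so the caller's array is
// never modified and can be reused safely.
function insertFresh($collection, array $doc)
{
    unset($doc['_id']);        // drop any '_id' left over from an earlier insert
    $collection->insert($doc);
    return $doc['_id'];        // the id generated for this insert
}

$collection = new FakeCollection();
$aToInsert = array('field' => 'first');
$id1 = insertFresh($collection, $aToInsert);
$aToInsert['field'] = 'second';
$id2 = insertFresh($collection, $aToInsert);
```

Because the helper works on its own copy, `$aToInsert` never picks up an '_id', and both inserts get distinct ids.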
Looks like it had to do with the Apache MPM installed (worker). After installing Apache prefork, we've seen no more duplicate _id errors on the server.
My guess is this has something to do with the global counter the Mongo driver uses. I'm thinking the lack of communication between the threads may be the cause...maybe one pool has instance counters per-thread, but since the PID is the same, you get conflicts.
I don't know the internals, but this seems to be the most likely explanation. Don't use Apache Worker MPM with the PHP MongoDB driver. Please comment and correct me if this is not the case, or if you know of a fix.
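To make the guess above concrete: an ObjectId is 12 bytes, made up of a 4-byte timestamp, a 3-byte machine id, a 2-byte process id and a 3-byte counter. The sketch below is an illustration of the hypothesis only, not the driver's actual code. `sketchObjectId` is a made-up function, and the per-thread counters starting at the same value are an assumption.

```php
<?php
// Builds a 24-character hex string using the ObjectId layout:
// 4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter.
function sketchObjectId($timestamp, $machine, $pid, $counter)
{
    return sprintf('%08x%06x%04x%06x',
        $timestamp,
        $machine & 0xFFFFFF,
        $pid & 0xFFFF,
        $counter & 0xFFFFFF);
}

$now     = 1300000000; // both inserts fall within the same second
$machine = 0xABCDEF;   // arbitrary illustrative value
$pid     = 4242;       // threads under the worker MPM share one process id

// Assumption: each thread keeps its own counter, starting from the same value.
$counterThreadA = 7;
$counterThreadB = 7;

$idA = sketchObjectId($now, $machine, $pid, $counterThreadA);
$idB = sketchObjectId($now, $machine, $pid, $counterThreadB);

var_dump($idA === $idB); // identical ids: the second insert would hit E11000
```

If the counter were shared and incremented across threads, the last three bytes would differ and the two ids would no longer collide.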