I have a replica set whose primary I am trying to replace with a machine that has more memory and more disk space. I set up RAID across a couple of disks on the new machine, rsync'd the data from a secondary, and added it to the replica set. After checking rs.status(), I noticed that all the secondaries are about 12 hours behind the primary. So when I try to force the new server into the primary spot, it won't work, because it is not up to date.
This seems like a big issue: if the primary fails, the secondaries are at least 12 hours behind, and some are almost 48 hours behind.
The oplogs all overlap and the oplog size is fairly large. The only explanation I can come up with is that I am performing a lot of writes and reads on the primary, which could keep the server locked and prevent the secondaries from catching up.
Is there a way to force a secondary to catch up to the primary?
There are currently 5 servers; the last 2 are meant to replace 2 of the other nodes. The node with _id 6 is the one that will replace the primary. The node furthest from the primary's optime is a little over 48 hours behind.
{
"set" : "gryffindor",
"date" : ISODate("2011-05-12T19:34:57Z"),
"myState" : 2,
"members" : [
{
"_id" : 1,
"name" : "10******:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 20231,
"optime" : {
"t" : 1305057514000,
"i" : 31
},
"optimeDate" : ISODate("2011-05-10T19:58:34Z"),
"lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
},
{
"_id" : 2,
"name" : "10******:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 20231,
"optime" : {
"t" : 1305056009000,
"i" : 400
},
"optimeDate" : ISODate("2011-05-10T19:33:29Z"),
"lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
},
{
"_id" : 3,
"name" : "10******:27018",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 20229,
"optime" : {
"t" : 1305228858000,
"i" : 422
},
"optimeDate" : ISODate("2011-05-12T19:34:18Z"),
"lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
},
{
"_id" : 5,
"name" : "10*******:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 20231,
"optime" : {
"t" : 1305058009000,
"i" : 226
},
"optimeDate" : ISODate("2011-05-10T20:06:49Z"),
"lastHeartbeat" : ISODate("2011-05-12T19:34:56Z")
},
{
"_id" : 6,
"name" : "10*******:27018",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"optime" : {
"t" : 1305050495000,
"i" : 384
},
"optimeDate" : ISODate("2011-05-10T18:01:35Z"),
"self" : true
}
],
"ok" : 1
}
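To make the lag in the output above easier to read, here is a minimal sketch that subtracts each secondary's optimeDate from the primary's. The function name replicationLagHours and the trimmed-down status object are illustrative assumptions, not part of MongoDB; the dates are copied from the output above as plain strings so the snippet also runs outside the mongo shell.

```javascript
// Hypothetical helper: hours each SECONDARY trails the PRIMARY,
// given an object shaped like the rs.status() output.
function replicationLagHours(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    throw new Error("no PRIMARY found in status");
  }
  const primaryMs = new Date(primary.optimeDate).getTime();
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      _id: m._id,
      // 3600000 ms per hour
      lagHours: (primaryMs - new Date(m.optimeDate).getTime()) / 3600000,
    }));
}

// Only the fields the helper needs, copied from the rs.status() output.
const status = {
  members: [
    { _id: 1, stateStr: "SECONDARY", optimeDate: "2011-05-10T19:58:34Z" },
    { _id: 2, stateStr: "SECONDARY", optimeDate: "2011-05-10T19:33:29Z" },
    { _id: 3, stateStr: "PRIMARY", optimeDate: "2011-05-12T19:34:18Z" },
    { _id: 5, stateStr: "SECONDARY", optimeDate: "2011-05-10T20:06:49Z" },
    { _id: 6, stateStr: "SECONDARY", optimeDate: "2011-05-10T18:01:35Z" },
  ],
};

const lags = replicationLagHours(status);
lags.forEach((l) => console.log(l._id, l.lagHours.toFixed(1) + " h behind"));
```

In the mongo shell the same function should work on the result of rs.status() directly, since optimeDate is already a date there.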
I'm not sure why syncing has failed in your case, but one way to brute-force a resync is to remove the data files on the replica and restart mongod. It will then initiate a full resync. See http://www.mongodb.org/display/DOCS/Halted+Replication for details. This is likely to take quite some time, depending on the size of your database.
After looking through everything I found a single error, which led me back to a mapreduce that had been run on the primary and had hit this issue: https://jira.mongodb.org/browse/SERVER-2861 . When replication was attempted, it failed to sync because of a faulty/corrupt operation in the oplog.