This is a followup to debugging a mysql insert fail in php as I now have time to continue on the project.
I have three tables in this problem:
840,721 posters in poster_data 58,506 poster categories in poster_categories 17,629,007 (17 million+) poster/category combinations in poster_prodcatBased on the Efficient Pagination presentation by Yahoo!, I'm trying to add a category rank number to poster_prodcat so we can page by rank as opposed to using limits and offsets. Despite cranking up my php mysql connect timeout to 3600 (crazy 开发者_如何学JAVAI know) and turning of the php timeout, the ordering never seems to complete. maybe several tens to maybe a hundred thousand or so, but never the full 17,000,000 set.
Here's the script:
$sql1="select distinct apcatnum from poster_prodcat";
$result1 = mysql_query($sql1);
while ($cats = mysql_fetch_array ($result1)) {
$sql2 = "SELECT poster_data.apnumber,poster_data.aptitle
FROM poster_prodcat,poster_data
WHERE poster_prodcat.apcatnum ='$cats[apcatnum]'
AND poster_data.apnumber = poster_prodcat.apnumber
ORDER BY aptitle ASC";
$result2 = mysql_query($sql2);
$ordernum=1;
while ($order = mysql_fetch_array ($result2)) {
$sql3 = "UPDATE poster_prodcat SET catorder='$ordernum'
WHERE apnumber='$order[apnumber]' AND apcatnum='$cats[apcatnum]'";
$result3 = mysql_query($sql3);
$ordernum++;
}
}
This is on a 2 gig server that also hosts the site. The timeouts are long and the server's not crashing, so I don't see what's stopping it from completing. Can I do this on this server, or since this is a once a month or so operation, should I just create some massive memory EC2 instance, do the sorting there and downloaded the massaged tables?
Thanks.
Here's the structure of poster_data (with some fields removed that aren't involved in the selects):
CREATE TABLE `poster_data` (
`apnumber` mediumint( 8 ) NOT NULL DEFAULT '0',
`aptitle` varchar( 255 ) NOT NULL DEFAULT '',
`aptype` varchar( 100 ) NOT NULL DEFAULT '',
[snip]
UNIQUE KEY `posterid` ( `apnumber` ) ,
KEY `aptitle` ( `aptitle` ) ,
KEY `aptype` ( `aptype` ) ,
KEY `title_type` ( `aptitle` , `aptype` ) ) ENGINE = InnoDB;
poster_prodcat:
CREATE TABLE `poster_prodcat` (
`apcatnum` mediumint( 8 ) NOT NULL DEFAULT '0',
`apnumber` mediumint( 8 ) NOT NULL DEFAULT '0',
`catorder` mediumint( 7 ) NOT NULL DEFAULT '0',
PRIMARY KEY ( `apcatnum` , `apnumber` ) ,
KEY `apcatnum` ( `apcatnum` ) ,
KEY `apnumber` ( `apnumber` ) ,
KEY `catorder` ( `catorder` )
) ENGINE = InnoDB /*!50100 PARTITION BY HASH (apcatnum) PARTITIONS 10 */;
while ($order = mysql_fetch_array ($result2)) {
$sql3 = "UPDATE poster_prodcat SET catorder='$ordernum'
WHERE apnumber='$order[apnumber]' AND apcatnum='$cats[apcatnum]'";
$result3 = mysql_query($sql3);
$ordernum++;
}
}
Does this mean you're executing 17 million separate transactions? If you can sustain a thousand transactions per second, this part alone will take about 5 hours, right?
Wikipedia (readily available, but not what I'd consider authoritative) says this about innodb transactions.
When operating in fully ACID-compliant modes, InnoDB must do a flush to disk at least once per transaction, though it will combine flushes for inserts from multiple connections. For typical rotating hard drives or arrays, this will impose a limit of about 200 update transactions per second.
精彩评论