I have written a feed aggregator before but am trying to optimize it a bit. In the past, using SimplePie (a PHP class) to parse the feeds, I have used the get_id() function on each feed item to get a hash (an MD5 of link + title). I store this "id" as the "remote_id" in MySQL. However, to ensure that I have no duplicates, I've been running a SELECT query for each feed item to check that the "remote_id" does not already exist. This seems inefficient considering I am looking at thousands of feeds.
Is it more efficient to just make remote_id a unique key and let the database reject the duplicate record on each pass? Is there a better way to engineer this?
Yes, if a column should be unique in MySQL, it's generally a good idea to define it as a unique key.
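For instance, you could add the constraint like this (table and index names here are illustrative, not from your schema):

```sql
-- Assuming your items live in a table called `items` with a `remote_id` column:
ALTER TABLE items ADD UNIQUE KEY uniq_remote_id (remote_id);
```

Once the index exists, any INSERT with an already-stored remote_id fails with a duplicate-key error instead of silently creating a second row.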
When inserting possible duplicates, you can use PDO with try {} catch () {} blocks to filter them out: duplicate inserts will throw an exception, so you don't have to check beforehand.
I use something like this in a similar situation (pseudocode alert):
$stmnt = $this->dbh->prepare('INSERT QUERY');
try {
    // Make PDO throw exceptions instead of returning error codes
    $this->dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $this->dbh->beginTransaction();
    $stmnt->execute($cells);
    $this->dbh->commit();
} catch (Exception $e) {
    // A duplicate key (or any other insert error) lands here:
    // undo the transaction, record the offending row, move on
    $this->dbh->rollback();
    $this->invalidRows[] = array($cells, $e->getMessage());
    continue;
}
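If you'd rather not handle exceptions at all, MySQL can also skip duplicates on the server side with INSERT IGNORE (table and column names below are illustrative):

```sql
-- Rows that violate the unique key are silently skipped rather than erroring:
INSERT IGNORE INTO items (remote_id, link, title)
VALUES (:remote_id, :link, :title);
```

The trade-off is that IGNORE swallows other warnings too, so the try/catch approach above gives you better visibility into why a row was rejected.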