I'm working on a system that needs to keep a log of every view of a page in a MySQL table. The view will only be logged if the visitor hasn't been to that page in the last 24 hours. I'm wondering if doing this would be much of a problem in terms of performance and database size.
The site that needs to do this averages about 60,000 unique pageviews a day, so that's roughly 60,000 new rows added per day (about one every 1.5 seconds on average). The table has only 3 columns: i_id, ip_address, and timestamp. i_id is a foreign key to another table.
The table will be cleared out at the end of every day using a CRON script.
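In case it's useful, the 24-hour check would look roughly like this (a minimal sketch; the table name page_views and the literal values are placeholders for illustration):

```sql
-- Insert a view only if this visitor has no row for this page
-- within the last 24 hours:
INSERT INTO page_views (i_id, ip_address, `timestamp`)
SELECT 42, '203.0.113.7', NOW()
FROM DUAL
WHERE NOT EXISTS (
    SELECT 1
    FROM page_views
    WHERE i_id = 42
      AND ip_address = '203.0.113.7'
      AND `timestamp` > NOW() - INTERVAL 24 HOUR
);
```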
Will there be any immediate strain on the database from doing this? For example, if the site gets a spike in traffic (which happens quite regularly), it could shoot up to over 200,000 pageviews in a day, which means over 2 queries per second.
The general convention is not to put constraints (primary key, foreign keys, etc.) on an audit table, and certainly not indexes -- all of these slow insertion.
Bulk insertion would be worth considering -- batch the inserts to lower the number of connections to the database and the total time spent on the operations (one statement vs. numerous). Additionally, if transaction logs are written for these inserts, minimize that logging where you can: the extra IO needed to keep the database recoverable to a point in time can itself impact the database.
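As an illustration, a single multi-row INSERT (the table name and values here are assumptions, not from the question) does the work of several single-row statements in one round trip:

```sql
-- One connection, one statement, one set of log writes:
INSERT INTO page_views (i_id, ip_address, `timestamp`) VALUES
    (42, '203.0.113.7',  '2012-01-01 10:00:01'),
    (42, '198.51.100.2', '2012-01-01 10:00:02'),
    (17, '203.0.113.7',  '2012-01-01 10:00:03');
```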
I don't see the point of clearing out the records at the end of the day -- what about traffic that spans two days? MySQL partitioning would likely be a better idea.
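A rough sketch of what that could look like (column types, partition names, and dates are assumptions for illustration):

```sql
-- Partition the log by day; an old day can then be discarded with a
-- cheap metadata operation instead of a slow bulk DELETE, and traffic
-- that spans midnight stays queryable.
CREATE TABLE page_views (
    i_id        INT UNSIGNED NOT NULL,
    ip_address  VARCHAR(15)  NOT NULL,
    `timestamp` TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
)
PARTITION BY RANGE (UNIX_TIMESTAMP(`timestamp`)) (
    PARTITION p20120101 VALUES LESS THAN (UNIX_TIMESTAMP('2012-01-02 00:00:00')),
    PARTITION p20120102 VALUES LESS THAN (UNIX_TIMESTAMP('2012-01-03 00:00:00')),
    PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- Drop a whole day at once:
ALTER TABLE page_views DROP PARTITION p20120101;
```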
Your problem is not pageviews per day. You have to think about how many pageviews you will have to serve per second during peak hours. If the pageviews are spread evenly and you only have about 2 per second, that won't be a problem on an average non-shared server.
But it's impossible to say without more data, such as what hardware you are using, the real distribution of page loads, etc.
A few comments:
- Make sure this is an InnoDB table. MyISAM locks the entire table for every insert or update, while InnoDB uses row-level locking.
- Use the smallest numeric data types appropriate for each column (see the sketch after this list).
- Two queries per second? MySQL eats two queries per second BEFORE breakfast. Seriously, you should be able to withstand hundreds.
- If you're still worried, make sure you're using MySQL 5.1 or later, as it allows for much better concurrency in InnoDB tables.
- The 'foreign' key should, I hope, be enforced only via code and convention, not as a strict constraint, since a real FOREIGN KEY check will slow down inserts.
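Putting the InnoDB and data-type points together, a minimal sketch (the column types and the use of INET_ATON() are my assumptions, not requirements):

```sql
CREATE TABLE page_views (
    i_id        MEDIUMINT UNSIGNED NOT NULL,  -- enough for ~16M pages
    ip_address  INT UNSIGNED       NOT NULL,  -- IPv4 packed into 4 bytes
    `timestamp` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

-- INET_ATON() converts a dotted-quad string to an unsigned integer,
-- much smaller than storing up to 15 characters:
INSERT INTO page_views (i_id, ip_address)
VALUES (42, INET_ATON('203.0.113.7'));
```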
Just make sure your table has a proper index for the selection. Database management systems are built to withstand much more than that.
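For instance, an index matching the 24-hour lookup described in the question might look like this (the index and table names are assumptions):

```sql
-- Equality columns first, then the range column:
ALTER TABLE page_views
    ADD INDEX idx_page_ip_time (i_id, ip_address, `timestamp`);
```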
I think you should:
- Remove the foreign key. It seems redundant in this case. With a foreign key in place, the database spends additional resources checking data integrity on every INSERT/UPDATE/DELETE. For logging that isn't necessary; you need performance and quick responses.
- Use MyISAM. The MyISAM engine is simpler and doesn't spend resources on extras like transaction logging and journaling the way InnoDB does.
- Use INSERT DELAYED so that inserts and index flushes happen per batch rather than per row (see the example below). For details, see http://dev.mysql.com/doc/refman/5.5/en/insert-delayed.html. On each insert query the database performs several operations, one of which is flushing the index. If you run 20 queries, 20 flushes are made. INSERT DELAYED queues the rows and applies them as one batch, so you get only one flush.
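Usage is just a keyword change (a sketch; note that INSERT DELAYED works only with MyISAM-style engines and was deprecated in MySQL 5.6 and removed in 5.7):

```sql
-- The server answers immediately and writes queued rows in batches
-- when the table is not in use:
INSERT DELAYED INTO page_views (i_id, ip_address, `timestamp`)
VALUES (42, '203.0.113.7', NOW());
```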
You will probably want to make sure your MySQL setup is tuned; there could be some strain during spikes, so make sure it is ready for a hit like that.