How to handle large (1m+ rows/day) MySQL databases & transactions

I have a web service that handles 500k+ unique hits a day (soon to be bumped up to 4m). There is a lot of log data per visitor (~5 rows/visit) recording various information about each visit (user agent, IP, location, etc.). Every day at 1am, PHP and MySQL summarize all the data in the log tables (# of uniques, US uniques, average time) into a summary table. Each visitor is associated with 1 of about 1k different "groups" when they visit the site, depending on certain characteristics (user agent, OS, location).

Summarizing all the data takes a really long time and sometimes kills the DB server, because we run the summary query once for each of the 1k groups and then insert the results into the summary table. Is there a more efficient way of storing and summarizing extremely large amounts of log data in a MySQL database?
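
Roughly, the current setup looks like this (a simplified sketch; table and column names are hypothetical and the real tables have more columns):

CREATE TABLE visit_log (
  id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  visit_time   DATETIME NOT NULL,
  group_id     INT NOT NULL,            -- one of ~1k groups
  ip           VARBINARY(16) NOT NULL,
  user_agent   VARCHAR(255),
  country      CHAR(2),
  time_on_site INT                      -- seconds
) ENGINE=InnoDB;

-- Nightly job at 1am, run once per group (~1k times), inserting into the summary table:
INSERT INTO daily_summary (summary_date, group_id, uniques, us_uniques, avg_time)
SELECT CURDATE() - INTERVAL 1 DAY,
       group_id,
       COUNT(DISTINCT ip),
       COUNT(DISTINCT CASE WHEN country = 'US' THEN ip END),
       AVG(time_on_site)
FROM visit_log
WHERE group_id = 42                     -- repeated for each group id
  AND visit_time >= CURDATE() - INTERVAL 1 DAY
  AND visit_time <  CURDATE()
GROUP BY group_id;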


If you're handling extremely large sets of data, you could take a look at non-relational databases.

They require a different way of thinking and take a little while to learn, but in the end they can be quite the performance boost your website needs when handling a lot of traffic.

Here's some more information about Cassandra; start there, and if it sounds interesting, take a look at other NoSQL solutions as well:

http://en.wikipedia.org/wiki/Apache_Cassandra


There's no general answer to problems this size, but have you considered, instead of summarizing once a day, not storing the raw log data at all and updating the relevant summary records immediately on each visit?

You could also put triggers on the log table. Keeping your data updated on every write is a bit heavier per request, but at least the work is spread out over the course of the day.
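
A minimal sketch of such a trigger, assuming a daily_summary table with a UNIQUE KEY on (summary_date, group_id) and a visits counter; note that running totals like this are easy to keep, but distinct-visitor counts are not, since a plain counter can't tell whether an IP has already been seen:

DELIMITER //
CREATE TRIGGER visit_log_after_insert
AFTER INSERT ON visit_log
FOR EACH ROW
BEGIN
  -- Bump the per-day, per-group counter as each log row arrives.
  INSERT INTO daily_summary (summary_date, group_id, visits)
  VALUES (DATE(NEW.visit_time), NEW.group_id, 1)
  ON DUPLICATE KEY UPDATE visits = visits + 1;
END//
DELIMITER ;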

If you've considered all the usual options, sharding might be worth a look: divide your data into smaller groups, either through partitioning or across separate physical MySQL instances.
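
Within a single MySQL instance, one concrete way to divide the data is to range-partition the log table by day, roughly like this (a sketch with hypothetical names; MySQL requires the partitioning column to appear in every unique key, hence the composite primary key):

CREATE TABLE visit_log_partitioned (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  visit_time DATETIME NOT NULL,
  group_id   INT NOT NULL,
  PRIMARY KEY (id, visit_time)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(visit_time)) (
  PARTITION p20230123 VALUES LESS THAN (TO_DAYS('2023-01-24')),
  PARTITION p20230124 VALUES LESS THAN (TO_DAYS('2023-01-25')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);

-- The nightly summary then scans only one partition, and dropping an old day
-- is nearly instant compared to a DELETE:
ALTER TABLE visit_log_partitioned DROP PARTITION p20230123;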


One option is to make the log table 'dumb': no indexes at all, and since no transactions are needed, it can be MyISAM for simplicity. That allows for fast inserts. For your analysis, do one bulk transfer from the log table to another table that does have indexes (preferably on another server machine entirely), and run the analysis there.
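
A rough sketch of that approach, with hypothetical names (visit_log_analysis is assumed to be an indexed table with the same columns, ideally on the analysis server):

-- "Dumb" write-only log table: MyISAM, no indexes, so inserts are cheap appends.
CREATE TABLE visit_log_raw (
  visit_time DATETIME NOT NULL,
  group_id   INT NOT NULL,
  ip         VARBINARY(16) NOT NULL,
  user_agent VARCHAR(255),
  country    CHAR(2)
) ENGINE=MyISAM;

-- Nightly: bulk-transfer the accumulated rows into the indexed analysis table,
-- then clear the raw table. TRUNCATE is much cheaper than DELETE here.
-- (In practice an atomic RENAME TABLE swap avoids losing rows written between
-- the copy and the truncate.)
INSERT INTO visit_log_analysis SELECT * FROM visit_log_raw;
TRUNCATE TABLE visit_log_raw;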


Run clones (replicas), and run your nightly reports on one of the clones while the other serves your site visitors.
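
A rough sketch of setting up such a clone as a MySQL replica (MySQL 5.x syntax; host, credentials, and binlog coordinates are placeholders), after which the 1am summary job simply connects to the replica instead of the primary:

-- Run on the clone/replica:
CHANGE MASTER TO
  MASTER_HOST     = 'db-primary.example.com',
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS  = 4;
START SLAVE;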
