Currently I am using mysql to log all traffic from all users coming into a website that I manage. The database has grown to almost 11m rows in a month, and queries are getting quite slow. Is there a more efficient way to log user information? All we are storing is their request, useragent, and their ip, and associating it with a certain website.
Why not try Google Analytics? Even if you don't think it would be sufficient for you, I bet it can track 99% of what you want tracked.
The answer depends completely on what you expect to retrieve in the query side. Are you looking for aggregate information, are you looking for all of history or only a portion? Often, if you need to look at every row to find out what you need, storing in basic text files is quickest.
What kind of queries do you want to run on the data? I assume most of them are over data in the current or a recent time window. I would suggest using time-based partitioning of the table. This makes such queries faster because they hit only the partitions holding the relevant data, so there are fewer disk seeks. Also, regularly purge old data and roll it up into summary tables. Some useful links are:
- http://forge.mysql.com/w/images/a/a2/FOSDEM_2009-Giuseppe_Maxia-Partitions_Performance.pdf
- http://www.slideshare.net/bluesmoon/scaling-mysql-writes-through-partitioning-3397422
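To make the partitioning suggestion concrete, here is a minimal sketch of a monthly RANGE-partitioned log table in MySQL. The table and column names are assumptions, not from the question; note that MySQL requires the partitioning column to be part of every unique key, hence the composite primary key.

```sql
-- Hypothetical log table, partitioned by month (names are assumed):
CREATE TABLE access_log (
    id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    site_id   INT UNSIGNED    NOT NULL,
    ip        INT UNSIGNED    NOT NULL,
    request   VARCHAR(2048)   NOT NULL,
    useragent VARCHAR(512)    NOT NULL,
    logged_at DATETIME        NOT NULL,
    PRIMARY KEY (id, logged_at)   -- partition column must appear in every unique key
)
PARTITION BY RANGE (TO_DAYS(logged_at)) (
    PARTITION p201004 VALUES LESS THAN (TO_DAYS('2010-05-01')),
    PARTITION p201005 VALUES LESS THAN (TO_DAYS('2010-06-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);

-- A query constrained to a time window is pruned to the matching partition:
SELECT COUNT(*) FROM access_log
WHERE logged_at >= '2010-05-01' AND logged_at < '2010-06-01';

-- Purging a month of old data becomes a fast partition drop, not a slow DELETE:
ALTER TABLE access_log DROP PARTITION p201004;
```

The `DROP PARTITION` at the end is what makes the "regularly purge old data" step cheap: it is a metadata operation rather than a row-by-row delete.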
The most efficient way is probably to have Apache (assuming that's what the site is running on) simply use its built-in logging to text log files, and configure something like AWStats on top. This removes the need to log the information yourself, and should provide the information you are looking for, probably already available in preconfigured reports. The benefit of this over something like Google Analytics is that the tracking is server-side.
Maybe stating the obvious, but do you have good indexes in relation to the queries you are making?
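For example, if the common query filters by site and time range, a composite index on those columns can be checked with `EXPLAIN`. The table and column names here are assumptions for illustration:

```sql
-- Composite index matching a site + time-window query pattern (assumed names):
CREATE INDEX idx_site_time ON access_log (site_id, logged_at);

-- Verify the optimizer actually uses it (look for idx_site_time in the key column):
EXPLAIN SELECT COUNT(*)
FROM access_log
WHERE site_id = 42
  AND logged_at >= '2010-05-01';
```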
1) Look at using Piwik to perform Google Analytic type tracking, while retaining control of the MySQL data.
2) If you must continue to use your own system, look at using the InnoDB Plugin in order to support compressed table formats. In addition, convert the IP to an unsigned integer, and convert both the useragent and request columns to unsigned integer IDs referencing lookup tables that are compressed using either InnoDB compression or the ARCHIVE engine.
3) Skip partitioning and shard the DB by month.
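A sketch of points 2's storage tricks, with assumed table names: MySQL's `INET_ATON()`/`INET_NTOA()` convert an IPv4 address to and from a 4-byte unsigned integer, and a deduplicating lookup table replaces repeated useragent strings with small integer IDs. The MD5 hash column is one way to keep the unique index short; `ROW_FORMAT=COMPRESSED` assumes the InnoDB Plugin with the Barracuda file format enabled.

```sql
-- Store IPv4 as a 4-byte integer instead of a 15-byte string:
-- INSERT ... VALUES (INET_ATON('192.168.1.10'), ...);
-- SELECT INET_NTOA(ip) FROM access_log;   -- convert back for display

-- Deduplicated, compressed useragent lookup table (assumed names):
CREATE TABLE useragents (
    id        INT UNSIGNED  NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ua_hash   CHAR(32)      NOT NULL,  -- MD5 of the full string, keeps the unique key short
    useragent VARCHAR(1024) NOT NULL,
    UNIQUE KEY (ua_hash)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;

-- The log table then stores only useragent_id INT UNSIGNED,
-- joining back to useragents when the full string is needed.
```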
This is what "Data Warehousing" is for. Consider buying a good book on warehousing.
Collect the raw data in some "current activity" schema.
Periodically, move it into a "warehouse" (or "datamart") star schema that's (a) separate from the current activity schema and (b) optimized for count/sum/group-by queries.
Move, BTW, means insert into the warehouse schema and delete from the current activity schema.
Separate your ongoing transactional processing from your query/analytical processing.
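The periodic move described above might look something like the following; the schema and table names (`current.access_log`, `warehouse.page_hits`) and the 30-day cutoff are assumptions for illustration:

```sql
-- Move rows older than 30 days into the warehouse, then remove them
-- from the current-activity schema (run periodically, e.g. nightly):
START TRANSACTION;

INSERT INTO warehouse.page_hits (site_id, ip, request, useragent, logged_at)
    SELECT site_id, ip, request, useragent, logged_at
    FROM current.access_log
    WHERE logged_at < NOW() - INTERVAL 30 DAY;

DELETE FROM current.access_log
    WHERE logged_at < NOW() - INTERVAL 30 DAY;

COMMIT;
```

Wrapping both statements in one transaction (both tables on InnoDB) keeps the move atomic, so rows are never lost or duplicated if the job fails midway.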