I am trying to build a web log analyzer based on the tomcat log.
And I first push the log to the database, then do some Statistics.
Now I meet a problem:
For a given data range for example (2010-09-20 to 2010-09-25), I have to calculate the visitors of each day,so I first split the data range day by day:
Split:(2010-09-20,2010-09-25) to (2010-0开发者_Python百科9-20 00:00:00, 2010-09-21 00:00:00), (2010-09-21 00:00:00, 2010-09-22 00:00:00), (2010-09-22 00:00:00, 2010-09-23 00:00:00), (2010-09-23 00:00:00, 2010-09-24 00:00:00), (2010-09-24 00:00:00, 2010-09-25 00:00:00),
Then I use the sql to query the num of the visitors of each range.
For example:
select count(distinct ip) from log where time between 201009200000 and 201009210000 group by ip.
This sql is used to calculate the visitors of 2010-09-20.
So if the data range from the user request covers more than one day, I should connect the database more than one time. is it low efficiency?
Any solution?
BWT, I use MySQL.
You start by making a table (days) with all days in a year.
E.g. select count(distinct ip),day.starttime from log inner join days on log.time between day.starttime and day.endtime where log.time between 201009200000 and 201009210000 group by ip,day.starttime
or something like that
精彩评论