I am looking to insert IIS 6.0 access logs (5 servers, over 400 MB daily) into a SQL database. What scares me is the size. There is a lot of duplicated information (i.e. site name, URL, referrer, browser) that could be normalized out into look-up tables referenced by key.
The reason I am looking at my own database instead of using other tools is that there are 5 servers and I need very custom statistics and reports on each of them, a few of them, or all of them. Also, installing any software (especially open source) here is a massacre (it needs to have 125% of the functionality and takes months).
I wonder what the most efficient way to do this would be? Has anyone seen examples or articles about it?
Whilst I would suggest buying a decent log parsing tool, if you insist on going it alone, take a look at Log Parser
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en
to help you do some of the heavy lifting, either into SQL, or maybe it can get the results you are after directly.
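As a rough sketch of what that could look like (the table name IisLogStaging, the server and database names, and the field list are assumptions; use whichever fields your IIS logging is configured to write), Log Parser can bulk-load the W3C log fields straight into a SQL Server table:

    LogParser "SELECT date, time, s-sitename, c-ip, cs-method, cs-uri-stem, sc-status, sc-bytes, time-taken, cs(User-Agent), cs(Referer) INTO IisLogStaging FROM ex*.log" -i:IISW3C -o:SQL -server:DBSERVER -database:WebLogs -createTable:ON

From a staging table like that you can then populate whatever normalized or summary structure you settle on.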
On the one hand, you will save a lot of disk space by using artificial keys for values like server IP address, user agent, and referrer. Some of that saving will be lost to the indexes, but the overall reduction for 400 MB per day, times 5 servers, should still be substantial.
The tradeoff, of course, is the need to use joins to bring that information back together for reporting.
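To make that concrete, here is a minimal sketch of the shape this usually takes (all table and column names below are illustrative, not something from the question): a narrow look-up table per repeated value, a log table that stores only the keys, and the join a report then needs.

    -- Sketch only: names and types are assumptions, size columns to match your log fields.
    CREATE TABLE dbo.UserAgent (
        UserAgentId int IDENTITY(1,1) PRIMARY KEY,
        UserAgent   varchar(500) NOT NULL UNIQUE
    );

    CREATE TABLE dbo.LogEntry (
        LogEntryId  bigint IDENTITY(1,1) PRIMARY KEY,
        LogDate     datetime NOT NULL,
        ServerId    smallint NOT NULL,   -- key into a Server look-up table
        UriId       int NOT NULL,        -- key into a Uri look-up table
        UserAgentId int NOT NULL REFERENCES dbo.UserAgent(UserAgentId),
        StatusCode  smallint NOT NULL,
        BytesSent   int NOT NULL,
        TimeTakenMs int NOT NULL
    );

    -- Reporting has to join the text back in:
    SELECT ua.UserAgent, COUNT(*) AS Hits
    FROM dbo.LogEntry le
    JOIN dbo.UserAgent ua ON ua.UserAgentId = le.UserAgentId
    WHERE le.LogDate >= '20110101'
    GROUP BY ua.UserAgent
    ORDER BY Hits DESC;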
My nitpick is that replacing one column's values with an artificial key to a two-column lookup table shouldn't be called "normalizing". You can do that without identifying any functional dependencies. (I'm not certain you're proposing to do that, but it sounds like it.)
You're looking at about 12 gigs a month in raw data, right? Did you consider approaching it from a data warehousing point of view? (Instead of an OLTP point of view.)
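If you do go the warehousing route, the usual pattern is to keep raw detail for a limited window and run reports off nightly roll-ups. A hedged sketch, assuming a detail table dbo.LogEntry with columns LogDate, ServerId, UriId, BytesSent and TimeTakenMs, and a made-up summary table DailyUrlStats with one row per server, URL and day:

    -- Nightly roll-up sketch; assumes the detail rows have already been loaded.
    DECLARE @DayStart datetime, @DayEnd datetime;
    SET @DayStart = '20110101';                       -- example day; a load job would set this
    SET @DayEnd   = DATEADD(day, 1, @DayStart);

    INSERT INTO dbo.DailyUrlStats (LogDay, ServerId, UriId, Hits, TotalBytes, AvgTimeTakenMs)
    SELECT
        CONVERT(char(8), le.LogDate, 112) AS LogDay,  -- yyyymmdd bucket
        le.ServerId,
        le.UriId,
        COUNT(*)                           AS Hits,
        SUM(CAST(le.BytesSent AS bigint))  AS TotalBytes,
        AVG(le.TimeTakenMs)                AS AvgTimeTakenMs
    FROM dbo.LogEntry le
    WHERE le.LogDate >= @DayStart AND le.LogDate < @DayEnd
    GROUP BY CONVERT(char(8), le.LogDate, 112), le.ServerId, le.UriId;

Queries against a day-level summary like this stay fast even when the detail data runs to many gigabytes.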