I have looked at different ways to approach this, but I would like a method which people cannot easily get around. I just need a simple, lightweight method to count the number of views of different news articles, which are stored in a database:
id | title          | body | date       | views
1  | Stack Overflow | ...  | 2010-01-01 | 23
- Session - Couldn't they just clear their browser data and reload the page for another view? Is there any way to stop this?
- Database table of IP addresses - Tons of entries, which may hinder performance
- Log file - Same issue as the database table, although I've seen lots of examples of this approach
For a performance-critical system, and to ensure accuracy, which method should I look into further?
Thanks.
If you're looking to figure out how many unique visitors you have to a given page, then you need to keep information that is unique to each visitor somewhere in your application to reference.
IP addresses are definitely the "safest" way to go, as a user would have to jump through a good many hoops to manually change their IP address. That being said, on a commercial web site you would have to store a pretty massive amount of data for each and every page.
What is more reasonable is to store the information in a cookie on the client's machine. Sure, if your client doesn't allow cookies you will have a skewed number, and sure, the user can wipe their browser history and you will have a skewed number, but overall your count should be relatively accurate.
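The cookie approach above can be sketched roughly like this (a minimal Python sketch; the function and variable names are illustrative, not from the question, and in a real app the token would travel via a Set-Cookie header rather than a return value):

```python
import uuid

# In-memory stand-in for per-article unique-visitor state.
unique_views = {}  # article_id -> set of visitor tokens

def record_view(article_id, visitor_token=None):
    """Count a view; mint a token (to be stored in a cookie) on first visit."""
    if visitor_token is None:
        visitor_token = uuid.uuid4().hex  # new visitor: no cookie yet
    unique_views.setdefault(article_id, set()).add(visitor_token)
    return visitor_token  # caller sends this back to the browser as a cookie

# The same visitor reloading with their cookie does not inflate the count:
t = record_view(1)
record_view(1, t)       # repeat visit, same cookie
record_view(1)          # a different visitor with no cookie
print(len(unique_views[1]))  # 2
```

A visitor who wipes their cookies gets counted again, which is exactly the skew the paragraph above accepts.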
You could potentially keep this information cached or in session-level variables, but then if your application crashes or restarts you're SOL.
If you REALLY need nearly 100% accurate numbers, then your best bet is to log the IP addresses of each page's unique visitors. This will give you the most accurate count. It is pretty extreme, though, and if you can take a hit of roughly 5% or more in accuracy then I would definitely go with cookies.
I think that to keep it lightweight you should use someone else's processing power, so for that reason you should sign up to Google Analytics and insert their code into your pages that you want to track.
If you want more accuracy then track each database request in the database itself; or employ a log reading tool that then drops summaries of page reads into a database or file system each day.
Another suggestion:
When the user visits your website, log their IP address in a table and drop a cookie with a unique ID. Store this unique ID in a table, along with a reference to the IP address record. This way you can work out a more accurate count (and make adjustments to your final number).
Set up an automated task to create summary tables, making querying the data much faster. This will also allow you to prune the data on a regular basis.
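The IP-plus-cookie bookkeeping above could be sketched like this (SQLite standing in for MySQL; the table and column names are assumptions, not from the answer):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE ip_addresses (
  id INTEGER PRIMARY KEY,
  ip TEXT NOT NULL UNIQUE
);
CREATE TABLE visitors (
  cookie_id TEXT PRIMARY KEY,                       -- unique ID dropped in the cookie
  ip_id INTEGER NOT NULL REFERENCES ip_addresses(id)
);
""")

def log_visit(ip, cookie_id):
    # Record the IP once, then tie the cookie ID to that IP record.
    db.execute("INSERT OR IGNORE INTO ip_addresses (ip) VALUES (?)", (ip,))
    ip_id = db.execute("SELECT id FROM ip_addresses WHERE ip=?", (ip,)).fetchone()[0]
    db.execute("INSERT OR IGNORE INTO visitors VALUES (?, ?)", (cookie_id, ip_id))

# Two cookies behind one IP: the gap between the two counts is the
# signal you'd use to adjust your final number.
log_visit("203.0.113.5", "c1")
log_visit("203.0.113.5", "c2")
cookies, ips = db.execute(
    "SELECT COUNT(*), COUNT(DISTINCT ip_id) FROM visitors").fetchone()
print(cookies, ips)  # 2 1
```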
If you're happy to sacrifice some accuracy, then this might be a solution:
This would be the "holding" table, which contains the raw data. It is not the table you'd query; it's just for writing to. You'd run through this whole table on a daily/weekly/monthly basis. Again, you may need indexes depending on how you wish to prune it.
CREATE TABLE `article_views` (
`article_id` int(10) unsigned NOT NULL,
`doy` smallint(5) unsigned NOT NULL,
`ip_address` int(10) unsigned NOT NULL
) ENGINE=InnoDB
You'd then have a summary table, which you would update on a daily, weekly, or monthly basis and which would be super fast to query.
CREATE TABLE `summary_article_uniques_2011` (
`article_id` int(10) unsigned NOT NULL,
`doy` smallint(5) unsigned NOT NULL,
`unique_count` int(10) unsigned NOT NULL,
PRIMARY KEY (`article_id`,`doy`),
KEY(`doy`)
) ENGINE=InnoDB
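The periodic roll-up from `article_views` into the summary table is a single `INSERT ... SELECT` with `COUNT(DISTINCT ...)` followed by a prune. A runnable sketch, using SQLite in place of MySQL (the `ENGINE`/`unsigned` attributes are dropped, but the aggregate is the same):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE article_views (
  article_id INTEGER NOT NULL,
  doy        INTEGER NOT NULL,
  ip_address INTEGER NOT NULL
);
CREATE TABLE summary_article_uniques_2011 (
  article_id   INTEGER NOT NULL,
  doy          INTEGER NOT NULL,
  unique_count INTEGER NOT NULL,
  PRIMARY KEY (article_id, doy)
);
""")
# Raw hits: article 1 seen from two IPs on day 5, one of them twice.
db.executemany("INSERT INTO article_views VALUES (?, ?, ?)",
               [(1, 5, 100), (1, 5, 100), (1, 5, 200)])
# The roll-up: one row per (article, day), then prune the raw rows.
db.execute("""
INSERT INTO summary_article_uniques_2011 (article_id, doy, unique_count)
SELECT article_id, doy, COUNT(DISTINCT ip_address)
FROM article_views WHERE doy = 5
GROUP BY article_id, doy
""")
db.execute("DELETE FROM article_views WHERE doy = 5")
count = db.execute(
    "SELECT unique_count FROM summary_article_uniques_2011").fetchone()[0]
print(count)  # 2 -- duplicate hit from IP 100 collapsed
```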
Example queries:
Unique count for a specific article on a day:
SELECT unique_count FROM summary_article_uniques_2011 WHERE article_id=? AND doy=?  -- bind today's day-of-year, e.g. PHP date('z')
Counts per day for a specific article:
SELECT doy, unique_count FROM summary_article_uniques_2011 WHERE article_id=?
Counts across the entire site, most popular articles today:
SELECT article_id FROM summary_article_uniques_2011 WHERE doy=? ORDER BY unique_count DESC LIMIT 10 -- note: the ORDER BY will not use an index; if you are going to have a lot of articles, your best bet is to add another summary table or an index covering unique_count