I was wondering how to best implement a "most viewed" featured (like youtube) o开发者_开发知识库n my database.
Let me explain the "most viewed" feature a little bit better: Basically I want to list the most visited pages/video/etc from this day/week/month, see http://www.youtube.com/charts/videos_views for an example.
So I was wondering how to best implement this feature as I can think of many many ways of doing it but all of them have their + and - to them.
Plus I also would love to hear the comments of various programmers on others programmers ideas. Also i would like to start a good conversation on this topic.
Ps. I'm looking specially on how to calculate the time, say most viewed in this month, without having a huge table saving every single view with the datetime. Any idea is welcome.
Pps. I use Mysql and PHP, extra tips for those two are very welcome.have the following tables : 1. views 2. views_hourly_summary 3. views_daily_summary 4. views_monthly_summary 5. views_alltime_summary
have a cron job run at the following intervals:
run every hour and preaggregate the views for that hour from the views table and save the preaggregated result in the views_hourly_summary table, also update the views_alltime_summary table
run at the end of everyday and preaggregate the views for that day from the hours table and save the preaggregated result in the views_daily_summary table
run at the end of everymonth and preaggregate the views for that day from the hours table and save the preaggregated result in the views_daily_summary table
next when fetching results you will have to do some math as follows:
For example, you want to fetch the views for last 4 hours, you would fetch the 3 whole hours data from the hourly table and for the remaining data fetch it from the views table as follows:
select item_id, sum(views) as views from views_hourly_summary where hour between concat(left(now() - interval 3 hour, 14), '00:00') and concat(left(now(), 14), '00:00') group by item_id
union
select item_id, count(1) as views from views where datetime between (now() - interval 4 hour) and concat(left(now() - interval 3 hour, 14), '00:00') or datetime > concat(left(now(), 14), '00:00') group by item_id
First I'd try to track users by using a cookie, giving them a unique ID as a visit and for that visit. That should help with identification.
I'd try to move the logic to update the most viewed item(s) out of the website codebase. So this just means each web request posts the visit info to a message queue - probably including the time/date, item being viewed, the tracked info from the cookie and maybe IP address / request header.
I'd then set up a service to read from that queue and process the information. That would allow you to upgrade / modify that logic without affecting the site. You'll probably want to handle multiple refreshes, etc to stop people messing with the results - either delibrately or by accident. Also by abstracting this processing out of the main site you're not slowing down your page request time by - you can make the logic to determine / update item views as complex as you like then.
Simple approach that comes to my mind increment the ViewCount field value on your table by 1 for each unique IP of the users which loaded that page.
Off the top of my head, I would have a ItemViews table, that maps the number of views to a an item ID (assuming that there is only one item type). The table could have 2 columns: ItemId and ViewCount. When a new feature receives its first view, I'd insert a new row in that table and initialize ViewCount to 0. Then I'd increment the counter in that row every time I detect a new view.
I could then compute stats from that table. If there is a concept of categories (like in Youtube), I could join the category IDs with the ItemId and group my view counts that way.
Here is a very simple solution. It's not completely optimal, but could be improved to be better without too much grief.
Basically just divide time up into intervals, say 5 minutes each. Have a record in db with one column for videoID, and one column identifying a specific time interval, and then one column for how many views the video got during that time interval. Then if you want to know how many views a video got in the last day, just sum all those up where the interval/timestamp is greater than a day ago. It's not completely optimal because the last interval will only be partially filled with views, which gives you the views for a little bit of extra time or a little bit less time depending on when you calculate it. But it's good enough for a solution that takes two seconds to make. To calculate the yearly views, do the same thing except with a completely separate table where the time intervals are say 1 day long, so that you don't have to store data and sum up over 100,000 five minute intervals to make a full year.
Just make sure to have a compound index on [videoId, timestamp] so that you can sum up the views quickly.
精彩评论