Most efficient way to calculate 'popularity' of objects on website

Developer https://www.devze.com 2023-02-13 11:06 Source: web
Ok so I'm building a site where people can post news, comments, questions, etc. People can also rate all of these objects, favorite most of them, share them, etc. The site is PHP+MySQL. I wrote a script in PHP that does the following:

  1. Grab all comments and the scores added to them in the past 5 minutes. Add a record to the 'popularity' table with the change in popularity for each comment object.
  2. Grab all news and scores/views/favorites/shares added to them. Calculate popularity for each news story (taking into account the change in popularity of the comments attached to them from step 1) and insert a record into the popularity table with the change in popularity for each news object.
  3. Repeat step 2 for questions and the other object types

I tried to run this script (it's actually a symfony task) every 5 minutes with a cron job and PHP started choking and eating all of my server resources.

What is the preferred way to run a background analytics script that calculates new data based on data in a MySQL DB, then inserts the calculated data into the DB? I'm sure I'm missing some basic procedures here. I should note that the DB is on a different server and that server had no resource problems. The problem seems to be confined to PHP choking on the application server looping through the objects, calculating popularity (simple calculations), and inserting into DB.

Thanks

-- Edit

How about replicating the DB to a server used just for calculations? I could run the popularity script on the calculation server against the replicated DB and insert the calculated popularity records into the live DB. It would of course be slightly delayed, but that's not a huge deal. I'm not sure if this would fix the PHP resource consumption issue though.


Well, the first thing to do would be to try to reduce the number of queries you execute. This is especially important if your SQL and web servers are on different machines. Try to use JOINs to calculate the popularity of news items without going through all the comments individually.

You can calculate the popularity of the comments AND the popularity of the news items in the same query (e.g. select sum(rating) FROM news, comments, rating WHERE comments.news_id = news.id AND rating.comment_id = comments.id; this query is oversimplified, but still...). Your main problem is the number of queries you have to execute. Your MySQL server will certainly have enough resources, because most of the time it is just waiting for the next query to arrive. Communication across the network is orders of magnitude slower than between CPU and RAM. Basically what happens is: PHP sends a query to the MySQL server and waits for the response. MySQL gets the query, processes it, sends the response, and waits for the next query. This waiting is what takes the time. So either reduce the number of queries or send all the queries at the same time using mysqli: http://php.net/manual/en/mysqli.multi-query.php
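To make the "one query instead of many" idea concrete, here is a minimal sketch using PDO. It assumes a schema that is not given in the question: `rating.created_at` and the `popularity` columns `object_type`, `object_id`, `delta`, and `created_at` are hypothetical names, and the function names are made up for illustration. The point is that the aggregation and the insert both happen on the MySQL server, so PHP makes one round trip and never loops over rows.

```php
<?php
// Sketch only. Assumed (hypothetical) schema:
//   rating(comment_id, rating, created_at)
//   popularity(object_type, object_id, delta, created_at)
// The news/comments/rating join keys come from the query in the answer above.

function buildNewsPopularitySql(): string
{
    // INSERT ... SELECT keeps the whole calculation inside MySQL.
    return <<<'SQL'
INSERT INTO popularity (object_type, object_id, delta, created_at)
SELECT 'news', news.id, SUM(rating.rating), NOW()
FROM news
JOIN comments ON comments.news_id = news.id
JOIN rating   ON rating.comment_id = comments.id
WHERE rating.created_at >= NOW() - INTERVAL :minutes MINUTE
GROUP BY news.id
SQL;
}

// Runs the statement and returns the number of popularity rows inserted.
function recordNewsPopularity(PDO $pdo, int $minutes = 5): int
{
    $stmt = $pdo->prepare(buildNewsPopularitySql());
    $stmt->bindValue(':minutes', $minutes, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->rowCount();
}
```

This only covers comment ratings feeding into news items. Views, favorites, and shares would need their own joins or separate INSERT ... SELECT statements, but each of those is still a single round trip rather than one query per object.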


If the DB is on a different server, I would rather opt for writing a MySQL stored procedure to do the calculation, or at least using a persistent connection. Anyway, every 5 minutes is very frequent, especially for a busy server. In my opinion, a task like this should usually run once to a few times a day.
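A rough sketch of what that could look like from the PHP side, assuming a stored procedure already exists on the MySQL server. The procedure name `calculate_popularity`, its single parameter, and the DSN and credentials in the usage comment are all hypothetical. `PDO::ATTR_PERSISTENT` is the standard PDO option for persistent connections.

```php
<?php
// Sketch only: `calculate_popularity` is a hypothetical stored procedure
// that is assumed to do all the popularity work inside MySQL.

function popularityPdoOptions(): array
{
    return [
        // Reuse an already-open connection within the same PHP process.
        PDO::ATTR_PERSISTENT => true,
        // Throw exceptions instead of failing silently.
        PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
    ];
}

// One round trip: PHP only tells MySQL to run the calculation.
function runPopularityProcedure(PDO $pdo, int $minutes = 5): void
{
    $stmt = $pdo->prepare('CALL calculate_popularity(:minutes)');
    $stmt->bindValue(':minutes', $minutes, PDO::PARAM_INT);
    $stmt->execute();
    $stmt->closeCursor();
}

// Usage (hypothetical host, database, and credentials):
// $pdo = new PDO('mysql:host=db.example.internal;dbname=site', 'user', 'secret', popularityPdoOptions());
// runPopularityProcedure($pdo);
```

One caveat: persistent connections are kept per PHP process, so they mainly help under a long-lived web SAPI. A cron-launched command line task starts a fresh process each run, so there the stored procedure is likely to be the part that matters.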


Rather than run this as a cron job, you could just update the popularity each time an action that would alter it is performed. So for example when a user adds a comment or rates an item, once that is done you then update the popularity of the item.
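As a minimal sketch of this event-driven approach: the weights, the action names, and the `popularity_score` column on `news` below are all assumptions made for illustration, not anything from the question. The idea is to apply a small increment at the moment the action happens, so there is never a batch to loop over.

```php
<?php
// Sketch only: weights, action names, and the `popularity_score`
// column are hypothetical.

const POPULARITY_WEIGHTS = [
    'view'     => 1,
    'rating'   => 2,
    'comment'  => 3,
    'favorite' => 4,
    'share'    => 5,
];

// Maps a user action to the popularity increment it contributes.
function popularityDelta(string $action): int
{
    if (!array_key_exists($action, POPULARITY_WEIGHTS)) {
        throw new InvalidArgumentException("Unknown action: $action");
    }
    return POPULARITY_WEIGHTS[$action];
}

// Called right after the action itself has been saved.
function bumpNewsPopularity(PDO $pdo, int $newsId, string $action): void
{
    // The increment is done in SQL, so concurrent requests do not
    // overwrite each other's updates.
    $stmt = $pdo->prepare(
        'UPDATE news SET popularity_score = popularity_score + :delta WHERE id = :id'
    );
    $stmt->execute([':delta' => popularityDelta($action), ':id' => $newsId]);
}
```

For example, the code that stores a new share on a news story would call `bumpNewsPopularity($pdo, $newsId, 'share')` once the share row is written. If the per-interval history from the original 'popularity' table is still needed, the same hook could insert a delta row instead of updating a running total.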
