开发者

Mixing different categories results, ordered by score in MySQL

开发者 https://www.devze.com 2023-03-10 18:57 出处:网络
In my PHP application, I have a mysql table of articles which has the following columns: article_idarticletextcategory_idscore

In my PHP application, I have a mysql table of articles which has the following columns:

article_id    articletext    category_id    score

Each article has a score which is calculated based on how popular it is, and belongs to a specific category (there are around 10 categories available)

My question is: how can I perform a query that returns the highest scored articles while alternating them by categories so that if possible, no same-category articles are returned consecutively. For example if the highest scored article has score: 100 开发者_开发问答the returning set would be something like:

article_id    articletext    category_id    score
-----------------------------------------------------
142           <.....>        5              100
153           <.....>        3              97
119           <.....>        5              99
169           <.....>        2              93
121           <.....>        7              89
197           <.....>        2              92
.
.
.

The first (naive) solution that comes in mind is performing 10 selects (1 for each category), ordering them by score descending, and then, in PHP level, alternating between each returned dataset, picking one result at a time and combining them together in a new array.

Is there any more efficient way to achieve this? If possible at the MySQL level


Go get the top 20. If they don't satisfy the requirements, do an additional query to get the missing pieces. You should be able to come up with some balance between number of queries and number of rows each returns.

I you got the top 100 it might satisfy the requirements 90% of the time and would be cheaper and faster than 10 separate queries.

If it was SQL Server I could help more...

Actually, I have another idea. Run a process every 5 minutes that calculates the list and caches it in a table. Make DML against related tables invalidate the cache so it is not used until repopulated (perhaps an article was deleted). If the cache is invalid, you would fall back to calculating it on the fly... And could use that to repopulate the cache anyway.

It might be possible to strategically update the cached list rather than recalculate it. But that could be a real challenge.

This should help both with query speed and reducing load on your database. It shouldn't matter much if your article list is 5 minutes out of date. Heck, even 1 minute might work.


Just for learning purpose. I made a test with 3 categories. I have no idea how this query could run on a large recordset.

select * from (
(select @r:=@r+1 as rownum,article_id,articletext,category_id,score
from articles,(select @r:=0) as r
where category_id = 1
order by score desc limit 100000000) 
union all
(select @r1:=@r1+1,article_id,articletext,category_id,score
from articles,(select @r1:=0) as r
where category_id = 2
order by score desc limit 100000000)
union all
(select @r2:=@r2+1,article_id,articletext,category_id,score
from articles,(select @r2:=0) as r
where category_id = 3
order by score desc limit 100000000)
) as t
order by rownum,score desc


Your naive solution is exactly what I would do.

0

精彩评论

暂无评论...
验证码 换一张
取 消