开发者

Single SQL query that returns list of "most commented" relative to past 7 days, but the list always has to contain something

开发者 https://www.devze.com 2023-03-14 16:40 出处:网络
What first seemed easy has now become a bit of a brain teaser. I\'m building one of those \"Most Commented\" or \"Most Emailed\" lists that you see in the right sidebar of news/blog websites.

What first seemed easy has now become a bit of a brain teaser.

I'm building one of those "Most Commented" or "Most Emailed" lists that you see in the right sidebar of news/blog websites.

The list has to be r开发者_StackOverflowelative to a recent date range (say the last 7 days) because you want the list to always to look fresh and contain newer content. Otherwise, the few articles that got tons of comments will always sit atop the list.

For instance, in my case, the website has a few articles that have several hundred comments each but all the rest have less than 20. So if the "Most Commented" list was pulled like this:

select
  ArticleId,
  count(CommentId) as Comments
from
  Comment
group by
  ArticleId
order by
  count(CommentId) desc
limit 10;

Then the few articles with several hundred comments would always be at the top. The list would never change.

So I rewrote the select to include just the last 7 days:

select
  ArticleId,
  count(CommentId) as Comments
from
  Comment
where
  Created >= '2011-06-14'
group by
  ArticleId
order by
  count(CommentId) desc
limit 10;

That looks better, but it still won't work because it doesn't account for situations in which there is no activity in the last 7 days.

If nothing happened (or only a few comments happened) in the last 7 days, the list should contain activity from before that. Bottom line - the list always needs to show 10 articles.

I could always call multiple SQL statements, going back in 7 days increments until I fill the list, but I don't want to do that. I'm caching the results in in-process memory, but I still want to only make one SQL call if at all possible.

Thoughts? I feel like this is easily doable and I'm just overlooking the obvious somewhere.


I recommend computing a weight for each result and sort by that. For example, your weight could be the number of actions (comments, emails, etc) / age in days. That way, the older the article, the lower its weight unless it has an extremely high action rate.

Something like:

select
  ArticleId,
  count(CommentId) / (1.0 * DATEDIFF(CURRENT_DATE, Created)) AS weight
from
  Comment
group by
  ArticleId
order by
  weight desc
limit 10;

You'll want to play with your weight formula until you get the right mix of newer content with slightly older but active content.


Here is another approach that answers your question as written (my previous answer suggests a different approach).

This one will return the top for this week and more from previous weeks if needed to fill the top 10. It works by computing a column that is the number of weeks old the article is and sorts by that and then the number of comments.

The advantage here is that will will always return 10 results (assuming you have more than 10 articles) and will return them sorted by age and comments with the most recent weeks articles at the top and the articles that are one week old next, etc.

select
  ArticleId,
  count(CommentId) as number_of_comments,
  DATEDIFF(CURRENT_DATE, Created) DIV 7 AS weeks_old
from
  Comment
group by
  ArticleId,
  weeks_old
order by
  weeks_old asc,
  number_of_comments desc
limit 10;


I assume you also want to limit to a number of records. Try something like the following: (NOTE: I use MSSQL, so some details/syntax might be incorrect for mysql...sorry!)

select top 10
  ArticleId,
  case when Created>='2011-06-14' then 1 else 0 end as [isNew],
  count(CommentId) as Comments
from
  Comment
group by
  ArticleId,
  case when Created>='2011-06-14' then 1 else 0 end
order by
  isNew desc,
  Comments desc


Assuming that you want to keep filling it in by 7 day increments:

SELECT
    C.article_id,
    COUNT(C.comment_id)
FROM
    Comments C
INNER JOIN Numbers N ON
    N.number >= 0 AND
    N.number <=4 AND
    C.created <= CURRENT_DATE - INTERVAL (N.number * 7) DAY AND
    C.created >= CURRENT_DATE - INTERVAL ((N.number + 1) * 7) DAY
GROUP BY
    C.article_id
ORDER BY
    N.number
    COUNT(C.comment_id)
LIMIT 10

I don't work with MySQL, so you may need to play with the syntax, especially with the date functions.

Also, this uses a Numbers table, which is just a table of numeric integers from 1 to whatever. Maybe MySQL has a built-in function for this, otherwise you would need to create or generate the table. It only goes back up to four weeks in this case, so you could even simply use a SELECT 1 UNION ALL SELECT 2...

Finally, since it only goes back four weeks, you could still end up with less than 10 articles. You can increase the number of weeks to something that seems reasonable in your business case.


I may be missing something but couldn't you Union what you already have?

I don't know the mySQL syntax so you will have to convert.

select top 10 *
from 
(
select ArticleId, count(CommentId) as Comments 
from  Comment 
where   Created >= '2011-06-14' 
group by   ArticleId 
union
select ArticleId, count(CommentId) as Comments 
from Comment 
group by  ArticleId 
) 
order by comments desc


This will give you the recently commented articles sorted based on the total comments received by an article.

SELECT ArticleId, COUNT(CommentId) CommentsCount FROM Comment WHERE ArticleID IN
(
select DISTINCT ArticleId FROM Comment ORDER BY Created limit 10
) ORDER BY CommentsCount DESC

Hope that is closer to what you want. If you need the accurate query i.e. include only the last weeks comment in the CommentsCount field, please let me know :-)

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号