开发者

Select items that are the top N results for a related table

开发者 https://www.devze.com 2023-01-05 05:43 出处:网络
Say I have a game where a question is asked, people post responses which are scored, and the top 10 responses win. I have a SQL database that stores all of this information, so I migh开发者_运维技巧t

Say I have a game where a question is asked, people post responses which are scored, and the top 10 responses win. I have a SQL database that stores all of this information, so I migh开发者_运维技巧t have tables such as Users, Questions, and Responses. The Responses table has foreign_keys user_id and question_id, and attribute total_score.

Obviously, for a particular Question I can retrieve the top 10 Responses with an order and limit:

SELECT * FROM Responses WHERE question_id=? ORDER BY total_score DESC LIMIT 10;

What I'm looking for is a way I can determine, for a particular User, a list of all their Responses that are winners (in the top 10 for their particular Question). It is simple programmatically to step through each Response and see if it is included in the top 10 for its Question, but I would like to optimize this so I am not doing N+1 queries where N is the number of Responses the User has submitted.


If you use Oracle, Microsoft SQL Server, DB2, or PostgreSQL, these databases support windowing functions. Join the user's responses to other responses to the same question. Then partition by question and order by score descending. Use the row number within each partition to restrict the set to those in the top 10. Also pass along the user_id of the given user so you can pick them out of the top 10, since you're only interested in the given user's responses.

SELECT *
FROM (
  SELECT r1.user_id AS given_user, r2.*,
    ROW_NUMBER() OVER (PARTITION BY r2.question_id ORDER BY r2.total_score DESC) AS rownum
  FROM Responses r1 JOIN Responses r2 ON r1.question_id = r2.question_id
  WHERE r1.user_id = ?
) t
WHERE rownum <= 10 AND user_id = given_user;

However, if you use MySQL or SQLite or other databases that don't support windowing functions, you can use this different solution:

Query for the user's responses, and use a join to match other responses to the respective questions with greater score (or earlier PK in the case of ties). Group by question, and count the number of responses that have higher score. If the count is fewer than 10, then the user's response is among the top 10 per question.

SELECT r1.*
FROM Responses r1
LEFT OUTER JOIN Responses r2 ON r1.question_id = r2.question_id 
  AND (r1.total_score < r2.total_score 
    OR r1.total_score = r2.total_score AND r1.response_id > r2.response_id)
WHERE r1.user_id = ?
GROUP BY r1.question_id
HAVING COUNT(*) < 10;


Try an embedded select statement. I don't have access to a DB tool today so I can't confirm the syntax/output. Just make the appropriate changes to capture all the columns you need. You can also add questions to the main query and join off of responses.

select *
  from users
     , responses
 where users.user_id=responses.user_id
   and responses.response_id in (SELECT z.response_id 
                                   FROM Responses z 
                                  WHERE z.user_id = users.user_id 
                                 ORDER BY total_score DESC 
                                 LIMIT 10)


Or you can really optimize it by adding another field like "IsTopPost". You would have to update the top posts when someone votes, but your query would be simple:

SELECT * FROM Responses WHERE user_id=? and IsTopPost = 1


I think something like this should do the trick:

SELECT 
    user_id, question_id, response_id 
FROM 
    Responses AS r1 
WHERE 
    user_id = ?
AND
    response_id IN (SELECT response_id 
                    FROM Responses AS r2 
                    WHERE r2.question_id = r1.question_id 
                    ORDER BY total_score DESC LIMIT 10)

Effectively, for each question_id, a subquery is performed which determines the top 10 responses for that question_id.

You may want to consider adding a column which marks certain Responses as 'winners'. That way, you can simply select those rows and save the database from having to calculate the top 10's over and over again.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号