开发者

SELECTing two random rows in an SQL table where the values of a column come very close

开发者 https://www.devze.com 2023-02-15 23:42 出处:网络
I ran into a little problem. I am creating a web app with code snippets. Users are confronted with two random code snippets which are tagged similarly and they must choose which of the two they like t

I ran into a little problem. I am creating a web app with code snippets. Users are confronted with two random code snippets which are tagged similarly and they must choose which of the two they like the most, or they can skip (similar to Facemash, but with code snippets instead of pictures).

I have table called CodeSnippets with these fields:

  • Id — ID of the snippet
  • Score — Score of the snippet
  • Some more non-important columns

Score is a real number.

My question: how can I write a query that gets two random records with scores that are very close? The problem is that I have to order trice: RAND(), Score and RAND() with a margin.

This is what I already have:

SELECT Id, Score, Code FROM CodeSnippets ORDER BY RAND() LIMIT 2

But this simply picks two random records, even if their scores lie far away from eachother :)

Could anyone point开发者_如何学C me in the right direction?


Can you do it with a subquery? First have a self-join query to select all pairs with score within a threshold, then take the first random pair:

SELECT TOP 1 Id1, Id2
FROM (
  SELECT C1.Id AS Id1, C2.Id AS Id2 FROM CodeSnippets C1, CodeSnippet C2
  WHERE C1.Id <> C2.Id AND ABS(C1.Score-C2.Score) < [Threshold]
)
ORDER BY RAND()

Be careful because not all database engines support sub-queries (although most should). Most engines will also optimize/rewrite sub-queries to run fast, but some embedded database engines may just run the sub-query, create a temp table in memory, then run the outer query over the temp table, resulting in poor performance. If that is your case, you'll need to rewrite this query using joins.

0

精彩评论

暂无评论...
验证码 换一张
取 消