开发者

When does MySQL ORDER BY RAND() function order?

开发者 https://www.devze.com 2023-03-24 07:05 出处:网络
I\'ve read about the ORDER BY RAND() and its performance problems -- do these only apply to queries that return large datasets?For example, if I have a table with 100,000 rows and return a dataset wit

I've read about the ORDER BY RAND() and its performance problems -- do these only apply to queries that return large datasets? For example, if I have a table with 100,000 rows and return a dataset with 10 records using a WHERE clause and then us开发者_如何学Goe ORDER BY RAND() LIMIT 1, will this ORDER BY RAND() be applied AFTER my table has been filtered down to records matching the WHERE clause, and thus have negligible performance issues?


You're right, it will apply the ORDER BY after reducing the number of rows with WHERE, GROUP BY, and HAVING. But it will apply ORDER BY before LIMIT.

So if you filter the number of rows down sufficiently, then yes, the ORDER BY RAND() may achieve what you want without a great performance impact. There's a legitimate benefit to code that is simple and easily readable.

The trouble comes when you think your query should reduce the rows to something small, but over time as your data grows, the number of rows it needs to sort becomes large again. Since your query then does LIMIT 10 on the sorted result hides the fact that you're performing ORDER BY RAND() on 500k rows. You just see performance mysteriously getting worse.

I have written about alternative methods for picking a random row in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming, or in other answers here on Stack Overflow:

  • Selecting random rows with MySQL
  • randomizing large dataset
  • quick selection of a random row from a large table in mysql


It doesn't matter how many rows you select. If you ORDER BY RAND() a random number is calculated for every single row in the table. This is because it must calculate the random value for every row in order to know which row generated the largest value. So if you have a table with 100,000 rows and then call ORDER BY RAND() LIMIT 1 You are telling MySQL to generate a random number for 100,000 rows, sort them by that number, and then give you the first one.

It is much much faster to:

  1. SELECT COUNT(*) FROM Table

  2. Generate random number between 0 and result of above query minus 1 in your scripting/programming language.

  3. SELECT * FROM Table LIMIT random_number_here,1


Based on a quick test, I have to conclude that ORDER BY RAND() is applied only after the WHERE statement is applied, and not to the whole dataset.

Results from a table with 50,000 rows:

SELECT * FROM `mytable` LIMIT 1  (1 total, Query took 0.0007 sec)
SELECT * FROM `mytable` WHERE First = 'Hilda' LIMIT 1 (1 total, Query took 0.0010 sec)
SELECT * FROM `mytable` WHERE First = 'Hilda' (142 total, Query took 0.0201 sec)
SELECT * FROM `mytable` WHERE First = 'Hilda' ORDER BY RAND() LIMIT 1 (1 total, Query took 0.0229 sec)
SELECT * FROM `mytable` WHERE First = 'Hilda' ORDER BY RAND() (142 total, Query took 0.0236 sec)
SELECT * FROM `mytable` ORDER BY RAND() LIMIT 1 (1 total, Query took 0.4224 sec)


The RAND() value will be calculated for each row, so it's not very efficient for large data sets, the LIMIT clause doesn't change that. The usual way to work around this is to compute a random number in advance and then retrieve the row corresponding to it based on some pregenerated indexed column.

Here's one detailed explanation:

http://jan.kneschke.de/projects/mysql/order-by-rand/


ORDER BY is about the last to execute, but LIMIT is the VERY LAST.

Unfortunately, this means that the DB is going to generate random numbers for all qualifying rows, order them, and then apply the limit.

What you could do is to have the table with a surrogate id field, generate a random number, and then use

SELECT x,y,z FROM table WHERE id >= your_rand_number

0

精彩评论

暂无评论...
验证码 换一张
取 消