开发者

Returning random rows from mysql database without using rand()

开发者 https://www.devze.com 2023-01-10 13:56 出处:网络
I would like to be able to pull back 15 or so records from a database. I\'ve seen that using WHERE id = rand() can cause performance issues as my database gets larger. All solutions I\'ve seen are gea

I would like to be able to pull back 15 or so records from a database. I've seen that using WHERE id = rand() can cause performance issues as my database gets larger. All solutions I've seen are geared towards selecting a single random record. I would like to get multiples.

Does anyone know of an efficient way to do this for large databases?

edit:

Further Edit and Testing:

I made a fairly simple table, on a new database using MyISAM. I gave this 3 fields: autokey (unsigned auto number key) bigdata (a large blob) and somemore (a medium int). I then applied random data to the table and ran a series of queries using Navicat. Here are the results:

Query 1: select * from test order by rand() limit 15

 Query 2: select * 
          from 
      test 
          join 
      (select round(rand()*(select max(autokey) from test)) as val from test limit 15)                                           as rnd
      on 
          rnd.val=test.autokey;`

(I tried both select and select distinct and it made 开发者_运维技巧no discernible difference)

and:

Query 3 (I only ran this on the second test):
SELECT  *
    FROM    (
    SELECT  @cnt := COUNT(*) + 1,
            @lim := 10
    FROM    test
    ) vars
    STRAIGHT_JOIN
    (
    SELECT  r.*,
            @lim := @lim - 1
    FROM    test r
    WHERE   (@cnt := @cnt - 1)
            AND RAND(20090301) < @lim / @cnt
    ) i
ROWS:            QUERY 1:               QUERY 2:         QUERY 3:
2,060,922          2.977s                 0.002s            N/A

3,043,406          5.334s                 0.001s            1.260     

I would like to do more rows so I can see how query 3 scales, but at the moment, it seems as though the clear winner is query 2.

Before I wrap up this testing and declare an answer, and while I have all this data and the test environment set up, can anyone recommend any further testing?


Try:

select * from table order by rand() limit 15

Another (and possibly more efficient way) would be to join against a set of random values. This should work, if there's some contiguous integer key in the table. Here is how I would do it in postgres (My MySQL is a bit rusty)

select * from table join 
   (select (random()*maxid)::integer as val from generate_series(1,15)) as rnd
   on rand.val=table.id;

where maxid is the highest id in table. If id has an index, then this would mean only 15 index lookup, so its very fast.

UPDATE:

Looks like there no such thing as generate_series in MySQL. My fault. We don't need it actually:

select * 
from 
 table 
join 
 -- this just returns 15 random numbers. 
 -- I need `table` here only to produce rows for rand()
 (select round(rand()*(select max(id) from table)) as val from table limit 15) as rnd
on 
 rnd.val=table.id;

P.S. If I don't want duplicates returned, I can use (select distinct [...]) in the random generator expression.


Update: Check out the accepted answer in this question. It's pure mySQL and even deals with even distribution.

The problem with id = rand() or anything comparable in PHP is that you can't be sure whether that particular ID still exists. Therefore, you need to work with LIMIT, and that can become slow for large amounts of data.

As an alternative to that, you could try using a loop in PHP.

What the loop does is

  • Create a random integer number using rand(), with a scope between 0 and the number of records in the database

  • Query the database whether a record with that ID exists

  • If it exists, add the number to an array

  • If it doesn't, go back to step 1

  • End the loop when the array of random numbers contains the desired number of elements

this method could cause a lot of queries in a fragmented table, but they should be pretty fast to execute. It may be faster than LIMIT rand() in certain situations.

The LIMIT method, as outlined by @Luther, is certainly the simplest code-wise.


You could do a query with all the results or however many limited, then use mysqli_fetch_all followed by:

shuffle($a);
$a = array_slice($a, 0, 15);


For a large dataset doing

select * from table order by rand() limit 15

can be quite time and memory consuming.

If your data records happen to be numbered you can put and index on the numbering colum and do a

select * from table where no >= rand() limit 15

Or even better do the random number generation in your application and do

select * from table where no >= $rand and no <= $rand+15

If your data doesn't change too often, it might be worth to add such a numbering a column to make the selection efficient.


Assuming MySQL supports nested queries and that operations on the primary key are fast, I'd try something like

select * from table where id in (select id from table order by rand() limit 15)
0

精彩评论

暂无评论...
验证码 换一张
取 消