How to distribute rows to concurrent processes in the order defined on DB?_问答_开发者

How to distribute rows to concurrent processes in the order defined on DB?

开发者 https://www.devze.com 2023-03-27 21:45 出处：网络

I do have a DB table, which is kind of spool for performing tasks: id | status | owner | param1 | +----+--------+-------+--------+

I do have a DB table, which is kind of spool for performing tasks:

| id | status | owner | param1 |
+----+--------+-------+--------+
|  1 | used   | user1 | AAA1   |
|  2 | free   | user2 | AAA2   |
|  3 | free   | user1 | AAA3   |
|  4 | free   | user1 | AAA4   |
|  5 | free   | user3 | AAA2   |

This table is being access by many parallel processes, what would be the best way to assure, that each row from the table would be "used" just by single process but also at the same time given out in the same order as they are in table (sorted by id column value)?

My first idea was to simply mark always next row in queue with simple update:

UPDATE table 
    SET status = "used" 
WHERE owner = "userX" 
    AND status <> "used" 
ORDER BY id 
LIMIT 1

and then fetch the marked row.

This was not performing at all - with some data (e.g. 3.000.000 rows) and bigger loads process list was full UPDATE statements and mysql crashed with "Out of sort memory" error...

So my next idea is doing following steps/queries:

step1

get the first unused row:

SELECT id 
FROM table 
WHERE owner = "userX"
    AND status = "free" 
ORDER BY id 
LIMIT 1

step2

try to mark it as used if it is still free:

UPDATE table 
    SET status = "used" 
WHERE id = <id from SELECT above开发者_如何学运维> 
    AND status = "free"

step3

go to step1 if row was NOT updated (because some other process already used it) or go to step4 if row was updated

step4

do the required work with successfully found row

The disadvantage is that on many concurrent processes there will be always a lot of jumping between steps 1. and 2. till each process finds its "own" row. So to be sure that system works stable - I would need to limit the number of tries each process does and risk that processes may reach the limit and find nothing while there are still entries in the table.

Maybe there is some better way to solve this problem?

P.S. everything is done at the moment with PHP+MySQL

Just a suggestion, instead of sorting and limiting to 1 maybe just grab min(id):

SELECT MIN(id) 
FROM table 
WHERE owner = "userX"
    AND status = "free"

I am also using a MySQL database to choose rows that need to be enqueued for lengthy processing, preferring to do them in the order of the primary index ID column, also using optimistic concurrency control as shown above (no transactions needed). Thank you to @sleeperson for the answer using min(id), that is far superior to order by / limit 1.

I am posting one additional suggestion that allows for graceful restart. I implemented the following that's done only at startup time:

step0

get lingering rows:

SELECT id 
FROM table 
WHERE owner = "userX"
    AND status = "used"

call step4

Etc. After a crash or other unwelcome (yet oh so common event) this will distribute out for processing the rows that should have been done previously, instead of leaving them marked 'used' in the database to trip me up later.