I do have a DB table, which is kind of spool for performing tasks:
| id | status | owner | param1 |
+----+--------+-------+--------+
| 1 | used | user1 | AAA1 |
| 2 | free | user2 | AAA2 |
| 3 | free | user1 | AAA3 |
| 4 | free | user1 | AAA4 |
| 5 | free | user3 | AAA2 |
This table is being access by many parallel processes, what would be the best way to assure, that each row from the table would be "used" just by single process but also at the same time given out in the same order as they are in table (sorted by id
column value)?
My first idea was to simply mark always next row in queue with simple update:
UPDATE table
SET status = "used"
WHERE owner = "userX"
AND status <> "used"
ORDER BY id
LIMIT 1
and then fetch the marked row.
This was not performing at all - with some data (e.g. 3.000.000 rows) and bigger loads process list was full UPDATE statements and mysql crashed with "Out of sort memory" error...
So my next idea is doing following steps/queries:
step1
get the first unused row:
SELECT id
FROM table
WHERE owner = "userX"
AND status = "free"
ORDER BY id
LIMIT 1
step2
try to mark it as used if it is still free:
UPDATE table
SET status = "used"
WHERE id = <id from SELECT above开发者_如何学运维>
AND status = "free"
step3
go to step1 if row was NOT updated (because some other process already used it) or go to step4 if row was updated
step4
do the required work with successfully found row
The disadvantage is that on many concurrent processes there will be always a lot of jumping between steps 1.
and 2.
till each process finds its "own" row. So to be sure that system works stable - I would need to limit the number of tries each process does and risk that processes may reach the limit and find nothing while there are still entries in the table.
Maybe there is some better way to solve this problem?
P.S. everything is done at the moment with PHP+MySQL
Just a suggestion, instead of sorting and limiting to 1 maybe just grab min(id):
SELECT MIN(id)
FROM table
WHERE owner = "userX"
AND status = "free"
I am also using a MySQL database to choose rows that need to be enqueued for lengthy processing, preferring to do them in the order of the primary index ID column, also using optimistic concurrency control as shown above (no transactions needed). Thank you to @sleeperson for the answer using min(id), that is far superior to order by / limit 1
.
I am posting one additional suggestion that allows for graceful restart. I implemented the following that's done only at startup time:
step0
get lingering rows:
SELECT id
FROM table
WHERE owner = "userX"
AND status = "used"
call step4
Etc. After a crash or other unwelcome (yet oh so common event) this will distribute out for processing the rows that should have been done previously, instead of leaving them marked 'used' in the database to trip me up later.
精彩评论