开发者

SQL (mySQL) update some value in all records processed by a select

开发者 https://www.devze.com 2022-12-17 11:18 出处:网络
I am using mySQL from their C API, but that shouldn\'t be relevant. My code must process records from a table that match some criteria, and then update the said records to flag them as processed. The

I am using mySQL from their C API, but that shouldn't be relevant. My code must process records from a table that match some criteria, and then update the said records to flag them as processed. The lines in the table are modified/inserted/deleted by another process I don't control. I am afraid in the following, the UPDATE might flag some records erroneously since the set of records matching might have changed between step 1 and step 3.

SELECT * FROM myTable WHERE <CONDITION>; # step 1
<iterate over the selected set of lin开发者_JAVA百科es. This may take some time.> # step 2
UPDATE myTable SET processed=1 WHERE <CONDITION> # step 3

What's the smart way to ensure that the UPDATE updates all the lines processed, and only them? A transaction doesn't seem to fit the bill as it doesn't provide isolation of that sort: a recently modified record not in the originally selected set might still be targeted by the UPDATE statement. For the same reason, SELECT ... FOR UPDATE doesn't seem to help, though it sounds promising :-)

The only way I can see is to use a temporary table to memorize the set of rows to be processed, doing something like:

CREATE TEMPORARY TABLE workOrder (jobId INT(11));
INSERT INTO workOrder SELECT myID as jobId FROM myTable WHERE <CONDITION>;
SELECT * FROM myTable WHERE myID IN (SELECT * FROM workOrder);
<iterate over the selected set of lines. This may take some time.>
UPDATE myTable SET processed=1 WHERE myID IN (SELECT * FROM workOrder);
DROP TABLE workOrder;

But this seems wasteful and not very efficient.

Is there anything smarter?

Many thanks from a SQL newbie.


There are several options:

  1. You could lock the table

  2. You could add an AND foo_id IN (all_the_ids_you_processed) as the update condition.

  3. you could update before selecting and then only selecting the updated rows (i.e. by processing date)


I eventually solved this issue by using a column in that table that flags lines according to their status. This column let's me implement a simple state machine. Conceptually, I have two possible values for this status:

kNoProcessingPlanned = 0; #default "idle" value
kProcessingUnderWay = 1;

Now my algorithm does something like this:

UPDATE myTable SET status=kProcessingUnderWay WHERE <CONDITION>; # step 0

SELECT * FROM myTable WHERE status=kProcessingUnderWay; # step 1
    <iterate over the selected set of lines. This may take some time.> # step 2
UPDATE myTable SET processed=1, status=kNoProcessingPlanned WHERE status=kProcessingUnderWay # step 3

This idea of having rows in several states can be extended to as many states as needed.

0

精彩评论

暂无评论...
验证码 换一张
取 消