How to lock some rows so they are not selected by another transaction


Developer https://www.devze.com 2023-02-05 13:40 Source: Internet


I have a table that is essentially a list of URLs I want to visit. The table is neither referenced by nor references any other table. What my application does is:

select some rows from the URL list
loop over them
- start a transaction
- visit the URL
- process it
- start a sub-transaction
  - check whether the results are already in the first two tables (select)
  - if not, save them (insert)
- commit the sub-transaction
- start a sub-transaction
  - check whether the results are already in another table (select)
  - if not, save them (insert)
- commit the sub-transaction
- update the row I'm visiting
- commit the main transaction
end the loop

There are plenty of error checks here and there, and the main transaction runs hundreds of queries (selects and inserts). MySQL's CPU usage goes very high (I guess because of the large rollback log), but all of this works fine.

The only problem is that I can't run more than one instance of this batch, because the instances select more or less the same rows: I end up visiting the same URL more than once within a few seconds, which I don't want.

If I move the start of the main transaction outside the loop and select the rows FOR UPDATE, I still get no concurrency, because the second instance's select blocks until the first instance's main transaction commits.

One possible solution is to add a "locked" column to the first table and set it to true (or rather to the current date, since I try to avoid booleans).

Another is to start the main transaction and then select just one row at a time FOR UPDATE (using LIMIT 1 instead of the 5 or 10 I use now).

I can't think of any other way to get what I want, which is simply: don't select rows that are locked.

Any ideas?

It sounds as though you do need some form of marker to identify rows as "in use" so that other instances do not process the same data. Whether you use a boolean or a date type is irrelevant; one way or another, you must mark the rows that are in use.

One option is a dispatcher: a process or thread with sole access to your table whose only job is to select rows and pass them to other processes to work on. Even then, the dispatcher has to know how far through the data the workers have got, so you are back to the same problem.
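A dispatcher can be sketched with threads and a shared queue: one thread owns the work list and hands out row IDs, so each row reaches exactly one worker. This is a minimal in-process illustration that assumes the workers are threads in a single program; separate batch instances would need some external channel instead. The in-memory `pending_ids` list is a hypothetical stand-in for the unvisited rows of the URL table.

```python
import queue
import threading


def dispatcher(pending_ids, work_queue, num_workers):
    # Only the dispatcher reads the work list, so no two workers get the same row.
    for row_id in pending_ids:
        work_queue.put(row_id)
    # One sentinel per worker tells it there is nothing left to do.
    for _ in range(num_workers):
        work_queue.put(None)


def worker(work_queue, processed, lock):
    while True:
        row_id = work_queue.get()
        if row_id is None:
            break
        # Hypothetical placeholder for "visit the URL and process it".
        with lock:
            processed.append(row_id)


pending_ids = list(range(1, 21))  # stand-in for the unvisited rows of the URL table
work_queue = queue.Queue()
processed = []
lock = threading.Lock()
num_workers = 4

workers = [threading.Thread(target=worker, args=(work_queue, processed, lock)) for _ in range(num_workers)]
for t in workers:
    t.start()
dispatcher(pending_ids, work_queue, num_workers)
for t in workers:
    t.join()
```

Here the dispatcher's progress is only its position in `pending_ids`; if the dispatcher restarts, that position is lost, which is exactly the bookkeeping problem the answer points out.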

Another way is to use a column that indicates the row is in use (as you suggested in your question). Each process marks a block of rows with a unique ID, performed inside a transaction to lock the table. I would mark them with the connection number returned by CONNECTION_ID(), since it is unique among the connections currently open.

Once the UPDATE ... WHERE connection_id IS NULL transaction (with a LIMIT applied) has committed, the process can run SELECT ... WHERE connection_id = CONNECTION_ID() to fetch its rows for processing.

When a process has finished its rows, the whole cycle starts again: it marks the next set of rows, and so on until all rows have been processed.
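The claim-then-fetch cycle can be sketched in Python, using the standard-library sqlite3 module as a runnable stand-in for MySQL. Several details are assumptions: SQLite has no CONNECTION_ID(), so each simulated worker passes its own hypothetical `worker_id`; SQLite's default build does not support UPDATE ... LIMIT, so the claim uses a subquery instead; the two workers take turns on one connection rather than truly running in parallel; and the `visited_at` column and `BATCH_SIZE` are hypothetical.

```python
import sqlite3

BATCH_SIZE = 3  # hypothetical "limit applied" to each claim


def claim_batch(conn, worker_id):
    # Mark up to BATCH_SIZE unclaimed rows with this worker's id, in one transaction.
    # In MySQL this would be UPDATE ... WHERE connection_id IS NULL LIMIT n;
    # SQLite's default build lacks UPDATE ... LIMIT, hence the subquery.
    with conn:
        conn.execute(
            "UPDATE urls SET connection_id = ? "
            "WHERE id IN (SELECT id FROM urls WHERE connection_id IS NULL ORDER BY id LIMIT ?)",
            (worker_id, BATCH_SIZE),
        )
    # Fetch only the rows this worker marked and has not finished yet.
    return conn.execute(
        "SELECT id, url FROM urls WHERE connection_id = ? AND visited_at IS NULL",
        (worker_id,),
    ).fetchall()


def process(conn, rows):
    # Hypothetical placeholder for visiting each URL; records completion.
    with conn:
        for row_id, _url in rows:
            conn.execute("UPDATE urls SET visited_at = datetime('now') WHERE id = ?", (row_id,))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT NOT NULL, "
    "connection_id INTEGER, visited_at TEXT)"
)
with conn:
    conn.executemany("INSERT INTO urls (url) VALUES (?)", [(f"http://example.com/{i}",) for i in range(10)])

# Two simulated workers take turns; in MySQL each would be its own connection
# and would use CONNECTION_ID() instead of a hand-picked id.
claims = {1: [], 2: []}
while True:
    progressed = False
    for worker_id in (1, 2):
        rows = claim_batch(conn, worker_id)
        if rows:
            claims[worker_id].extend(row_id for row_id, _ in rows)
            process(conn, rows)
            progressed = True
    if not progressed:
        break
```

The extra `visited_at IS NULL` filter on the fetch is an assumption added so that rows a worker has already finished are not returned again on later cycles, since the marker column stays set after processing.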