My (simplified) table consists of:
Id int identity(1,1),
File varchar(20),
FileProcessed bit
The logic is this: my application takes the first record (the order isn't important) that has the FileProcessed bit set to false. Then it processes the file and sets the FileProcessed bit to true.
Now, it can happen that the first thread takes the record with Id 1 and, whilst it is processing it, another thread takes the same record with Id 1 (because it isn't marked as processed yet).
What is the best way to support multithreading in this example?
EDIT: I use SQL Server 2005.
EDIT2: Processing of the file can take a long time, so I don't want to lock the whole table in the meantime.
Others have mentioned adding an additional column. You might also consider changing the FileProcessed column into a column called e.g. Status, where you could model Unprocessed, Processing, Processed and Faulted (e.g. what happens if the file cannot be processed?).
Also, if processing fails, do you want to retry processing the file immediately? If the processor dies unexpectedly, how are you going to deal with that? E.g. you might want another table that records when each processing attempt starts; if the last attempt started 20 minutes ago (or whatever seems reasonable), then you might consider that a failed attempt.
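To make the status idea a bit more concrete, here is a minimal Python sketch of the four states and the "last attempt started too long ago" rule. The names Status, ATTEMPT_TIMEOUT and effective_status are made up for illustration, and the 20 minutes is just the example figure from above, not a recommendation.

```python
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    # The four states suggested above (hypothetical names).
    UNPROCESSED = "Unprocessed"
    PROCESSING = "Processing"
    PROCESSED = "Processed"
    FAULTED = "Faulted"


# Assumption: 20 minutes, taken from the example figure in the text.
ATTEMPT_TIMEOUT = timedelta(minutes=20)


def effective_status(status, last_attempt_started, now):
    """Treat a row stuck in Processing past the timeout as a failed attempt."""
    if (
        status is Status.PROCESSING
        and last_attempt_started is not None
        and now - last_attempt_started > ATTEMPT_TIMEOUT
    ):
        return Status.FAULTED
    return status


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    print(effective_status(Status.PROCESSING, now - timedelta(minutes=25), now))
```

Whether a timed-out attempt should become Faulted or go straight back to Unprocessed for a retry depends on your requirements.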
To do the selection/update correctly, you might want a script like the following:
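declare @FileID int
BEGIN TRANSACTION
-- Lock one unprocessed row; readpast skips rows already locked by other sessions
select top 1 @FileID = FileID from FilesToDoStuffTo with (updlock,holdlock,readpast) where Status = 'Unprocessed'
-- Mark the row as taken before the lock is released
update FilesToDoStuffTo set Status = 'Processing' where FileID = @FileID
COMMIT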
Then do whatever you need to do with the @FileID you've selected.
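If you want to try the select-then-update pattern without a SQL Server instance, the following Python sketch does the same thing against SQLite. It is only a stand-in: SQLite has no updlock/readpast hints, so BEGIN IMMEDIATE (which takes the database write lock up front) plays the role of the lock, and the FileName column, claim_next_file and demo are names invented for this example.

```python
import os
import sqlite3
import tempfile
import threading


def claim_next_file(db_path):
    """Claim one unprocessed row; return its FileID, or None if none are left."""
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        # Take the write lock first so the SELECT and UPDATE below
        # cannot interleave with another claimer's.
        conn.execute("BEGIN IMMEDIATE")
        row = conn.execute(
            "SELECT FileID FROM FilesToDoStuffTo "
            "WHERE Status = 'Unprocessed' LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE FilesToDoStuffTo SET Status = 'Processing' "
                "WHERE FileID = ?",
                (row[0],),
            )
        conn.execute("COMMIT")
        return None if row is None else row[0]
    finally:
        conn.close()


def demo(worker_count=4, file_count=3):
    """Run several claimers at once and return what each of them got."""
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        setup = sqlite3.connect(db_path)
        setup.execute(
            "CREATE TABLE FilesToDoStuffTo ("
            "FileID INTEGER PRIMARY KEY, FileName TEXT, Status TEXT NOT NULL)"
        )
        setup.executemany(
            "INSERT INTO FilesToDoStuffTo (FileName, Status) "
            "VALUES (?, 'Unprocessed')",
            [(f"file{i}.txt",) for i in range(file_count)],
        )
        setup.commit()
        setup.close()

        claimed = []
        guard = threading.Lock()

        def worker():
            file_id = claim_next_file(db_path)
            with guard:
                claimed.append(file_id)

        threads = [threading.Thread(target=worker) for _ in range(worker_count)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return claimed
    finally:
        os.remove(db_path)


if __name__ == "__main__":
    print(demo())
```

The lock is only held for the short claim step, not while the file is being processed, which is the point of flipping the status inside the transaction and committing straight away.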
You need to wrap your application logic in TransactionScope sections. This way each call to the DB is in a transaction of its own.
To be certain that a call is in fact locking, use a constructor that takes a TransactionScopeOption, specifically the Required option, so transactions will always be in place.
This may well have impact on performance, so you will need to test that.
You can't rely just on changing a field in the database, as you may get dirty reads (one thread reads that the record is not "in use", another one marks it as "in use", and the first still tries to work with it).
So a combination of transaction support and a field in the database will work best.
Change the table structure to:
Id int identity(1,1),
File varchar(20),
FileOnProcess bit,
FileProcessed bit
Now you can just lock the row and update the FileOnProcess bit, so that you only select files that are not already being processed. Depending on your database engine, the actual SQL command to lock the row might differ.
(Reposting comment as an answer, as requested by OP.)
I think Damien_The_Unbeliever's answer is the best general approach when using an RDBMS (such as SQL Server) that supports it, except (as I mentioned in my comment to him) I'd include a column identifying the instance processing the row.
If the RDBMS or environment you're using makes the above difficult, and if you give each instance its own unique ID, you can get a similar effect by having a column that's normally NULL that you can set to the instance processing the row. Then (even without transactions) you can do
set rowcount 1
update FilesToDoStuffTo
set BeingProcessedBy = {theid}
where FileProcessed = 0 and BeingProcessedBy is NULL
...then
select FileID from FilesToDoStuffTo where BeingProcessedBy = {theid}
(Usually your DB connection infrastructure -- ODBC, JDBC, whatever -- has a wrapper for the set rowcount 1; basically, you want to be sure to only update one row, not all of them! Or grab in batches of 5, 10, whatever makes sense for what you're doing.)
Better to avoid that sort of game if you can (by using transactions and/or stored procedures to at least select the row, row locks, etc.), but sometimes a bog standard approach is the most practical. :-)
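For what it's worth, here is a rough Python/SQLite sketch of the "tag the row with your instance ID" approach. SQLite has no set rowcount, so the single-row limit is done with a subquery instead; that substitution, the claim_one helper and the instance IDs are my own assumptions for illustration, not anything specific to SQL Server.

```python
import sqlite3


def claim_one(conn, instance_id):
    """Tag one unclaimed, unprocessed row with instance_id; return its FileID or None."""
    # One UPDATE statement does the claim; the subquery stands in for
    # "set rowcount 1" so that only a single row is tagged.
    cur = conn.execute(
        """
        UPDATE FilesToDoStuffTo
           SET BeingProcessedBy = ?
         WHERE FileID = (
               SELECT FileID FROM FilesToDoStuffTo
                WHERE FileProcessed = 0 AND BeingProcessedBy IS NULL
                LIMIT 1)
        """,
        (instance_id,),
    )
    conn.commit()
    if cur.rowcount == 0:
        return None  # nothing left to claim
    # ...then read back the row this instance now owns.
    row = conn.execute(
        "SELECT FileID FROM FilesToDoStuffTo "
        "WHERE BeingProcessedBy = ? AND FileProcessed = 0",
        (instance_id,),
    ).fetchone()
    return row[0]


def make_demo_db():
    """Build a throwaway in-memory table with two unprocessed files."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FilesToDoStuffTo ("
        "FileID INTEGER PRIMARY KEY, FileName TEXT, "
        "FileProcessed INTEGER NOT NULL DEFAULT 0, BeingProcessedBy TEXT)"
    )
    conn.executemany(
        "INSERT INTO FilesToDoStuffTo (FileName) VALUES (?)",
        [("a.txt",), ("b.txt",)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(claim_one(db, "worker-a"), claim_one(db, "worker-b"), claim_one(db, "worker-c"))
```

This sketch assumes an instance finishes (or releases) its row before claiming another; otherwise the read-back select can return an earlier row it still owns.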
Add a new flag called processing, or just mark the record as processed when the first thread picks it up, depending on your requirements.
Also be aware that when pulling the record and marking it as being dealt with, you should lock the table. Otherwise you could read the record and, before you mark it as processed / processing, another thread could come in and take it as well.