Thread-Safety and the Foreign Key Constraint_问答_开发者

Background

Given:

a set of threads

each thread has its own data source

objects in each data source references objects in other data sources

there is a possibility for duplicate objects across various data sources

the threads are writing to a database with an engine that enforces the foreign key constraint

each type of object gets its own table and references the other objects through a foreign key

each thread has its own connection to the database

Proposed Solution

A register class which tracks the ID of the objects that have been written. The inteface of the register class has public methods, thus (represented in Java):

public interface Register
{
    synchronized boolean requestObjectLock(int id);
    synchronized boolean saveFinalized(int id);
    synchronized boolean checkSaved(int id);
}

The method requestObjectLock checks to see if the object has been locked by another thread yet, and returns false it has. Otherwise, it locks that ID and returns true. It is then the responsibility of the calling thread to call saveFinalized when it has been successfully written to the database, and the responsibility of all other threads to check to see whether it has been written already with checkSaved before writing an object that references it. In other words, there are three states an object can be in: unregistered, locked (registered but unwritten), and saved (registered and written).

Reasoning

As far as I know th开发者_如何学运维ere is no way to guarentee that one SQL query will finish before another when called by different threads. Thus, if an object was only registered or unregistered, it seems possible that a thread could check to see if an object was written, start writing an object that referenced it, and have its query complete (and fail) before the query that actually wrote the referenced object did.

Questions

Is it possible to guarantee the sequence of execution of queries being executed by different threads? And therefore, is this solution overengineered? Is there a simpler solution? On the other hand, is it safe?

The terms you need to research on the database side are "transaction isolation level" and "concurrency control". DBMS platform support varies. Some platforms implement a subset of the isolation levels defined in the SQL standards. (The SQL standards allow this. They're written in terms of what behavior isn't allowed.) And different platforms approach concurrency control in different ways.

Wikipedia, although not authoritative, has a good introduction to isolation levels, and also a good introduction to concurrency control.

As far as I know there is no way to guarentee that one SQL query will finish before another when called by different threads.

That's kind of true. It's also kind of not true. In SQL standards, transaction isolation levels aren't concerned with who finishes first. They're concerned with behavior that's not allowed.

dirty read: Transaction A can read data written by concurrent, uncommitted transaction B.

nonrepeatable read: Transaction A reads data twice. A concurrent transaction, B, commits between the two reads. The data transaction A read first is different from the data it read second, because of transaction B. (Some people describe transaction A as seeing "same rows, different column values".)

phantom read: Transaction A reads data twice. A concurrent transaction, B, commits between the two reads. Transaction A's two reads return two different sets of rows, because transaction B has affected the evaluation of transaction A's WHERE clause. (Some people describe transaction A as seeing "same column values, different rows".)

You control transaction behavior in SQL using SET TRANSACTION. So SET TRANSACTION SERIALIZABLE means dirty reads, nonrepeatable reads, and phantom reads are impossible. SET TRANSACTION REPEATABLE READ allows phantom reads, but dirty reads and nonrepeatable reads are impossible.

You'll have to check your platform's documentation to find out what it supports. PostgreSQL, for example, supports all four isolation levels syntactically. But internally it only has two levels: read committed and serializable. That means you can SET TRANSACTION READ UNCOMMITTED, but you'll get "read committed" behavior.

Important for you: The effect of a serializable isolation level is to guarantee that transactions appear to have been issued one at a time by a single client. But that's not quite the same thing as saying that if transaction A starts before transaction B, it will commit before transaction B. If they don't affect each other, the dbms is allowed to commit transaction B first without violating the serializable isolation level semantics.

When I have questions myself about how these work, I test them by opening two command-line clients connected to the same database.