I am confused by a code listing in a book i am reading, C# 3 in a Nutshell, on threading. In the topic on Thread Safety in Application Servers, below code is given as an example of a UserCache:
static class UserCache
{
static Dictionary< int,User> _users = new Dictionary< int, User>();
internal static User GetUser(int id)
{
User u = null;
lock (_users) // Why lock this???
if (_users.TryGetValue(id, out u))
return u;
u = RetrieveUser(id); //Method to retrieve from databse
lock (_users) _users[id] = u; //Why lock this???
return u;
}
}
The authors exp开发者_StackOverflow中文版lain why the RetrieveUser method is not in a lock, this is to avoid locking the cache for a longer period.
I am confused as to why lock the TryGetValue and the update of the dictionary since even with the above the dictionary is being updated twice if 2 threads call simultaneously with the same unretrieved id.What is being achieved by locking the dictionary read?
Many thanks in advance for all your comments and insights.The Dictionary<TKey, TValue>
class is not threadsafe.
If one thread writes one key to the dictionary while a different thread reads the dictionary, it may get messed up. (For example, if the write operation triggers an array resize, or if the two keys are a hash collision)
Therefore, the code uses a lock to prevent concurrent writes.
There is a benign race condition when writing to the dictionary; it is possible, as you stated, for two threads to determine there is not a matching entry in the cache. In this case, both of them will read from the DB and then attempt to insert. Only the object inserted by the last thread is kept; the other object will be garbage collected when the first thread is done with it.
The read to the dictionary needs to be locked because another thread may be writing at the same time, and the read needs to search over a consistent structure.
Note that the ConcurrentDictionary
introduced in .NET 4.0 pretty much replaces this kind of idiom.
That's a common practice to access any non thread safe structures like lists, dictionaries, common shared values, etc.
And answering main question: locking a read we guarantee that dictionary will not be changed by another thread while we are reading its value. This is not implemented in dictionary and that is why it’s called non thread safe :)
If two threads call in simultaneously and the id exists, then they will both return the correct User information. The first lock is to prevent errors like SLaks said - if someone is writing to the dictionary while you are trying to read it, you'll have issues. In this scenario, the second lock will never be reached.
If two threads call in simultaneously and the id does not exist, one thread will lock and enter TryGetValue, this will return false and set u to a default value. This first lock is again, to prevent the errors described by SLaks. At this point, that first thread will release the lock and the second thread will enter and do the same. Both will then set 'u' to information from 'RetrieveUser(id)'; this should be the same information. One thread will then lock the dictionary and assign _users[id] to the value of u. This second lock is so that two threads are trying to write values to the same memory locations simultaneously and corrupting that memory. I don't know what the second thread will do when it enters the assignment. It will either return early ignoring the update, or overwrite the existing data from the first thread. Regardless, the Dictionary will contain the same information because both threads should have recieved the same data in 'u' from RetrieveUser.
For performance, the auther compared two scenarios - the above scenario, which will be extremely rare and block while two threads try and write the same data, and second one where it is far more likely that two threads call in requesting data for an object that needs written, and one that exists. For example, threadA and threadB call in simultaneously and ThreadA locks for an id that doesn't exist. There is no reason to make threadB wait for a lookup while threadA is working on RetriveUser. This situation is probably far more likely than the duplicate ids described above, so for performance the author chose not to lock on the whole block.
精彩评论