Thread-safe asynchronous code in C#_问答_开发者

I asked the question below couple of weeks ago. Now, when reviewing my question and all the answers, a very important detail jumped into my eyes: In my second code example, isn't DoTheCodeThatNeedsToRunAsynchronously() executed in the main (UI) thread? Doesn't the timer just wait a second and then post an event to the main thread? This would mean then that the code-that-needs-to-run-asynchronously isn't run asynchronously at all?!

Original question:

I have recently faced a problem multiple times and solved it in different ways, always being uncertain on whether it is thread safe or not: I need to execute a piece of C# code asynchronously. (Edit: I forgot to mention I'm using .NET 3.5!)

That piece of code works on an object that is provided by the main thread code. (Edit: Let's assume that object is thread-safe in itself.) I'll present you two ways I tried (simplified) and have these four questions:

What is the best way to achieve what I want? Is it one of the two or another approach?
Is one of the two ways not thread-safe (I fear both...) and why?
The first approach creates a thread and passes it the object in the constructor. Is that how I'm supposed to pass the object?
The second approach uses a timer which doesn't provide that possibility, so I just use the local variable in the anonymous delegate. Is that safe or is it possible in theory that the refer开发者_如何学Goence in the variable changes before it is evaluated by the delegate code? (This is a very generic question whenever one uses anonymous delegates). In Java you are forced to declare the local variable as final (i.e. it cannot be changed once assigned). In C# there is no such possibility, is there?

Approach 1: Thread

new Thread(new ParameterizedThreadStart(
    delegate(object parameter)
    {
        Thread.Sleep(1000); // wait a second (for a specific reason)

        MyObject myObject = (MyObject)parameter;

        DoTheCodeThatNeedsToRunAsynchronously();
        myObject.ChangeSomeProperty();

    })).Start(this.MyObject);

There is one problem I had with this approach: My main thread might crash, but the process still persists in the memory due to the zombie thread.

Approach 2: Timer

MyObject myObject = this.MyObject;

System.Timers.Timer timer = new System.Timers.Timer();
timer.Interval = 1000;
timer.AutoReset = false; // i.e. only run the timer once.
timer.Elapsed += new System.Timers.ElapsedEventHandler(
    delegate(object sender, System.Timers.ElapsedEventArgs e)
    {
        DoTheCodeThatNeedsToRunAsynchronously();
        myObject.ChangeSomeProperty();
    });

DoSomeStuff();
myObject = that.MyObject; // hypothetical second assignment.

The local variable myObject is what I'm talking about in question 4. I've added a second assignment as an example. Imagine the timer elapses after the second assigment, will the delegate code operate on this.MyObject or that.MyObject?

Whether or not either of these pieces of code is safe has to do with the structure of MyObject instances. In both cases you are sharing the myObject variable between the foreground and background threads. There is nothing stopping the foreground thread from modifying myObject while the background thread is running.

This may or may not be safe and depends on the structure of MyObject. However if you haven't specifically planned for it then it's most certainly an unsafe operation.

I recommend using Task objects, and restructuring the code so that the background task returns its calculated value rather than changing some shared state.

I have a blog entry that discusses five different approaches to background tasks (Task, BackgroundWorker, Delegate.BeginInvoke, ThreadPool.QueueUserWorkItem, and Thread), with the pros and cons of each.

To answer your questions specifically:

What is the best way to achieve what I want? Is it one of the two or another approach? The best solution is to use the Task object instead of a specific Thread or timer callback. See my blog post for all the reasons why, but in summary: Task supports returning a result, callbacks on completion, proper error handling, and integration with the universal cancellation system in .NET.
Is one of the two ways not thread-safe (I fear both...) and why? As others have stated, this totally depends on whether MyObject.ChangeSomeProperty is threadsafe. When dealing with asynchronous systems, it's easier to reason about threadsafety when each asynchronous operation does not change shared state, and rather returns a result.
The first approach creates a thread and passes it the object in the constructor. Is that how I'm supposed to pass the object? Personally, I prefer using lambda binding, which is more type-safe (no casting necessary).
The second approach uses a timer which doesn't provide that possibility, so I just use the local variable in the anonymous delegate. Is that safe or is it possible in theory that the reference in the variable changes before it is evaluated by the delegate code? Lambdas (and delegate expressions) bind to variables, not to values, so the answer is yes: the reference may change before it is used by the delegate. If the reference may change, then the usual solution is to create a separate local variable that is only used by the lambda expression,

as such:

MyObject myObject = this.MyObject;
...
timer.AutoReset = false; // i.e. only run the timer once.
var localMyObject = myObject; // copy for lambda
timer.Elapsed += new System.Timers.ElapsedEventHandler(
  delegate(object sender, System.Timers.ElapsedEventArgs e)
  {
    DoTheCodeThatNeedsToRunAsynchronously();
    localMyObject.ChangeSomeProperty();
  });
// Now myObject can change without affecting timer.Elapsed

Tools like ReSharper will try to detect whether local variables bound in lambdas may change, and will warn you if it detects this situation.

My recommended solution (using Task) would look something like this:

var ui = TaskScheduler.FromCurrentSynchronizationContext();
var localMyObject = this.myObject;
Task.Factory.StartNew(() =>
{
  // Run asynchronously on a ThreadPool thread.
  Thread.Sleep(1000); // TODO: review if you *really* need this   

  return DoTheCodeThatNeedsToRunAsynchronously();   
}).ContinueWith(task =>
{
  // Run on the UI thread when the ThreadPool thread returns a result.
  if (task.IsFaulted)
  {
    // Do some error handling with task.Exception
  }
  else
  {
    localMyObject.ChangeSomeProperty(task.Result);
  }
}, ui);

Note that since the UI thread is the one calling MyObject.ChangeSomeProperty, that method doesn't have to be threadsafe. Of course, DoTheCodeThatNeedsToRunAsynchronously still does need to be threadsafe.

"Thread-safe" is a tricky beast. With both of your approches, the problem is that the "MyObject" your thread is using may be modified/read by multiple threads in a way that makes the state appear inconsistent, or makes your thread behave in a way inconsistent with actual state.

For example, say your MyObject.ChangeSomeproperty() MUST be called before MyObject.DoSomethingElse(), or it throws. With either of your approaches, there is nothing to stop any other thread from calling DoSomethingElse() before the thread that will call ChangeSomeProperty() finishes.

Or, if ChangeSomeProperty() happens to be called by two threads, and it (internally) changes state, the thread context switch may happen while the first thread is in the middle of it's work and the end result is that the actual new state after both threads is "wrong".

However, by itself, neither of your approaches is inherently thread-unsafe, they just need to make sure that changing state is serialized and that accessing state is always giving a consistent result.

Personally, I wouldn't use the second approach. If you're having problems with "zombie" threads, set IsBackground to true on the thread.

Your first attempt is pretty good, but the thread continued to exist even after the application exits, because you didn't set the IsBackground property to true... here is a simplified (and improved) version of your code:

MyObject myObject = this.MyObject;
Thread t = new Thread(()=>
    {
        Thread.Sleep(1000); // wait a second (for a specific reason)
        DoTheCodeThatNeedsToRunAsynchronously();
        myObject.ChangeSomeProperty();
    });
t.IsBackground = true;
t.Start();

With regards to the thread safety: it's difficult to tell if your program functions correctly when multiple threads execute simultaneously, because you're not showing us any points of contention in your example. It's very possible that you will experience concurrency issues if your program has contention on MyObject.

Java has the final keyword and C# has a corresponding keyword called readonly, but neither final nor readonly ensure that the state of the object you're modifying will be consistent between threads. The only thing these keywords do is ensure that you do not change the reference the object is pointing to. If two threads have read/write contention on the same object, then you should perform some type of synchronization or atomic operations on that object in order to ensure thread safety.

Update

OK, if you modify the reference to which myObject is pointing to, then your contention is now on myObject. I'm sure that my answer will not match your actual situation 100%, but given the example code you've provided I can tell you what will happen:

You will not be guaranteed which object gets modified: it can be that.MyObject or this.MyObject. That's true regardless if you're working with Java or C#. The scheduler may schedule your thread/timer to be executed before, after or during the second assignment. If you're counting on a specific order of execution, then you have to do something to ensure that order of execution. Usually that something is a communication between the threads in the form of a signal: a ManualResetEvent, Join or something else.

Here is a join example:

MyObject myObject = this.MyObject;
Thread task = new Thread(()=>
    {
        Thread.Sleep(1000); // wait a second (for a specific reason)
        DoTheCodeThatNeedsToRunAsynchronously();
        myObject.ChangeSomeProperty();
    });
task.IsBackground = true;
task.Start();
task.Join(); // blocks the main thread until the task thread is finished
myObject = that.MyObject; // the assignment will happen after the task is complete

Here is a ManualResetEvent example:

ManualResetEvent done = new ManualResetEvent(false);
MyObject myObject = this.MyObject;
Thread task = new Thread(()=>
    {
        Thread.Sleep(1000); // wait a second (for a specific reason)
        DoTheCodeThatNeedsToRunAsynchronously();
        myObject.ChangeSomeProperty();
        done.Set();
    });
task.IsBackground = true;
task.Start();
done.WaitOne(); // blocks the main thread until the task thread signals it's done
myObject = that.MyObject; // the assignment will happen after the task is done

Of course, in this case it's pointless to even spawn multiple threads, since you're not going to allow them to run concurrently. One way to avoid this is by not changing the reference to myObject after you've started the thread, then you won't need to Join or WaitOne on the ManualResetEvent.

So this leads me to a question: why are you assigning a new object to myObject? Is this a part of a for-loop which is starting multiple threads to perform multiple asynchronous tasks?

What is the best way to achieve what I want? Is it one of the two or another approach?

Both look fine, but...

Is one of the two ways not thread-safe (I fear both...) and why?

...they are not thread safe unless MyObject.ChangeSomeProperty() is thread safe.

The first approach creates a thread and passes it the object in the constructor. Is that how I'm supposed to pass the object?

Yes. Using a closure (as in your second approach) is fine as well, with the additional advantage that you don't need to do a cast.

The second approach uses a timer which doesn't provide that possibility, so I just use the local variable in the anonymous delegate. Is that safe or is it possible in theory that the reference in the variable changes before it is evaluated by the delegate code? (This is a very generic question whenever one uses anonymous delegates).

Sure, if you add myObject = null; directly after setting timer.Elapsed, then the code in your thread will fail. But why would you want to do that? Note that changing this.MyObject will not affect the variable captured in your thread.

So, how to make this thread-safe? The problem is that myObject.ChangeSomeProperty(); might run in parallel with some other code that modifies the state of myObject. There are basically two solutions to that:

Option 1: Execute myObject.ChangeSomeProperty() in the main UI thead. This is the simplest solution if ChangeSomeProperty is fast. You can use the Dispatcher (WPF) or Control.Invoke (WinForms) to jump back to the UI thread, but the easiest way is to use a BackgroundWorker:

MyObject myObject = this.MyObject;
var bw = new BackgroundWorker();

bw.DoWork += (sender, args) => {
    // this will happen in a separate thread
    Thread.Sleep(1000);
    DoTheCodeThatNeedsToRunAsynchronously();
}

bw.RunWorkerCompleted += (sender, args) => {
    // We are back in the UI thread here.

    if (args.Error != null)  // if an exception occurred during DoWork,
        MessageBox.Show(args.Error.ToString());  // do your error handling here
    else
        myObject.ChangeSomeProperty();
}

bw.RunWorkerAsync(); // start the background worker

Option 2: Make the code in ChangeSomeProperty() thread-safe by using the lock keyword (inside ChangeSomeProperty as well as inside any other method modifying or reading the same backing field).

The bigger thread-safety concern here, in my mind, may be the 1 second Sleep. If this is required in order to synchronize with some other operation (giving it time to complete), then I strongly recommend using a proper synchronization pattern rather than relying on the Sleep. Monitor.Pulse or AutoResetEvent are two common ways to achieve synchronization. Both should be used carefully, as it's easy to introduce subtle race conditions. However, using Sleep for synchronization is a race condition waiting to happen.

Also, if you want to use a thread (and don't have access to the Task Parallel Library in .NET 4.0), then ThreadPool.QueueUserWorkItem is preferable for short-running tasks. The thread pool threads also won't hang up the application if it dies, as long as there is not some deadlock preventing a non-background thread from dying.

One thing not mentioned so far: The choice of threading methods depends heavily on specifically what DoTheCodeThatNeedsToRunAsynchronously() does.

Different .NET threading approaches are suitable for different requirements. One very large concern is whether this method will complete quickly, or take some time (is it short-lived or long-running?).

Some .NET threading mechanisms, like ThreadPool.QueueUserWorkItem(), are for use by short-lived threads. They avoid the expense of creating a thread by using "recycled" threads--but the number of threads it will recycle is limited, so a long-running task shouldn't hog the ThreadPool's threads.

Other options to consider are using:

ThreadPool.QueueUserWorkItem() is a convienient means to fire-and-forget small tasks on a ThreadPool thread
System.Threading.Tasks.Task is a new feature in .NET 4 which makes small tasks easy to run in async/parallel mode.
Delegate.BeginInvoke() and Delegate.EndInvoke() (BeginInvoke() will run the code asynchronously, but it's crucial that you ensure EndInvoke() is called as well to avoid potential resource-leaks. It's also based on ThreadPool threads I believe.
System.Threading.Thread as shown in your example. Threads provide the most control but are also more expensive than the other methods--so they are ideal for long-running tasks or detail-oriented multithreading.

Overall my personal preference has been to use Delegate.BeginInvoke()/EndInvoke() -- it seems to strike a good balance between control and ease of use.