I understand that this is an implementation detail. I'm actually curious what that implementation detail is in Microsoft's CLR.
Now, bear with me as I did not study CS in college, so I might have missed out on some fundamental principles.
But my understanding of the "stack" and the "heap" as implemented in the CLR as it stands today is, I think, solid. I'm not going to make some inaccurate umbrella statement such as "value types are stored on the stack," for example. But, in most common scenarios -- plain vanilla local variables, of value type, either passed as parameters or declared within the method and not contained inside a closure -- value type variables are stored on the stack (again, in Microsoft's CLR).
I guess what I'm unsure of is where ref value type parameters come in.
Originally what I was thinking was that, if the call stack looks like this (left = bottom):
A() -> B() -> C()
...then a local variable declared within the scope of A and passed as a ref parameter to B could still be stored on the stack--couldn't it? B would simply need the memory location where that local variable was stored within A's frame (forgive me if that isn't the right terminology; I think it's clear what I mean, anyway).
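For the plain synchronous case described above, a minimal sketch may help. The class name RefDemo is hypothetical and the method bodies are invented for illustration; the point is only that A's frame is still live while B and C run, so an alias to its local remains valid.

```csharp
using System;

static class RefDemo
{
    // C mutates whatever variable its caller aliased.
    static void C(ref int arg) => arg += 1;

    // B forwards the same alias; the int itself is not copied.
    static void B(ref int arg) => C(ref arg);

    public static int A()
    {
        int x = 100;   // value-type local declared in A
        B(ref x);      // A has not returned while B and C execute
        return x;
    }

    static void Main() => Console.WriteLine(A()); // prints 101
}
```

Whether x physically sits in a stack slot or a register is up to the JIT; the sketch only shows the observable aliasing behavior.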
I realized this couldn't be strictly true, though, when it occurred to me that I could do this:
delegate void RefAction<T>(ref T arg);

void A()
{
    int x = 100;
    RefAction<int> b = B;
    // This is a non-blocking call; A will return immediately
    // after this.
    b.BeginInvoke(ref x, C, null);
}

void B(ref int arg)
{
    // Putting a sleep here to ensure that A has exited by the time
    // the next line gets executed.
    Thread.Sleep(1000);
    // Where is arg stored right now? The "x" variable
    // from the "A" method should be out of scope... but its value
    // must somehow be known here for this code to make any sense.
    arg += 1;
}

void C(IAsyncResult result)
{
    var asyncResult = (AsyncResult)result;
    var action = (RefAction<int>)asyncResult.AsyncDelegate;
    int output = 0;
    // This variable originally came from A... but then
    // A returned, it got updated by B, and now it's still here.
    action.EndInvoke(ref output, result);
    // ...and this prints "101" as expected (?).
    Console.WriteLine(output);
}
So in the example above, where is x (in A's scope) stored? And how does this work? Is it boxed? If not, is it subject to garbage collection now, despite being a value type? Or can the memory immediately be reclaimed?
I apologize for the long-winded question. But even if the answer is quite simple, maybe this will be informative to others who find themselves wondering the same thing in the future.
I don't believe that when you use BeginInvoke() and EndInvoke() with ref or out arguments you are truly passing the variables by ref. The fact that we have to call EndInvoke() with a ref parameter as well should be a clue to this.
Let's change your example to demonstrate the behavior I describe:
void A()
{
    int x = 100;
    int z = 400;
    RefAction<int> b = B;
    //b.BeginInvoke(ref x, C, null);
    var ar = b.BeginInvoke(ref x, null, null);
    b.EndInvoke(ref z, ar);
    Console.WriteLine(x); // outputs '100'
    Console.WriteLine(z); // outputs '101'
}
If you examine the output now, you will see that the value of x is actually unchanged. But z does now contain the updated value.
I suspect that the compiler alters the semantics of passing variables by ref when you use the asynchronous Begin/EndInvoke methods.
After taking a look at the IL produced by this code, it appears that ref arguments to BeginInvoke() are still passed by ref. While Reflector doesn't show the IL for this method, I suspect that it simply doesn't pass along the parameter as a ref argument, but instead creates a separate variable behind the scenes to pass to B(). When you then call EndInvoke() you must supply a ref argument again to retrieve the value from the async state. It's likely that such arguments are actually stored as part of (or in conjunction with) the IAsyncResult object which is needed to ultimately retrieve their values.
Let's think about why the behavior likely works this way. When you make an async call to a method, you are doing so on a separate thread. This thread has its own stack and so cannot use the typical mechanism of aliasing ref/out variables. To get any returned values from an async method, though, you eventually need to call EndInvoke() to complete the operation and retrieve these values, and that call could just as easily occur on a completely different thread than the original call to BeginInvoke() or the actual body of the method. Clearly the call stack is not a good place to store such data - especially since the thread used for the async call could be re-purposed for a different method once the async operation completes. As a result, some mechanism other than the stack is needed to "marshal" the return value and out/ref arguments from the method being called back to the site where they will ultimately be used.
I believe this mechanism (in the Microsoft .NET implementation) is the IAsyncResult object. In fact, if you examine the IAsyncResult object in the debugger, you will notice that in the non-public members there exists _replyMsg, which contains a Properties collection. This collection contains elements like __OutArgs and __Return whose data appear to reflect their namesakes.
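The copy-in/copy-out behavior described above can be imitated with a rough sketch. This is not how the framework implements BeginInvoke()/EndInvoke(); PendingCall, Begin, and End are hypothetical names, and a Task stands in for the async machinery (delegate BeginInvoke() is only available on .NET Framework and throws PlatformNotSupportedException on .NET Core and later). The idea is simply that the ref argument is copied into a heap object that outlives the caller's frame, and copied back out at the end.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the async result object:
// it owns a heap copy of the ref argument.
sealed class PendingCall
{
    public int ArgCopy;                      // a copy, not an alias of the caller's variable
    public Task Work = Task.CompletedTask;   // replaced by the real work in Begin
}

static class CopyInCopyOut
{
    static void B(ref int arg) => arg += 1;

    // "Begin": copy the caller's value in, then run B against the copy.
    public static PendingCall Begin(ref int arg)
    {
        var call = new PendingCall { ArgCopy = arg };
        call.Work = Task.Run(() => B(ref call.ArgCopy));
        return call;
    }

    // "End": wait for completion, then copy the result out to the caller.
    public static void End(ref int arg, PendingCall call)
    {
        call.Work.Wait();
        arg = call.ArgCopy;
    }

    static void Main()
    {
        int x = 100, z = 400;
        var call = Begin(ref x);
        End(ref z, call);
        Console.WriteLine(x); // prints 100
        Console.WriteLine(z); // prints 101
    }
}
```

Note that Begin only reads arg before starting the task; C# does not allow a ref parameter to be captured by a lambda, which is the same lifetime problem seen from the compiler's side.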
EDIT: Here's a theory about the async delegate design that occurs to me. It seems likely that the signatures of BeginInvoke() and EndInvoke() were chosen to be as similar as possible to each other to avoid confusion and improve clarity. The BeginInvoke() method doesn't actually need to accept ref/out arguments - since it only needs their value ... not their identity (as it's never going to assign anything back to them). However it would be really odd (for example) to have a BeginInvoke() call that takes an int and an EndInvoke() call that takes a ref int. Now, it's possible that there are technical reasons why begin/end calls should have identical signatures - but I think that the benefits of clarity and symmetry are sufficient to validate such a design.
All of this is, of course, an implementation detail of the CLR and C# compiler and could change in the future. It is interesting, however, that there is the possibility for confusion - if you expect that the original variable passed to BeginInvoke() will actually be modified. It also underscores the importance of calling EndInvoke() to complete an async operation.
Perhaps someone from the C# team (if they see this question) could offer more insight into the details and design choices behind this functionality.
The CLR is completely out of the loop on this; it is the job of the JIT compiler to generate the appropriate machine code to get an argument passed by reference. That is an implementation detail in itself, since there are different jitters for different machine architectures.
But the common ones do it exactly the way a C programmer does it: they pass a pointer to the variable. That pointer is passed in a CPU register or on the stack frame, depending on how many arguments the method takes.
Where the variable lives doesn't matter: a pointer to a variable in the stack frame of the caller is just as valid as a pointer to a member of a reference type object that's stored on the heap. The garbage collector knows the difference between them by virtue of the pointer value, and it adjusts the pointer if necessary when it moves an object.
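As a small illustration of that point, the same ref parameter can alias either a local or a field of a heap object. The names Counter and RefTargets are hypothetical, and this is only a sketch of the observable behavior, not of the generated machine code.

```csharp
using System;

sealed class Counter
{
    public int Value;   // lives inside a heap-allocated object
}

static class RefTargets
{
    // The callee does not know or care where the aliased int lives.
    static void Increment(ref int arg) => arg += 1;

    public static (int Local, int Field) Run()
    {
        int local = 1;
        var counter = new Counter { Value = 10 };

        Increment(ref local);          // alias to a local of this method
        Increment(ref counter.Value);  // alias to a field of a heap object

        return (local, counter.Value);
    }

    static void Main()
    {
        var r = Run();
        Console.WriteLine($"{r.Local} {r.Field}"); // prints "2 11"
    }
}
```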
Your code snippet invokes magic inside the .NET framework that's required to make marshaling calls from one thread to another work. This is the same kind of plumbing that makes Remoting work. To make such a call, a new stack frame has to be created on the thread where the call is performed. The remoting code uses the type definition of the delegate to know what that stack frame should look like. It can also deal with arguments passed by reference: it knows that it needs to allocate a slot in the stack frame to store the pointed-to variable, x in your case. The BeginInvoke call initializes the copy of the x variable in the remoted stack frame.
The same thing happens on the EndInvoke() call: the results are copied back from the stack frame in the threadpool thread. The key point is that there isn't actually a pointer to the x variable; there's a pointer to the copy of it.
I'm not so sure this answer is very clear. Having some understanding of how CPUs work, and enough C knowledge that the concept of a pointer is crystal clear, can help a lot.
Look at the code generated with Reflector to find out. My guess is that an anonymous class containing x is generated, like when you use closures (lambda expressions that reference variables in the current stack frame). Forget about this and read the other answers.