Dilemma with using value types with `new` operator in C#

开发者 https://www.devze.com 2023-02-22 23:18 Source: web
When operator new() is used with reference type, space for the instance is allocated on the heap and reference variable itself is placed on the stack. Besides that, everything within the instance of reference type, that is allocated on the heap, is zeroed-out.

For example here is a class:

class Person
{
    public int id;
    public string name;
}

In the following code:

class PersonDemo
{
    static void Main()
    {
        Person p = new Person();
        Console.WriteLine("id: {0}  name: {1}", p.id, p.name);
    }
}

p variable is on the stack and the created instance of Person (all of its members) is on the heap. p.id would be 0 and p.name would be null. This would be the case because everything allocated on the heap is zeroed-out.

Now what I'm confused about is what happens if I use a value type with the new operator. For example, consider the following structure:

struct Date
{
    public int year;
    public int month;
    public int day;
}

class DateDemo
{
    static void Main()
    {
        Date someDate;
        someDate= new Date();

        Console.WriteLine("someDate is: {0}/{1}/{2}", 
            someDate.month, someDate.day, someDate.year);
    }
}

Now I would like to know what the following lines from Main do:

        Date someDate;
        someDate= new Date();

In first line someDate variable is allocated on the stack. Precisely 12 bytes.

My question is what happens on the second line? What does operator new() do? Does it only zero-out members of Date structure or it allocates space on the heap as well? On one side I wouldn't expect new to allocate space on the heap, of course because in the first line memory is already allocated on the stack for the structure instance. On the other hand, I would expect new to allocate space on the heap and return address of that space, because that's what new should do. Maybe this is because I'm coming from C++ background.

Nevertheless, if the answer is "when new is used with value types, it only zeroes out the members of the object", then the meaning of the new operator is a bit inconsistent, because:

  1. when using new with value types, it only zeroes out the members of the object on the stack
  2. when using new with reference types, it allocates memory on the heap for the instance and zeroes out its members

Thanks in advance,

Cheers


First let me correct your errors.

> When operator new() is used with reference type, space for the instance is allocated on the heap and reference variable itself is placed on the stack.

The reference that is the result of "new" is a value, not a variable. The value refers to a storage location.

The reference is of course returned in a CPU register. Whether the contents of that CPU register are ever copied to the call stack is a matter for the jitter's optimizer to decide. It need not ever live on the stack; it could live forever in registers, or it could be copied directly from the register to the managed heap, or, in unsafe code, it could be copied directly to unmanaged memory.

The stack is an implementation detail. You don't know when the stack is being used unless you look at the jitted code.

> p variable is on the stack and the created instance of Person (all of its members) is on the heap. p.id would be 0 and p.name would be null.

Correct, though of course again p could be realized as a register if the jitter so decides. It need not use the stack if there are available registers.

You seem pretty hung up on this idea that the stack is being used. The jitter might have a large number of registers at its disposal, and those registers can be pretty big.

> I'm coming from C++ background.

Ah, that explains why you're so hung up on this stack vs heap thing. Learn to stop worrying about it. We've designed a managed memory environment where things live as long as they need to. Whether the manager chooses to use stack, heap or registers to efficiently manage the memory is up to it.

> In first line someDate variable is allocated on the stack. Precisely 12 bytes.

Let's suppose for the sake of argument that this 12 byte structure is allocated on the stack. Seems reasonable.

> My question is what happens on the second line? What does operator new() do? Does it only zero-out members of Date structure or it allocates space on the heap as well?

The question presupposes a false dichotomy and is therefore impossible to answer as stated. The question presents two either-or alternatives, neither of which is necessarily correct.

> On one side I wouldn't expect new to allocate space on the heap, of course because in the first line memory is already allocated on the stack for the structure instance.

Correct conclusion, specious reasoning. No heap allocation is performed because the compiler knows that no part of this operation requires a long-lived storage. That's what the heap is for; when the compiler determines that a given variable might live longer than the current method activation, it generates code which allocates the storage for that variable on the long-lived "heap" storage. If it determines that the variable definitely has a short lifetime then it uses the stack (or registers), as an optimization.

> On the other hand, I would expect new to allocate space on the heap and return address of that space, because that's what new should do.

Incorrect. "new" does not guarantee that heap is allocated. Rather, "new" guarantees that a constructor is called on zeroed-out memory.
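A small sketch of that guarantee, reusing the Person class from the question. The two extra fields (idSeenOnEntry, nameWasNullOnEntry) and the values 42 and "Ada" are hypothetical additions for illustration only; they record what the constructor body observes before it assigns anything.

```csharp
using System;

class Person
{
    public int id;
    public string name;

    // Hypothetical bookkeeping fields, not part of the original class.
    public int idSeenOnEntry;
    public bool nameWasNullOnEntry;

    public Person()
    {
        // Nothing has been assigned yet, so these reads see zeroed memory.
        idSeenOnEntry = id;
        nameWasNullOnEntry = name == null;

        // Only now does the constructor give the fields real values.
        id = 42;
        name = "Ada";
    }
}

static class ZeroedBeforeConstructorDemo
{
    static void Main()
    {
        Person p = new Person();
        Console.WriteLine("on entry: id={0}, name is null={1}", p.idSeenOnEntry, p.nameWasNullOnEntry);
        Console.WriteLine("afterwards: id={0}, name={1}", p.id, p.name);
    }
}
```

Nothing in this sketch depends on where the storage lives; it only shows the order of events: zeroed memory first, constructor body second.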

Let's go back to your question:

> Does it only zero-out members of Date structure or it allocates space on the heap as well?

We know it does not allocate space on the heap. Does it zero out members of the date structure?

That's a complicated question. The specification says that what happens when you say

someDate = new Date();    
  • the address of someDate is determined
  • space is allocated (off "the stack") for the new object. It is zeroed out.
  • then the constructor, if any, is called, with "this" being a reference to the new stack storage
  • then the bytes of the new stack storage are copied to the address of someDate.

Now, is that actually what happens? You would be perfectly within your rights to notice that it is impossible to tell whether new stack space is allocated, initialized and copied, or whether the "old" stack space is initialized.

The answer is that in cases where the compiler deduces that it is impossible for the user to notice that the existing stack space is being mutated, the existing stack space is mutated and the extra allocation and subsequent copy are elided.

In cases where the compiler is unable to deduce that, then a temporary stack slot is created, initialized to zeros, constructed, mutated by the constructor, and then the resulting value is copied to the variable. This ensures that if the constructor throws an exception, you cannot observe an inconsistent state in the variable.
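Here is a sketch of a case where the difference would be observable. It assumes a variant of the Date struct with a three-argument constructor (not in the original question) that validates the month only after it has already written the fields. AssignWithFailure and the date values are made up for the example.

```csharp
using System;

struct Date
{
    public int year;
    public int month;
    public int day;

    // Hypothetical constructor: it writes the fields first and validates afterwards,
    // so a failing call has already mutated whatever storage "this" refers to.
    public Date(int year, int month, int day)
    {
        this.year = year;
        this.month = month;
        this.day = day;
        if (month < 1 || month > 12)
            throw new ArgumentOutOfRangeException(nameof(month));
    }
}

static class CopySemanticsDemo
{
    public static Date AssignWithFailure()
    {
        Date someDate = new Date(2011, 3, 14);
        try
        {
            // The constructor throws, so the assignment never completes.
            someDate = new Date(1999, 13, 1);
        }
        catch (ArgumentOutOfRangeException)
        {
            // Swallowed on purpose: we only want to inspect someDate afterwards.
        }
        return someDate;
    }

    static void Main()
    {
        Date d = AssignWithFailure();
        // The half-built 1999/13/1 value must not be visible here.
        Console.WriteLine("{0}/{1}/{2}", d.month, d.day, d.year);
    }
}
```

Because the failed constructor call must not leak a half-initialized value into someDate, the variable still holds 2011-03-14 afterwards. Whether the compiler achieves that with a temporary or some other way is its own business.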

For more details about this issue and its analysis by the compiler see my article on the subject.

https://ericlippert.com/2010/10/11/debunking-another-myth-about-value-types/


OK here is a simple one:

class Program
{
    static void Main(string[] args)
    {
        DateTime dateTime = new DateTime();
        dateTime = new DateTime();
        Console.Read();
    }
}

which compiles to this IL code:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       24 (0x18)
  .maxstack  1
  .locals init ([0] valuetype [mscorlib]System.DateTime dateTime)
  IL_0000:  nop
  IL_0001:  ldloca.s   dateTime
  IL_0003:  initobj    [mscorlib]System.DateTime
  IL_0009:  ldloca.s   dateTime
  IL_000b:  initobj    [mscorlib]System.DateTime
  IL_0011:  call       int32 [mscorlib]System.Console::Read()
  IL_0016:  pop
  IL_0017:  ret
} // end of method Program::Main

As you can see, the CLR reuses the same local variable to store the new value. The parameterless "constructor" runs again in the form of initobj, which just zeroes the memory at that address. How initobj does this is a CLR implementation detail that we cannot see from the IL.

The reality is that, as Eric Lippert explains, there is no general rule that value types are allocated on the stack. This is purely down to the implementation of the CLR.


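The default constructor of a struct returns a struct with all memory zeroed out. That is, new SomeStruct() is the same as default(SomeStruct). (This holds as long as the struct does not declare its own parameterless constructor, which only C# 10 and later allow.)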
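A minimal sketch of that equivalence, using the Date struct from the question. NewEqualsDefault is a hypothetical helper name.

```csharp
using System;

struct Date
{
    public int year;
    public int month;
    public int day;
}

static class DefaultStructDemo
{
    public static bool NewEqualsDefault()
    {
        Date viaNew = new Date();          // parameterless "constructor"
        Date viaDefault = default(Date);   // explicit default value

        // Both values have every field set to zero and compare equal.
        return viaNew.Equals(viaDefault)
            && viaNew.year == 0 && viaNew.month == 0 && viaNew.day == 0;
    }

    static void Main()
    {
        Console.WriteLine(NewEqualsDefault());
    }
}
```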

Your code then assigns that default struct to your variable.

That's all you know for sure.

How the compiler goes about achieving this is entirely the compiler's business.

But if you're curious about what happens behind the scenes, the compiler is most likely just going to clear the stack location of that variable directly, assuming that the variable is stored on the stack. There are many things that can prevent this. One example is an anonymous function accessing it, for example:

Func<Person> PersonFactory()
{
  Person p = new Person();
  return () => p;
}

Here p needs to be stored on the heap so that it still exists once the function returns, and so (taking Person to be a struct for the sake of this example) new Person() will clear that heap location.
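The same effect can be sketched with a plain int local, which avoids any doubt about whether the captured thing is a value type. MakeCounter is a hypothetical name. Where the variable physically lives is still up to the implementation; the only thing the language promises is that it outlives the method call.

```csharp
using System;

static class ClosureDemo
{
    public static Func<int> MakeCounter()
    {
        // A value-type local that the lambda captures.
        int count = 0;

        // The lambda keeps using "count" after MakeCounter has returned,
        // so the variable cannot simply die with the method's activation.
        return () => ++count;
    }

    static void Main()
    {
        Func<int> next = MakeCounter();
        Console.WriteLine(next()); // 1
        Console.WriteLine(next()); // 2
    }
}
```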

Anyway. Unlike C/C++, with C# it's a good idea to forget about "the stack", "the heap", etc. AFAIK, the language spec has no concept of either of these - they're all specific to the implementation. Who knows, some future implementation may, where escape analysis allows, put some heap values on the stack to save the GC a bit of effort. It's really best not to make design decisions specific to a given implementation of the C# spec.


From the developer's point of view, you have no knowledge of where it is allocated. For example, on an exotic device whose CLR has no notion of a stack, everything goes on the heap. Even on the desktop CLR, in some cases the JIT compiler can move variables from the stack to the heap.

More info.


About zeroing of structs.

The parameterless constructor zeroes the members.

If you don't use new(), you can't access the struct's members unless you initialize them yourself first. Otherwise you'll get "Use of possibly unassigned field".
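A short sketch of the manual route, again with the Date struct from the question. Build and the date values are made up for the example. The commented-out lines show the kind of code that triggers the compile-time error.

```csharp
using System;

struct Date
{
    public int year;
    public int month;
    public int day;
}

static class ManualInitDemo
{
    public static Date Build()
    {
        Date d;            // declared without new: no field is assigned yet

        // Date e;
        // Console.WriteLine(e.year);   // would not compile: use of possibly unassigned field

        d.year = 2011;     // assign every field by hand...
        d.month = 3;
        d.day = 14;

        return d;          // ...and the struct as a whole becomes usable
    }

    static void Main()
    {
        Date d = Build();
        Console.WriteLine("{0}/{1}/{2}", d.month, d.day, d.year);
    }
}
```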
