开发者

How are the "primitive" types defined non-recursively?

开发者 https://www.devze.com 2023-02-06 08:09 出处:网络
Since a struct in C# consists of the bits of its members, you cannot have a value type T which includes any T fields:

Since a struct in C# consists of the bits of its members, you cannot have a value type T which includes any T fields:

// Struct member 'T.m_field' of type 'T' causes a cycle in the struct layout
struct T { T m_field; }

My understanding is that an instance of the above type could never be instantiated*—any attempt to do so would result in an infinite loop of instantiation/allocation (which I guess would cause a stack overflow?**)—or, alternately, another way of looking at it might be that the definition itself just doesn't make sense; perhaps it's a self-defeating entity, sort of like "This statement is false."

Curiously, though, if you run this code:

BindingFlags privateInstance = BindingFlags.NonPublic | BindingFlags.Instance;

// Give me all the private instance fields of the int type.
FieldInfo[] int32Fields = typeof(int).GetFields(privateInstance);

foreach (FieldInfo field in int32Fields)
{
    Console.WriteLine("{0} ({1})", field.Name, field.FieldType);
}

...you will get the following output:

m_value (System.Int32)

It seems we are being "lied" to here***. Obviously I understand that the primitive types like int, double, etc. must be defined in some special way deep down in the bowels of C# (you cannot define every possible unit within a system in ter开发者_如何学JAVAms of that system... can you?—different topic, regardless!); I'm just interested to know what's going on here.

How does the System.Int32 type (for example) actually account for the storage of a 32-bit integer? More generally, how can a value type (as a definition of a kind of value) include a field whose type is itself? It just seems like turtles all the way down.

Black magic?


*On a separate note: is this the right word for a value type ("instantiated")? I feel like it carries "reference-like" connotations; but maybe that's just me. Also, I feel like I may have asked this question before—if so, I forget what people answered.

**Both Martin v. Löwis and Eric Lippert have pointed out that this is neither entirely accurate nor an appropriate perspective on the issue. See their answers for more info.

***OK, I realize nobody's actually lying. I didn't mean to imply that I thought this was false; my suspicion had been that it was somehow an oversimplification. After coming to understand (I think) thecoop's answer, it makes a lot more sense to me.


As far as I know, within a field signature that is stored in an assembly, there are certain hardcoded byte patterns representing the 'core' primitive types - the signed/unsigned integers, and floats (as well as strings, which are reference types and a special case). The CLR knows natively how to deal with those. Check out Partition II, section 23.2.12 of the CLR spec for the bit patterns of the signatures.

Within each primitive struct ([mscorlib]System.Int32, [mscorlib]System.Single etc) in the BCL is a single field of that native type, and because a struct is exactly the same size as its constituent fields, each primitive struct is the same bit pattern as its native type in memory, and so can be interpreted as either, by the CLR, C# compiler, or libraries using those types.

From C#, int, double etc are synonyms of the mscorlib structs, which each have their primitive field of a type that is natively recognised by the CLR.

(There's an extra complication here, in that the CLR spec specifies that any types that have a 'short form' (the native CLR types) always have to be encoded as that short form (int32), rather than valuetype [mscorlib]System.Int32. So the C# compiler knows about the primitive types as well, but I'm not sure of the exact semantics and special-casing that goes on in the C# compiler and CLR for, say, method calls on primitive structs)

So, due to Godel's Incompleteness Theorem, there has to be something 'outside' the system by which it can be defined. This is the Magic that lets the CLR interpret 4 bytes as a native int32 or an instance of [mscorlib]System.Int32, which is aliased from C#.


My understanding is that an instance of the above type could never be instantiated any attempt to do so would result in an infinite loop of instantiation/allocation (which I guess would cause a stack overflow?)—or, alternately, another way of looking at it might be that the definition itself just doesn't make sense;

That's not the best way of characterizing the situation. A better way to look at it is that the size of every struct must be well-defined. An attempt to determine the size of T goes into an infinite loop, and therefore the size of T is not well-defined. Therefore, it's not a legal struct because every struct must have a well-defined size.

It seems we are being lied to here

There's no lie. An int is a struct that contains a field of type int. An int is of known size; it is by definition four bytes. Therefore it is a legal struct, because the size of all its fields is known.

How does the System.Int32 type (for example) actually store a 32-bit integer value

The type doesn't do anything. The type is just an abstract concept. The thing that does the storage is the CLR, and it does so by allocating four bytes of space on the heap, on the stack, or in registers. How else do you suppose a four-byte integer would be stored, if not in four bytes of memory?

how does the System.Type object referenced with typeof(int) present itself as though this value is itself an everyday instance field typed as System.Int32?

That's just an object, written in code like any other object. There's nothing special about it. You call methods on it, it returns more objects, just like every other object in the world. Why do you think there's something special about it?


Three remarks, in addition to thecoop's answer:

  1. Your assertion that recursive structs inherently couldn't work is not entirely correct. It's more like a statement "this statement is true": which is true if it is. It's plausible to have a type T whose only member is of type T: such an instance might consume 0 bytes, for example (since its only member consumes 0 bytes). Recursive value types only stop working if you have a second member (which is why they are disallowed).

  2. Take a look at Mono's definition of Int32. As you can see: it actually is a type containing itself (since int is just an alias for Int32 in C#). There is certainly "black magic" involved (i.e. special-casing), as the comments explain: the runtime will lookup the field by name, and just expect that it's there - I also assume that the C# compiler will special-case the presence of int here.

  3. In PE assemblies, type information is represented through "type signature blobs". These are sequences of type declarations, e.g. for method signatures, but also for fields. The list of available primitive types in such a signature is defined in section 22.1.15 of the CLR specification; a copy of the allowed values is in the CorElementType enumeration. Apparently, the reflection API maps these primitive types to their corresponding System.XYZ valuetypes.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号