What does union U look like in memory?

enum Enums { k1, k2, k3, k4 };

union MYUnion { 
    struct U{ 
         char P;
    }u;

    struct U0 { 
        char state; 
    } u0; 

    struct U1 { 
        Enums e; 
        char c; 
        int v1; 
    } u1; 

    struct U2 { 
        Enums e; 
        char c; 
        int v1; 
        int v2; 
    } u2; 

    struct U3 { 
        Enums e; 
        unsigned int i; 
        char c; 
    } u3; 

    struct U4 { 
        Enums e;
        unsigned int i; 
        char c; 
        int v1; 
    } u4; 

    struct U5 { 
        Enums e; 
        unsigned int i; 
        char c; 
        int v1; 
        int v2; 
    } u5; 
} myUnion;

I'm so confused by the whole idea of unions in C++. What does "myUnion" look like in memory? I know that the members share the same memory block, but how? What is the size of "myUnion"? If it is the size of "u5", then how is the data laid out in this block of memory?

The size of the union is at least the size of its largest member (the compiler may round it up for alignment).
The contents of the union are whatever you last stored.

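So, if a union directly contains an Enums member .e and an unsigned int member .i, and you store into .i and then into .e, the leading bytes of the int are overwritten with the enum value (only the first byte if sizeof(Enums) is 1 in your environment; on many compilers an enum is the same size as an int). Note that inside a single struct such as u5, e and i are separate fields and do not overlap; the overlap is between the structs u, u0, ... u5.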

A union is like:

void *p = malloc(sizeof(biggest_item));  // one block, big enough for the largest member
Enums *ep = (Enums *)p;                  // three typed views of the same address
unsigned int *ip = (unsigned int *)p;
char *cp = (char *)p;

Assignments to *ep, *ip, and *cp work just like the union.

You are right to be confused! I'm confused... Let's look at something simple first before moving on to the more complex example you have.

First, union basics. A union just means that when you create a variable of the union type, the underlying members (in the example below, i and f) really do overlap in memory. It lets you sometimes treat that memory as an int and sometimes treat it as a float. This can get nasty, and you really have to know what you're doing.

union AUnion
{
   int i;
   float f;
}; // assumes an int is 32 bits

AUnion aUnion;
aUnion.i = 0;
printf("%f", aUnion.f);

In the above code, what will be printed? To answer that, you have to understand how ints and floats are represented in memory. Both take up 32 bits here, but how those bits are interpreted differs between the two types. When I set aUnion.i = 0, I am saying "write a zero integer to aUnion", which sets all 32 bits to 0. When we then print aUnion.f, we are saying "treat the bits of aUnion as a 32-bit float and print it". The computer can treat any bunch of 32 bits as a float because it knows how a floating-point number is formatted in binary. With the usual IEEE 754 format, all-zero bits represent 0.0, so this prints 0.000000. (Strictly speaking, C++ leaves reading a member other than the one last written undefined, although compilers such as GCC document it as supported.)

Now to take on some of your more complex union code:

enum Enums { k1, k2, k3, k4 };

union MYUnion { 
    struct U { 
        char P;
    } u;

    struct U0 { 
        char state; 
    } u0; 

    struct U1 { 
        Enums e; 
        char c; 
        int v1; 
    } u1;
    // ... u2 through u5 omitted
} myUnion;

All these structs overlap in the same way the int and float did above. Now assume that Enums is stored as an int (the usual case, though the compiler chooses the underlying type). The enumerators then map to int values in the underlying memory, following the usual rules for enums:

 enum Enums { k1/*0*/, k2/*1*/, k3/*2*/, k4/*3*/ };

So then what we have is

union MYUnion { 
    struct U { 
        char P;
    } u;

    struct U0 { 
        char state; 
    } u0; 

    struct U1 { 
        int e;      // the enum, viewed as a plain int
        char c; 
        int v1; 
    } u1;
    // ... u2 through u5 omitted
} myUnion;

And you have a very strange union because if you do

MYUnion m;
m.u.P = 'h';

When you later access the enum (which is most likely an int under the hood), you will not get a meaningful value. P is just 1 byte and the int is typically 4 bytes, so the write to P set only one of the int's bytes; the other three hold whatever was there before. Read as the enum, the result is garbage.

I strongly suggest you go sack whoever is responsible for this code.

As Murali said, the size of the union will be that of the largest struct that participates in the union.

The app will allocate enough bytes for the largest block. Memory mapping works like this:

Consider the following:

union Foo
{
  struct A
  {
    int x;
    unsigned char y;
    unsigned char z;
  } sa;   // member of type A
  struct B
  {
    unsigned char a;
    unsigned char b;
    unsigned char c;
    unsigned char d;
    unsigned char e;
  } sb;   // member of type B, overlapping sa
};

In this case, assuming that int is 32 bits (which depends on your target platform), a, b, c and d provide access to the bytes that make up the integer x. Writing to a overwrites the first byte of x (the one at the lowest address), writing to b overwrites the second byte, and so forth.

Conversely, writing a value to x affects a, b, c and d.

The unsigned chars y and e occupy the same byte (again, assuming that int is 32 bits), so .y and .e are effectively aliases for each other.

The unsigned char A.z does not overlap any element of struct B, so it is effectively immune to changes to B.

The point here is that the elements of the unioned structs occupy the same memory. The different structs provide different ways to read and write the same memory by letting you use different datatypes.

What the block of memory looks like also depends on your target CPU (big-endian versus little-endian byte order, alignment rules, and so on), but in general each struct starts at the same address and is laid out over the same area. That is useful if you know what it is going to do, and a shot in the foot otherwise.

You're probably confused because unions aren't a great feature. They can be useful for sharing memory when you are extremely constrained, for non-portable memory tricks (writing into a variable of one size and reading from another), or for emulating proper tagged unions.

The tagged union pattern works like this.

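enum mytags { FULL_TIME_WORKER, PART_TIMER, CONTRACTOR };

// The three *_t payload structs are assumed to be defined elsewhere.
struct worker_t {
    enum mytags tag;   // says which union member is currently valid
    union {
        struct full_time_worker_t full_time_worker;
        struct part_time_worker_t part_time_worker;
        struct contractor_t contractor;
    };
};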


bool person_can_do_x(worker_t w)
{
    switch(w.tag)   // check the tag before touching the union
    {
    case FULL_TIME_WORKER:
    ...
    case PART_TIMER:
    ...
    case CONTRACTOR:
       ...w.contractor.....
    default:
        return true;
    }
}