ANSI C unions - are they really useful?

开发者 (Developer) https://www.devze.com 2023-01-11 09:42 Source: web
From a response to some question yesterday, I learned that it is nonportable and unsafe to write into one union member and read the value from another member of a different type, assuming underlying alignment of the members. So after some research I found a written source that repeats this claim and gives a popular example: using a union of int and float to find the binary representation of a float.

So, understanding that this assumption is not safe, I wonder - except for saving memory (duh...) what real use is there to unions?

Note: that is, under Standard C. Clearly, for a specific implementation, the rules are known in advance and can be taken advantage of.

EDIT: the word "unsafe", due to associations of recent years, is probably a bad choice of wording, but I think the intention is clear.

EDIT 2: Since this point repeats in the answers - saving memory is a valid argument. I wanted to know if there was something beyond that.


Yes.

They provide a way of creating generic containers. Though, to get polymorphic behavior you must implement a vtable or type switching yourself...

They are, however, one of those features that you use only when you need them, and you need them rather rarely.
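As a rough illustration of the "vtable" idea, here is a minimal sketch of a heterogeneous array whose elements are dispatched through a hand-written table of function pointers. The names item, item_kind, to_double and sum_items are made up for this example and do not come from any library.

```c
/* Hypothetical tags naming which union member is currently valid. */
enum item_kind { ITEM_INT, ITEM_DOUBLE, ITEM_KIND_COUNT };

struct item {
    enum item_kind kind;
    union {
        int i;
        double d;
    } as;
};

/* One "virtual" operation per kind: convert the stored value to double. */
static double int_to_double(const struct item *it)    { return (double)it->as.i; }
static double double_to_double(const struct item *it) { return it->as.d; }

/* Hand-rolled dispatch table, indexed by the tag. */
static double (*const to_double[ITEM_KIND_COUNT])(const struct item *) = {
    int_to_double,
    double_to_double
};

/* Sums a mixed array by dispatching on each element's tag. */
static double sum_items(const struct item *items, int count)
{
    double total = 0.0;
    int k;

    for (k = 0; k < count; ++k)
        total += to_double[items[k].kind](&items[k]);
    return total;
}
```

Each element is only ever read through the member its tag names, so nothing here relies on type punning.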


Even if unions don't offer much in immediate usefulness (reduced memory usage aside), one advantage of using a union over dumping all of its members into a struct is that it makes the intended semantics clear: only one value (or set of values if it's a union of structs) is valid at any given time. It documents itself better.

The mutual exclusivity of members would be less obvious if you instead made all the union members separate members of a struct. Additionally, you'd still have the same problem of ill-defined behavior if you read a member that wasn't previously written to, but now you need to account for the application's semantics too (did it initialize all unused members to 0? did it leave them as garbage?), so in that sense, why wouldn't you use a union?


Yes, unions can be nonportable and unsafe, but they have their uses. For example, they can speed things up by eliminating the need to cast a uint32 to char[4]. This could come in handy if you are trying to route by IP address in software, but then your processor's endianness has to match network byte order. Think of unions as an alternative to casting, with fewer machine instructions. Casting has similar drawbacks.
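A minimal sketch of that idea, assuming an IPv4 address is kept in memory in network byte order. The names ipv4, ipv4_make, ipv4_equal and ipv4_same_prefix24 are invented for the example. The octets are filled in explicitly, and the 32-bit member is used only for whole-address equality, so the result does not depend on the host's endianness.

```c
#include <stdint.h>

union ipv4 {
    uint32_t addr;           /* all four bytes at once */
    unsigned char octet[4];  /* the same bytes, individually addressable */
};

/* Builds an address from its dotted-quad parts, stored in network order. */
static union ipv4 ipv4_make(unsigned char a, unsigned char b,
                            unsigned char c, unsigned char d)
{
    union ipv4 ip;

    ip.octet[0] = a;
    ip.octet[1] = b;
    ip.octet[2] = c;
    ip.octet[3] = d;
    return ip;
}

/* Whole-address comparison in a single 32-bit compare. */
static int ipv4_equal(union ipv4 x, union ipv4 y)
{
    return x.addr == y.addr;
}

/* True when both addresses share the same first three octets (a /24). */
static int ipv4_same_prefix24(union ipv4 x, union ipv4 y)
{
    return x.octet[0] == y.octet[0]
        && x.octet[1] == y.octet[1]
        && x.octet[2] == y.octet[2];
}
```

Interpreting addr as a number (for masks or range checks) is where endianness starts to matter, as the answer warns.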


The question contains a constraint that might disallow a valid answer...

You ask about real usage under the standard, but "real usage" may be allowing a knowledgeable programmer to exploit implementation defined behaviour in ways that the standards committee didn't want to anticipate or enumerate. And I don't mean that the standards committee had a particular behaviour in mind, but that they explicitly wanted to leave the ability there to be exploited in a useful way.

In other words: unions don't have to be useful under standard-defined behaviour to be useful in general; they could simply be there to allow someone to exploit the quirks of their target machine without resorting to assembly.

There could be a million useful ways to use them on the various machines available in implementation-defined ways, and zero useful ways to use them in a strictly portable way, but those million implementation-defined usages are reason enough to standardise their existence.

I hope that makes sense.


Even discounting a specific implementation where the alignment and packing are known, unions can still be useful.

They allow you to store one of many values into a single block of memory, along the lines of:

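typedef struct {
    int type;           /* tells which union member is currently valid */
    union {             /* anonymous union: C11 or a compiler extension */
        type1 one;
        type2 two;
    };
} unioned_type;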

And yes, it is non-portable to expect to be able to store your data into one and read it from two. But if you simply use the type to specify what the underlying variable is, you can easily get at it without having to cast.

In other words:

unioned_type ut;
ut.type = 1;
ut.one = myOne;
// Don't use ut.two here unless you know the underlying details.

is fine assuming you use type to decide that a type1 variable is stored there.
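Here is a small compilable sketch of the same pattern with type switching. Since type1 and type2 are not specified above, int and double are used as stand-ins. The union is given a member name (u) so the code also compiles before C11, and the enum value_type and the helper names are assumptions made for illustration.

```c
#include <stdio.h>

enum value_type { VALUE_INT = 1, VALUE_DOUBLE = 2 };

typedef struct {
    enum value_type type;   /* says which member of u is valid */
    union {
        int one;
        double two;
    } u;
} unioned_type;

static unioned_type make_int(int v)
{
    unioned_type ut;

    ut.type = VALUE_INT;
    ut.u.one = v;
    return ut;
}

static unioned_type make_double(double v)
{
    unioned_type ut;

    ut.type = VALUE_DOUBLE;
    ut.u.two = v;
    return ut;
}

/* Formats the value into buf, reading only the member that type names.
   Returns the number of characters written, or -1 for an unknown tag. */
static int format_value(const unioned_type *ut, char *buf, size_t size)
{
    switch (ut->type) {
    case VALUE_INT:
        return snprintf(buf, size, "%d", ut->u.one);
    case VALUE_DOUBLE:
        return snprintf(buf, size, "%.1f", ut->u.two);
    }
    return -1;
}
```

The switch is the only place that touches the union members, which keeps the "read what you last wrote" rule easy to check.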


Here is one legitimate portable use of unions:

struct arg {
    enum type t;
    union {
        intmax_t i;
        uintmax_t u;
        long double f;
        void *p;
        void (*fp)(void);
    } v;
};

Coupled with type information in t, struct arg can portably contain any numeric or pointer value. The whole struct is likely to be 16-32 bytes in size, compared to 40-80 bytes if a union had not been used. The difference would be even more extreme if I wanted to keep each possible original numeric type separately (signed char, short, int, long, long long, unsigned char, unsigned short, ...) rather than converting them up to the largest signed/unsigned/floating point type before storing them.
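To make the size claim concrete, here is a sketch that puts the struct next to a hypothetical "flat" version holding every member separately. The enumerators of enum type, struct arg_flat and the two helper functions are assumptions added for the example, and the actual sizes depend on the platform.

```c
#include <stdint.h>

/* Hypothetical enumerators; the original leaves enum type undefined. */
enum type { T_INT, T_UINT, T_FLOAT, T_PTR, T_FUNCPTR };

struct arg {
    enum type t;
    union {
        intmax_t i;
        uintmax_t u;
        long double f;
        void *p;
        void (*fp)(void);
    } v;
};

/* The same fields without a union, for size comparison only. */
struct arg_flat {
    enum type t;
    intmax_t i;
    uintmax_t u;
    long double f;
    void *p;
    void (*fp)(void);
};

/* Stores a signed integer and records that fact in the tag. */
static struct arg arg_from_int(intmax_t i)
{
    struct arg a;

    a.t = T_INT;
    a.v.i = i;
    return a;
}

/* Reads the integer back; the caller is expected to have checked a->t. */
static intmax_t arg_as_int(const struct arg *a)
{
    return a->v.i;
}
```

Comparing sizeof(struct arg) with sizeof(struct arg_flat) on a given compiler shows the saving directly.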

Also, while it is not "portable" to assume anything about the representation of types other than unsigned char, the standard permits using a union with unsigned char, or casting a pointer to unsigned char *, and accessing an arbitrary data object that way. If you write that information to disk, it won't be portable to other systems that use different representations, but it still might be useful at runtime, for example when implementing a hash table to store double values. (Anyone want to correct me if padding bit issues make this technique invalid?) If nothing else, it can be used to implement memcpy (not very useful, since the standard library provides a much better implementation) or, more interestingly, a memswap function that swaps two arbitrary-size objects with bounded temporary space. This has gotten a little outside the usage domain of unions and into unsigned char * cast territory, but it's closely related.
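A minimal sketch of such a swap function, assuming the two objects do not overlap. It is called swap_objects here rather than memswap, because identifiers beginning with mem followed by a lowercase letter are reserved for the standard library; the 64-byte buffer size is an arbitrary choice.

```c
#include <stddef.h>
#include <string.h>

/* Swaps n bytes between two non-overlapping objects, using a
   fixed-size temporary buffer no matter how large n is. */
static void swap_objects(void *a, void *b, size_t n)
{
    unsigned char tmp[64];
    unsigned char *pa = a;
    unsigned char *pb = b;

    while (n > 0) {
        size_t chunk = n < sizeof tmp ? n : sizeof tmp;

        memcpy(tmp, pa, chunk);
        memcpy(pa, pb, chunk);
        memcpy(pb, tmp, chunk);
        pa += chunk;
        pb += chunk;
        n -= chunk;
    }
}
```

For example, swap_objects(&x, &y, sizeof x) exchanges two doubles, and the same call works for structs or arrays of equal size.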


One way to use unions that I came across is data hiding.

Say you have a struct that serves as a buffer.

By declaring a union over that struct, each module can access the contents of the buffer in different ways, or not at all, depending on the union declared in that particular module.

EDIT: here's an example

struct X
{
  int a;
};

struct Y
{
  int b;
};

union Public
{
   struct X x;
   struct Y y;
};

Here, whoever uses union Public can cast a pointer to it to either struct X or struct Y.

so given a function:

void foo(union Public* arg)
{
...

you can access either struct X or struct Y.

But suppose you then want to limit access so that the user doesn't know about X.

The union name stays the same, but the struct X part is no longer exposed through the header:

void foo(union Public* arg)
{
   // Public is still available but struct X is gone, 
   // user can only cast to struct Y

   struct Y* p = (struct Y*)arg;
...


Using a union for type punning is non-portable (though not particularly less portable than any other method of type punning).

OTOH, a parser, for one example, typically has a union to represent values in expressions. [Edit: I'm replacing the parser example with one I hope is a bit more understandable]:

Let's consider a Windows resource file. You can use it to define resources like menus, dialogs, icons, etc. Something like this:

#define mn1 2

mn1 MENU
{
    MENUITEM "File", -1, MENUBREAK
}

ico1 "junk.ico"

dlg1 DIALOG 100, 0, 0, 100, 100 
BEGIN
    FONT 14, "Times New Roman"
    CAPTION "Test Dialog Box"
    ICON ico1, 700, 20, 20, 20, 20
    TEXT "This is a string", 100, 0, 0, 100, 10
    LTEXT "This is another string", 200, 0, 10, 100, 10
    RTEXT "Yet a third string", 300, 0, 20, 100, 10
    LISTBOX 400, 20, 20, 100, 100
    CHECKBOX "A combobox", 500, 100, 100, 200, 10
    COMBOBOX 600, 100, 210, 200, 100
    DEFPUSHBUTTON "OK", 75, 200, 200, 50, 15
END

Parsing the MENU gives a menu definition; parsing the DIALOG gives a dialog definition, and so on. In the parser we represent that as a union:

%union { 
        struct control_def {
                char window_text[256];
                int id;
                char *class;
                int x, y, width, height;
                int ctrl_style;
        } ctrl;

        struct menu_item_def { 
                char text[256];
                int identifier;
        } item;

        struct menu_def { 
                int identifier;
                struct menu_item_def items[256];
        } mnu;

        struct font_def { 
                int size;
                char filename[256];
        } font;

        struct dialog_def { 
                char caption[256];
                int id;
                int x, y, width, height;
                int style;
                struct menu_def *mnu;
                struct control_def ctrls[256];
                struct font_def font;
        } dlg;

        int value;
        char text[256];
};

Then we specify the type that will be produced by parsing a particular type of expression. For example, a font definition in the file becomes a font member of the union:

%type <font> font

Just to clarify, the <font> part refers to the union member that's produced and the second "font" refers to a parser rule that will yield a result of that type. Here's the rule for this particular case:

font: T_FONT T_NUMBER "," T_STRING { 
    $$.size = $2; 
    strcpy($$.filename,$4); 
};

Yes, in theory we could use a struct instead of a union here -- but beyond wasting memory, it just doesn't make sense. A font definition in the file only defines a font. It would make no sense to have it produce a struct that included a menu definition, icon definition, number, string, etc. in addition to the font it actually defines. [end of edit]

Of course, using unions to save memory is rarely very important anymore. It may seem rather trivial now, but back when 64 KB of RAM was a lot, the memory savings meant a lot more.


Consider a hardware control register with several bit fields. By setting values in these bit fields, we can control different functions of the hardware.

With a union, we can modify either the entire content of the register or one particular bit field.

For example, consider the following union type:

/* Data1 Bit Definition */
typedef union 
{
    struct STRUCT_REG_DATA
    {
        unsigned int u32_BitField1  : 3;
        unsigned int u32_BitField2  : 2;
        unsigned int u32_BitField3  : 1;
        unsigned int u32_BitField4  : 2;                
    } st_RegData;

    unsigned int u32_RegData;

} UNION_REG_DATA;

To modify the entire content of the register:

UNION_REG_DATA  un_RegData;
un_RegData.u32_RegData = 0x77;

To modify a single bit field (for example, BitField3):

un_RegData.st_RegData.u32_BitField3 = 1;

Both writes land in the same memory. The resulting value can then be written to the hardware control register.
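A compact read-modify-write sketch of this approach follows. The variable fake_register is a hypothetical stand-in for the real register (actual code would typically go through a volatile pointer to a device address), and the shortened names are invented for the example. Note that the order in which a compiler packs bit-fields into the word is implementation-defined, so which bit of raw corresponds to field3 has to be checked against the compiler's documentation.

```c
typedef union {
    struct {
        unsigned int field1 : 3;
        unsigned int field2 : 2;
        unsigned int field3 : 1;
        unsigned int field4 : 2;
    } bits;
    unsigned int raw;   /* the whole register value at once */
} reg_data;

/* Hypothetical stand-in for a memory-mapped hardware register. */
static unsigned int fake_register;

/* Read-modify-write of a single field, leaving the others untouched. */
static void reg_set_field3(unsigned int on)
{
    reg_data r;

    r.raw = fake_register;     /* read the current register value */
    r.bits.field3 = on & 1u;   /* change only this field */
    fake_register = r.raw;     /* write the whole word back */
}

/* Extracts one field from the current register value. */
static unsigned int reg_get_field3(void)
{
    reg_data r;

    r.raw = fake_register;
    return r.bits.field3;
}
```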


Here is a practical example:

Some microcontrollers have non-volatile memory that stores data in byte blocks. How can you easily store an array of floats in such memory? On most such targets a float is 32 bits (4 bytes) long, though the C standard does not guarantee that size, so:

union float_uint8
{
    uint8 i[KNFLOATS * sizeof(float)]; /* sizeof avoids assuming a 4-byte float */
    float f[KNFLOATS];
};

Now you can store and address floats through variables or pointers of type union float_uint8 and, with a loop, easily write them to memory as individual bytes without any conversion or decomposition. The same applies when reading the memory back. You don't even need to know how floats are laid out in bytes to store or recover the data.
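The store and load loops might look like the sketch below. The array fake_nvm is a hypothetical stand-in for the device's byte-oriented memory (a real driver would call its own byte-write routine there), KNFLOATS is given an arbitrary value of 4, and unsigned char is used for the byte view because the standard allows any object to be inspected through it.

```c
#include <stddef.h>

#define KNFLOATS 4   /* arbitrary count for the example */

union float_bytes {
    float f[KNFLOATS];
    unsigned char b[KNFLOATS * sizeof(float)];
};

/* Hypothetical stand-in for byte-addressable non-volatile memory. */
static unsigned char fake_nvm[KNFLOATS * sizeof(float)];

/* Writes KNFLOATS floats to the fake memory, one byte at a time. */
static void nvm_store_floats(const float *values)
{
    union float_bytes u;
    size_t k;

    for (k = 0; k < KNFLOATS; ++k)
        u.f[k] = values[k];
    for (k = 0; k < sizeof u.b; ++k)
        fake_nvm[k] = u.b[k];
}

/* Reads the bytes back and reassembles the floats. */
static void nvm_load_floats(float *values)
{
    union float_bytes u;
    size_t k;

    for (k = 0; k < sizeof u.b; ++k)
        u.b[k] = fake_nvm[k];
    for (k = 0; k < KNFLOATS; ++k)
        values[k] = u.f[k];
}
```

The stored bytes are only meaningful to a system with the same float representation and byte order, which is fine when the same device reads back what it wrote.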

This example is extracted from my own work. So yes, they are useful.
