Why are the results of integer promotion different?

Please look at my test code:

#include <stdlib.h>
#include <stdio.h>


#define PRINT_COMPARE_RESULT(a, b) \
    do { \
        if ((a) > (b)) { \
            printf( #a " > " #b "\n"); \
        } \
        else if ((a) < (b)) { \
            printf( #a " < " #b "\n"); \
        } \
        else { \
            printf( #a " = " #b "\n" ); \
        } \
    } while (0)

int main()
{
    signed   int a = -1;
    unsigned int b = 2;
    signed   short c = -1;
    unsigned short d = 2;

    PRINT_COMPARE_RESULT(a,b);
    PRINT_COMPARE_RESULT(c,d);

    return 0;
}

The result is the following:

a > b
c < d

My platform is Linux, and my gcc version is 4.4.2. I am surprised by the second line of output. The first line of output is caused by integer promotion. But why is the result of the second line different?

The following rules are from C99 standard:

If both operands have the same type, then no further conversion is needed. Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.

Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

I think both comparisons should fall under the same case, the second case of integer promotion.

When you use an arithmetic operator, the operands go through two conversions.

Integer promotions: If int can represent all values of the operand's type, then the operand is promoted to int. This applies to both short and unsigned short on most platforms. The conversion at this stage is applied to each operand individually, without regard for the other operand. (There are more rules, but this is the one that applies here.)

Usual arithmetic conversions: If you compare an unsigned int against a signed int, neither type can represent the entire range of the other and both have the same rank, so both operands end up as the unsigned type. This conversion is chosen after examining the types of both operands.

The "usual arithmetic conversions" cannot apply when an operator has only one operand, which is why there are two sets of rules. One gotcha is that the shift operators << and >> don't perform the usual arithmetic conversions either: each operand is promoted on its own, and the type of the result depends only on the (promoted) left operand. So if you see someone type x << 5U, the U stands for "unnecessary".
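To make the two stages concrete, here is a small sketch that asks the compiler which type an expression ends up with. It assumes a C11 compiler (for _Generic) and a platform where int is wider than short. The macro TYPE_CODE and the three helper functions are names made up for this illustration.

```c
/* Evaluates to 1 if the expression has type int, 2 if it has type
   unsigned int, and 0 for anything else. Needs C11 for _Generic.
   The operand of _Generic is not evaluated, only its type is inspected. */
#define TYPE_CODE(expr) _Generic((expr), int: 1, unsigned int: 2, default: 0)

/* short + unsigned short: each operand is promoted to int on its own,
   so the usual arithmetic conversions have nothing left to do. */
int code_short_plus_ushort(void)
{
    short c = -1;
    unsigned short d = 2;
    return TYPE_CODE(c + d);
}

/* int + unsigned int: no promotion happens; the usual arithmetic
   conversions then pick unsigned int as the common type. */
int code_int_plus_uint(void)
{
    int a = -1;
    unsigned int b = 2;
    return TYPE_CODE(a + b);
}

/* int << unsigned int: the usual arithmetic conversions are not applied;
   the result has the type of the promoted left operand. */
int code_int_shift_uint(void)
{
    int x = 1;
    return TYPE_CODE(x << 5U);
}
```

Under those assumptions the first and third helpers return 1 (int) and the second returns 2 (unsigned int).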

Breakdown: Let's assume a typical system with 32-bit int and 16-bit short.

int a = -1;         // "signed" is implied
unsigned b = 2;     // "int" is implied
if (a < b)
    puts("a < b");  // not printed
else
    puts("a >= b"); // printed

First the two operands are promoted. Since both are int or unsigned int, no promotions are done.
Next, the two operands are converted to the same type. Since int can't represent all possible values of unsigned, and unsigned can't represent all possible values of int, there is no obvious choice. In this case, both are converted to unsigned.
When converting from signed to unsigned, 2³² is repeatedly added to the signed value until it is in the range of the unsigned type. On a two's complement machine this is actually a no-op as far as the processor is concerned, because the bit pattern does not change.
So the comparison becomes if (4294967295u < 2u), which is false.
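The conversion in this breakdown can be spelled out with explicit casts. This is only a sketch: the function names are invented here, and the casts merely make visible what the compiler already does implicitly when it sees an int compared against an unsigned int.

```c
#include <limits.h>

/* The conversion applied to the int operand of a mixed comparison:
   the value is reduced modulo UINT_MAX + 1. */
unsigned int int_as_unsigned(int v)
{
    return (unsigned int)v;
}

/* Returns 1 if -1 converts to the largest unsigned int value. */
int minus_one_becomes_uint_max(void)
{
    return int_as_unsigned(-1) == UINT_MAX;
}

/* Same result as writing `a < b` directly with these operand types;
   the cast only makes the implicit conversion explicit. */
int mixed_less(int a, unsigned int b)
{
    return (unsigned int)a < b;
}
```

With a 32-bit int, int_as_unsigned(-1) is 4294967295, so mixed_less(-1, 2u) is 0, matching the "a >= b" branch above.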

Now let's try it with short:

short c = -1;          // "signed" is implied
unsigned short d = 2;
if (c < d)
    puts("c < d");     // printed
else
    puts("c >= d");    // not printed

First, the two operands are promoted. Since both can be represented faithfully by int, both are promoted to int.
Next, they are converted to the same type. But they already are the same type, int, so nothing is done.
So the comparison becomes if (-1 < 2), which is true.
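For contrast, here is a sketch of the short case next to a variant that forces the value to unsigned short before the promotion happens. Both function names are hypothetical, and the comments assume int is wider than short.

```c
/* Both operands are narrower than int, so each is promoted to int
   and the comparison is an ordinary signed one. */
int short_less(short c, unsigned short d)
{
    return c < d;
}

/* Casting to unsigned short first changes the value (-1 becomes
   USHRT_MAX); only then is it promoted to int and compared. */
int short_less_after_cast(short c, unsigned short d)
{
    return (unsigned short)c < d;
}
```

short_less(-1, 2) yields 1, while short_less_after_cast(-1, 2) yields 0, since the cast turns -1 into 65535 on a system with a 16-bit short.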

Writing good code: There's an easy way to catch these "gotchas" in your code: always compile with warnings turned on, and fix the warnings. (With GCC, -Wsign-compare reports mixed signed/unsigned comparisons; for C it is enabled by -Wextra.) I tend to write code like this:

int x = ...;
unsigned y = ...;
if (x < 0 || (unsigned) x < y)
    ...;
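The same pattern can be wrapped in a small helper. This is a minimal sketch; the name int_less_than_unsigned is made up for this example.

```c
#include <stdbool.h>

/* Mathematically correct "x < y" for an int and an unsigned int.
   A negative x is smaller than every unsigned value; otherwise x is
   non-negative, so converting it to unsigned int keeps its value. */
bool int_less_than_unsigned(int x, unsigned int y)
{
    return x < 0 || (unsigned int)x < y;
}
```

With this helper, int_less_than_unsigned(-1, 2u) is true, which is what most people expect from -1 < 2.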

You also have to watch out that the code you write doesn't run into the other signed vs. unsigned gotcha: signed overflow. For example, consider the following code:

int x = ..., y = ...;
if (x + 100 < y + 100)
    ...;
unsigned a = ..., b = ...;
if (a + 100 < b + 100)
    ...;

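Some popular compilers will optimize (x + 100 < y + 100) to (x < y), because signed overflow is undefined behavior and the compiler may assume it never happens. The unsigned comparison cannot be simplified that way, since unsigned arithmetic wraps around. But that is a story for another day. Just don't overflow your signed numbers.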
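A short sketch of why the unsigned version is not the same as a plain comparison, plus one way to avoid overflowing the signed addition in the first place. All three function names are hypothetical.

```c
#include <limits.h>

/* Unsigned addition wraps modulo UINT_MAX + 1, which is well defined,
   so this can differ from a plain a < b when a sum wraps around. */
int shifted_less_unsigned(unsigned int a, unsigned int b)
{
    return a + 100u < b + 100u;
}

int plain_less_unsigned(unsigned int a, unsigned int b)
{
    return a < b;
}

/* Stores x + 100 in *out and returns 1, or returns 0 without touching
   *out when the signed addition would overflow. */
int add_100_checked(int x, int *out)
{
    if (x > INT_MAX - 100)
        return 0;
    *out = x + 100;
    return 1;
}
```

For a = UINT_MAX - 50 and b = 0, a + 100 wraps around to 49, so shifted_less_unsigned returns 1 while plain_less_unsigned returns 0.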

Footnote: Note that while signed is implied for int, short, long, and long long, it is NOT implied for char. Instead, it depends on the platform.
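If you need to know which way your platform goes, <limits.h> tells you. A minimal sketch, with a made-up function name:

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation,
   0 if it is unsigned. CHAR_MIN is 0 exactly when char is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

The result differs between platforms, so code that compares plain char values against negative numbers should not rely on either answer.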

Taken from the C++ standard:

4.5 Integral promotions [conv.prom]
1 An rvalue of type char, signed char, unsigned char, short int, or unsigned short int can be converted to an rvalue of type int if int can represent all the values of the source type; otherwise, the source rvalue can be converted to an rvalue of type unsigned int.

In practice this means that all operations on the types in the list are actually evaluated in int if int can cover the whole value set of the source type; otherwise they are carried out in unsigned int. In the first case the values are compared as unsigned int because one of them is an unsigned int, and this is why -1 is "greater" than 2. In the second case the values are compared as signed integers, since int covers the whole domain of both short and unsigned short, and so -1 is smaller than 2.

(Background story: The upshot of this elaborate definition, which covers all the cases this way, is that compilers can actually ignore the original type behind the value (!) :) and just care about the data size.)

The conversion process for C++ is described as the usual arithmetic conversions. However, I think the most relevant rule is in the section it references, conv.prom: Integral promotions 4.6.1:

A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank ([conv.rank]) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.

The funny thing there is the use of the word "can", which I think suggests that this promotion is performed at the discretion of the compiler.

I also found this C-spec snippet that hints at the omission of promotion:

11   EXAMPLE 2       In executing the fragment
              char c1, c2;
              /* ... */
              c1 = c1 + c2;
     the ``integer promotions'' require that the abstract machine promote the value of each variable to int size
     and then add the two ints and truncate the sum. Provided the addition of two chars can be done without
     overflow, or with overflow wrapping silently to produce the correct result, the actual execution need only
     produce the same result, possibly omitting the promotions.
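Here is a sketch of what the abstract machine does in that example. It uses unsigned char instead of plain char so that the narrowing step is fully defined, and the expected values assume an 8-bit char. The function names are invented for this illustration.

```c
/* The sum of two unsigned char values is computed in int, because both
   operands are promoted before the addition. */
int sum_as_int(unsigned char c1, unsigned char c2)
{
    return c1 + c2;
}

/* Mirrors `c1 = c1 + c2;`: promote, add as int, then convert the sum
   back to the narrow type (reduced modulo UCHAR_MAX + 1). */
unsigned char add_uchar(unsigned char c1, unsigned char c2)
{
    int sum = c1 + c2;
    return (unsigned char)sum;
}
```

With an 8-bit char, sum_as_int(200, 100) is 300, and add_uchar(200, 100) is 44 after the conversion back. A compiler is free to do the addition in 8 bits instead, since that produces the same stored result.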

There is also the definition of "rank" to be considered. The list of rules is pretty long, but as it applies to this question "rank" is straightforward: