
Why is `i = ++i + 1` unspecified behavior?

Developer https://www.devze.com 2022-12-13 21:34 Source: web

Consider the following C++ Standard ISO/IEC 14882:2003(E) citation (section 5, paragraph 4):

Except where noted, the order of evaluation of operands of individual operators and subexpressions of individual expressions, and the order in which side effects take place, is unspecified. 53) Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined. [Example:

i = v[i++];  // the behavior is unspecified 
i = 7, i++, i++;  //  i becomes 9 

i = ++i + 1;  // the behavior is unspecified 
i = i + 1;  // the value of i is incremented 

—end example]

I was surprised that i = ++i + 1 gives an undefined value of i. Does anybody know of a compiler implementation which does not give 2 for the following case?

int i = 0;
i = ++i + 1;
std::cout << i << std::endl;

The thing is that operator= has two arguments, and the first one is always a reference to i. The order of evaluation does not matter in this case. I do not see any problem here except a C++ Standard taboo.

Please do not consider cases where the order of argument evaluation matters. For example, ++i + i is obviously undefined. Please consider only my case, i = ++i + 1.

Why does the C++ Standard prohibit such expressions?


You make the mistake of thinking of operator= as a two-argument function, where the side effects of the arguments must be completely evaluated before the function begins. If that were the case, then the expression i = ++i + 1 would have multiple sequence points, and ++i would be fully evaluated before the assignment began. That's not the case, though. What's being evaluated is the intrinsic assignment operator, not a user-defined operator. There's only one sequence point in that expression.

The result of ++i is evaluated before the assignment (and before the addition operator), but the side effect is not necessarily applied right away. The result of ++i + 1 is always the same as i + 2, so that's the value that gets assigned to i as part of the assignment operator. The result of ++i is always i + 1, so that's what gets assigned to i as part of the increment operator. There is no sequence point to control which value should get assigned first.

Since the code violates the rule that "between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression," the behavior is undefined. In practice, though, it's likely that either i + 1 or i + 2 will be assigned first, then the other value will be assigned, and finally the program will continue running as usual: no nasal demons or exploding toilets, and no i + 3, either.
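To make the "two writes" picture concrete, here is a minimal C++ sketch that models the C++03 situation by hand: the two stores implied by i = ++i + 1 are computed up front and then committed in one of the two orders described above. The names WriteOrder and simulate are hypothetical, and this is only an illustration; genuine undefined behavior carries no guarantee that either order (or any sensible result) occurs.

```cpp
#include <cassert>

// Hypothetical model of the two pending stores in `i = ++i + 1` under C++03.
enum class WriteOrder { IncrementFirst, AssignmentFirst };

int simulate(int i, WriteOrder order)
{
    const int incremented = i + 1;         // value of ++i, also what its side effect stores
    const int assigned = incremented + 1;  // value of ++i + 1, stored by the assignment

    if (order == WriteOrder::IncrementFirst) {
        i = incremented;  // side effect of ++i lands first...
        i = assigned;     // ...then the assignment overwrites it
    } else {
        i = assigned;     // the assignment lands first...
        i = incremented;  // ...then the delayed write from ++i overwrites it
    }
    assert(i == incremented || i == assigned);  // never i + 3 in this model
    return i;
}
```

Starting from 0, simulate(0, WriteOrder::IncrementFirst) yields 2 and simulate(0, WriteOrder::AssignmentFirst) yields 1.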


It's undefined behaviour, not (just) unspecified behaviour, because there are two writes to i without an intervening sequence point. It is this way by definition, as far as the standard is concerned.

The standard allows compilers to generate code that delays writes back to storage - or from another view point, to resequence the instructions implementing side effects - any way it chooses so long as it complies with the requirements of sequence points.

The issue with this expression statement is that it implies two writes to i without an intervening sequence point:

i = ++i + 1;

One write is for the original value of i "plus one" and the other is for that value "plus one" again. These writes could happen in any order, or blow up completely, as far as the standard allows. Theoretically this even gives implementations the freedom to perform write-backs in parallel without bothering to check for simultaneous access errors.


C and C++ define a concept called sequence points: points in execution where it's guaranteed that all effects of previous evaluations have already been performed. The expression i = ++i + 1 is undefined because it both increments i and assigns to i, and neither of those operations is a sequence point on its own. Therefore, it is unspecified which write happens first.


Update for C++11 (09/30/2011)

Stop, this is well defined in C++11. It was undefined only in C++03, but C++11 is more flexible.

int i = 0;
i = ++i + 1;

After that line, i will be 2. The reason for this change was... because it already works in practice, and it would have been more work to make it undefined than to just leave it defined in the rules of C++11 (actually, that this works now is more of an accident than a deliberate change, so please don't do it in your code!).

Straight from the horse's mouth

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#637
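A minimal sketch of the C++11 behavior, assuming a compiler in C++11 (or later) mode. The reasoning: prefix ++i is defined as i += 1, and for every assignment operator the store is sequenced after the value computation of both operands and before the value computation of the assignment expression itself. So the store done by ++i happens before the value of ++i is produced, which happens before the outer store, and the two writes are ordered. The function name is hypothetical; the assert compares against the equivalent split form.

```cpp
#include <cassert>

// Requires C++11 or later; under C++03 rules the marked line is undefined behavior.
int pre_increment_then_assign(int i)
{
    const int expected = i + 2;  // what the split form `++i; i = i + 1;` produces
    i = ++i + 1;                 // well defined since C++11: the two stores to i are ordered
    assert(i == expected);
    return i;
}
```

pre_increment_then_assign(0) returns 2. As the answer says, the split form is still the better thing to write.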


Given two choices: defined or undefined, which choice would you have made?

The authors of the standard had two choices: define the behavior or specify it as undefined.

Given the clearly unwise nature of writing such code in the first place, it doesn't make any sense to specify a result for it. One would want to discourage code like that and not encourage it. It's not useful or necessary for anything.

Furthermore, standards committees do not have any way to force compiler writers to do anything. Had they required a specific behavior it is likely that the requirement would have been ignored.

There are practical reasons as well, but I suspect they were subordinate to the above general consideration. But for the record, any sort of required behavior for this kind of expression and related kinds will restrict the compiler's ability to generate code, to factor out common subexpressions, to move objects between registers and memory, etc. C was already handicapped by weak visibility restrictions. Languages like Fortran long ago realized that aliased parameters and globals were an optimization-killer and I believe they simply prohibited them.
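The aliasing point can be illustrated with a small, hypothetical function. If dst and src might refer to the same int, a compiler cannot fold the two additions into *dst += 2 * *src, because the second addition has to see the value written by the first. That forced reload is the kind of lost optimization described above.

```cpp
#include <cassert>
#include <climits>

// Adds *src to *dst twice. Because dst and src may alias, the compiler must
// reload *src after the first store instead of computing *dst += 2 * *src.
void add_twice(int* dst, const int* src)
{
    *dst += *src;
    *dst += *src;
}

// Calls add_twice with both pointers aimed at the same int.
int add_twice_aliased(int x)
{
    assert(x > INT_MIN / 4 && x < INT_MAX / 4);  // keep 4 * x in range
    add_twice(&x, &x);  // x -> 2x -> 4x; the folded form would give 3x
    return x;
}
```

With distinct objects (a = 1, b = 2) the result is a == 5; with a single object holding 1 it is 4, not the 3 that the folded version would produce.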

I know you were interested in a specific expression, but the exact nature of any given construct doesn't matter very much. It's not going to be easy to predict what a complex code generator will do and the language attempts to not require those predictions in silly cases.


The important part of the standard is:

its stored value modified at most once by the evaluation of an expression

You modify the value twice: once with the ++ operator and once with the assignment.


Please note that your copy of the standard is outdated and contains a known (and since fixed) error in exactly the 1st and 3rd code lines of your example; see:

C++ Standard Core Language Issue Table of Contents, Revision 67, #351

and

Andrew Koenig: Sequence point error: unspecified or undefined?

The topic is hard to grasp just from reading the standard, which is pretty obscure in this case :(

For example, whether an expression is well defined, unspecified, or something else in the general case actually depends not only on the statement's structure but also on memory contents (specifically, variable values) at the moment of execution. Another example:

++i, ++i; //ok

(++i, ++j) + (++i, ++j); //ub, see the first reference below (12.1 - 12.3)
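The claim that well-definedness can depend on runtime values is easiest to see with pointers. In this hypothetical function, the single expression is fine when p and q point to different objects, but if they alias, *p is read while ++*q writes the same int with no ordering between the read and the write, which is undefined behavior (in C++03 and in later standards alike).

```cpp
#include <cassert>

// Hypothetical example: well defined only if p and q point to different ints.
// If p == q, reading *p and writing through ++*q are unsequenced: undefined behavior.
int sum_then_bump(int* p, int* q)
{
    assert(p != q && "p and q must not alias");
    return *p + ++*q;
}
```

The source text of the expression is identical in both cases; only the pointer values passed at run time decide whether it is valid.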

Please have a look at the following (it lays it all out clearly and precisely):

JTC1/SC22/WG14 N926 "Sequence Point Analysis"

Also, Angelika Langer has an article on the topic (though not as clear as the previous one):

"Sequence Points and Expression Evaluation in C++"

There was also a discussion in Russian (though with some apparently erroneous statements in the comments and in the post itself):

"Точки следования (sequence points)"


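The following code demonstrates how you could get a wrong (unexpected) result:

#include <iostream>

int main()
{
  int i = 0;
  __asm { // a C++03-conformant implementation of i = ++i + 1 (MSVC inline asm, 32-bit x86 only)
    mov eax, i;     // load i
    inc eax;        // eax = value of ++i; its write-back to i is delayed
    mov ecx, 1;
    add ecx, eax;   // ecx = ++i + 1
    mov i, ecx;     // the assignment stores 2

    mov i, eax;     // delayed write from ++i stores 1
  };
  std::cout << i << std::endl;
}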

It will print 1 as a result.


Assuming you are asking "Why is the language designed this way?".

You say that i = ++i + i is "obviously undefined" but i = ++i + 1 should leave i with a defined value? Frankly, that would not be very consistent. I prefer to have either everything perfectly defined, or everything consistently unspecified. In C++ I have the latter. It's not a terribly bad choice per se - for one thing, it prevents you from writing evil code which makes five or six modifications in the same "statement".


Argument by analogy: If you think of operators as types of functions, then it kind of makes sense. If you had a class with an overloaded operator=, your assignment statement would be equivalent to something like this:

operator=(i, ++i+1)

(The first parameter is actually passed in implicitly via the this pointer, but this is just for illustration.)

For a plain function call, this is obviously undefined. The value of the first argument depends on when the second argument is evaluated. However with primitive types you get away with it because the original value of i is simply overwritten; its value doesn't matter. But if you were doing some other magic in your own operator=, then the difference could surface.

Simply put: all operators act like functions, and should therefore behave according to the same notions. If i + ++i is undefined, then i = ++i should be undefined as well.


How about we all just agree to never, never write code like this? If the compiler doesn't know what you want to do, how do you expect the poor sap following behind you to understand what you wanted to do? Putting i++; on its own line will not kill you.


The underlying reason is the way the compiler handles reading and writing of values. The compiler is allowed to hold an intermediate value temporarily and only actually commit the value at the end of the expression. We read the expression ++i as "increase i by one and return it", but a compiler might see it as "load the value of i, add one, return it, and then commit it back to memory before someone uses it again". The compiler is encouraged to avoid reading from and writing to the actual memory location as much as possible, because that would slow the program down.

In the specific case of i = ++i + 1, it suffers largely from the need for consistent behavioral rules. Many compilers will do the 'right thing' in such a situation, but what if one of the i's was actually a dereferenced pointer pointing to i? Without this rule, the compiler would have to be very careful to make sure it performed the loads and stores in the right order. This rule allows for more optimization opportunities.

A similar case is the so-called strict-aliasing rule. You can't modify an object (say, an int) through an lvalue of an unrelated type (say, a float), with only a few exceptions. This keeps the compiler from having to worry that a store through some float * will change the value of an int, and greatly improves optimization potential.
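For the strict-aliasing comparison, a short sketch, assuming 32-bit IEEE 754 float (checked with static_assert). Writing an int through a float* is the undefined case mentioned above; copying the object representation with std::memcpy is a well-defined way to inspect a float's bits. The helper names float_bits and float_from_bits are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 floats");

// Undefined under strict aliasing (an int modified through a float lvalue):
//     int n = 0;
//     *reinterpret_cast<float*>(&n) = 1.0f;

// Well defined: copy the object representation instead of aliasing it.
std::uint32_t float_bits(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

float float_from_bits(std::uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

With IEEE 754 floats, float_bits(1.0f) is 0x3F800000 and float_from_bits(0x40000000) is 2.0f.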


The problem here is that the standard allows a compiler to reorder the evaluations within a statement quite freely while executing it. It is not, however, allowed to reorder statements in a way that changes the program's behavior. Therefore, the expression i = ++i + 1; may be evaluated in several ways:

++i; // i = 2
i = i + 1;

or

i = i + 1;  // i = 2
++i;

or

i = i + 1;  ++i; //(Running in parallel using, say, an SSE instruction) i = 1

This gets even worse when you throw user-defined types into the mix, where the ++ operator can have whatever effect on the type its author wants, in which case the order of evaluation matters significantly.


i = v[i++]; // the behavior is unspecified
i = ++i + 1; // the behavior is unspecified

All the above expressions invoke Undefined Behavior.

i = 7, i++, i++; // i becomes 9

This is fine.
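A minimal sketch of why i = 7, i++, i++; is fine: the expression parses as ((i = 7), i++), i++, and the built-in comma operator has a sequence point after its left operand, so each write to i is complete before the next one starts. The function name is hypothetical.

```cpp
#include <cassert>

int comma_sequenced(int i)
{
    // Parsed as ((i = 7), i++), i++; each built-in comma is a sequence point,
    // so the three writes to i happen strictly one after another.
    i = 7, i++, i++;
    assert(i == 9);
    return i;
}
```

Whatever value is passed in, the function returns 9.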

Read Steve Summit's C-FAQs.


From ++i, i must be assigned "1", but with i = ++i + 1, it must be assigned the value "2". Since there is no intervening sequence point, the compiler can assume that the same variable is not being written twice, so these two operations can be done in any order. So yes, the compiler would be correct if the final value is 1.
