开发者

Where exactly does C++ standard say dereferencing an uninitialized pointer is undefined behavior?

开发者 https://www.devze.com 2023-01-26 21:32 出处:网络
So far I can\'t find how to deduce that the following: int* ptr; *ptr = 0; is undefined behavior. First of all, there\'s 5.3.1/1 that states that * means indirection which converts T* to T. But th

So far I can't find how to deduce that the following:

int* ptr;
*ptr = 0;

is undefined behavior.

First of all, there's 5.3.1/1 that states that * means indirection which converts T* to T. But this doesn't say anything about UB.

Then there's often quoted 3.7.3.2/4 saying that using deallocation function on a non-null pointer renders the pointer invalid and later usage of the invalid pointer is UB. But in the code above th开发者_如何学JAVAere's nothing about deallocation.

How can UB be deduced in the code above?


Section 4.1 looks like a candidate (emphasis mine):

An lvalue (3.10) of a non-function, non-array type T can be converted to an rvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior. If T is a non-class type, the type of the rvalue is the cv-unqualified version of T. Otherwise, the type of the rvalue is T.

I'm sure just searching on "uninitial" in the spec can find you more candidates.


I found the answer to this question is a unexpected corner of the C++ draft standard, section 24.2 Iterator requirements, specifically section 24.2.1 In general paragraph 5 and 10 which respectively say (emphasis mine):

[...][ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] [...] Dereferenceable values are always non-singular.

and:

An invalid iterator is an iterator that may be singular.268

and footnote 268 says:

This definition applies to pointers, since pointers are iterators. The effect of dereferencing an iterator that has been invalidated is undefined.

Although it does look like there is some controversy over whether a null pointer is singular or not and it looks like the term singular value needs to be properly defined in a more general manner.

The intent of singular is seems to be summed up well in defect report 278. What does iterator validity mean? under the rationale section which says:

Why do we say "may be singular", instead of "is singular"? That's becuase a valid iterator is one that is known to be nonsingular. Invalidating an iterator means changing it in such a way that it's no longer known to be nonsingular. An example: inserting an element into the middle of a vector is correctly said to invalidate all iterators pointing into the vector. That doesn't necessarily mean they all become singular.

So invalidation and being uninitialized may create a value that is singular but since we can not prove they are nonsingular we must assume they are singular.

Update

An alternative common sense approach would be to note that the draft standard section 5.3.1 Unary operators paragraph 1 which says(emphasis mine):

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.[...]

and if we then go to section 3.10 Lvalues and rvalues paragraph 1 says(emphasis mine):

An lvalue (so called, historically, because lvalues could appear on the left-hand side of an assignment expression) designates a function or an object. [...]

but ptr will not, except by chance, point to a valid object.


The OP's question is nonsense. There is no requirement that the Standard say certain behaviours are undefined, and indeed I would argue that all such wording be removed from the Standard because it confuses people and makes the Standard more verbose than necessary.

The Standard defines certain behaviour. The question is, does it specify any behaviour in this case? If it does not, the behaviour is undefined whether or not it says so explicitly.

In fact the specification that some things are undefined is left in the Standard primarily as a debugging aid for the Standards writers, the idea being to generate a contradiction if there is a requirement in one place which conflicts with an explicit statement of undefined behaviour in another: that's a way to prove a defect in the Standard. Without the explicit statement of undefined behaviour, the other clause prescribing behaviour would be normative and unchallenged.


Evaluating an uninitialized pointer causes undefined behaviour. Since dereferencing the pointer first requires evaluating it, this implies that dereferencing also causes undefined behaviour.

This was true in both C++11 and C++14, although the wording changed.

In C++14 it is fully covered by [dcl.init]/12:

When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced.

If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

where the "following cases" are particular operations on unsigned char.


In C++11, [conv.lval/2] covered this under the lvalue-to-rvalue conversion procedure (i.e. retrieving the pointer value from the storage area denoted by ptr):

A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

The bolded part was removed for C++14 and replaced with the extra text in [dcl.init/12].


I'm not going to pretend I know a lot about this, but some compilers would initialize the pointer to NULL and dereferencing a pointer to NULL is UB.

Also considering that uninitialized pointer could point to anything (this includes NULL) you could concluded that it's UB when you dereference it.

A note in section 8.3.2 [dcl.ref]

[Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bitfield. ]

—ISO/IEC 14882:1998(E), the ISO C++ standard, in section 8.3.2 [dcl.ref]

I think I should have written this as comment instead, I'm not really that sure.


To dereference the pointer, you need to read from the pointer variable (not talking about the object it points to). Reading from an uninitialized variable is undefined behaviour.

What you do with the value of pointer after you have read it, doesn't matter anymore at this point, be it writing to (like in your example) or reading from the object it points to.


Even if the normal storage of something in memory would have no "room" for any trap bits or trap representations, implementations are not required to store automatic variables the same way as static-duration variables except when there is a possibility that user code might hold a pointer to them somewhere. This behavior is most visible with integer types. On a typical 32-bit system, given the code:

uint16_t foo(void);
uint16_t bar(void);
uint16_t blah(uint32_t q)
{
  uint16_t a;
  if (q & 1) a=foo();
  if (q & 2) a=bar();
  return a;
}
unsigned short test(void)
{
  return blah(65540);
}

it would not be particularly surprising for test to yield 65540 even though that value is outside the representable range of uint16_t, a type which has no trap representations. If a local variable of type uint16_t holds Indeterminate Value, there is no requirement that reading it yield a value within the range of uint16_t. Since unexpected behaviors could result when using even unsigned integers in such fashion, there's no reason to expect that pointers couldn't behave in even worse fashion.

0

精彩评论

暂无评论...
验证码 换一张
取 消