Why is comparing against "end()" iterator legal?

Developer https://www.devze.com 2022-12-28 16:19 Source: Internet

According to the C++ standard (3.7.3.2/4), using (not only dereferencing, but also copying, casting, whatever else) an invalid pointer is undefined behavior (in case of doubt also see this question). Now the typical code to traverse an STL container looks like this:

std::vector<int> toTraverse;
//populate the vector
for( std::vector<int>::iterator it = toTraverse.begin(); it != toTraverse.end(); ++it ) {
    //process( *it );
}

std::vector::end() is an iterator onto the hypothetical element beyond the last element of the container. There's no element there, therefore using a pointer through that iterator is undefined behavior.

Now how does the != end() comparison work, then? To do the comparison, an iterator wrapping an invalid address has to be constructed, and then that invalid address has to be used in a comparison, which again is undefined behavior. Is such a comparison legal, and why?


The only requirement for end() is that ++(--end()) == end(). The end() could simply be a special state the iterator is in. There is no reason the end() iterator has to correspond to a pointer of any kind.

Besides, even if it were a pointer, comparing two pointers doesn't require any sort of dereference anyway. Consider the following:

char a[5] = {'a', 'b', 'c', 'd', 'e'};
char* end = a + 5; // one past the last element: compared, never dereferenced
for (char* it = a; it != end; ++it);

That code will work just fine, and it mirrors your vector code.


You're right that an invalid pointer can't be used, but you're wrong that a pointer to an element one past the last element in an array is an invalid pointer - it's valid.

The C standard, section 6.5.6, paragraph 8, says that it's well defined and valid:

...if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object...

but cannot be dereferenced:

...if the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated...
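
A minimal sketch of that rule with a raw array. The names sum_range and sum_whole_array are made up for illustration. The one-past-the-end pointer is formed and compared, but never passed to unary *:

```cpp
// Sums the half-open range [first, last). `last` is only ever compared,
// so it may legally be the one-past-the-end pointer of an array.
int sum_range(const int* first, const int* last) {
    int total = 0;
    for (const int* it = first; it != last; ++it) {
        total += *it;   // safe: it != last, so it points at a real element
    }
    return total;
}

// Hypothetical driver: arr + 5 points one past the last element of arr.
int sum_whole_array() {
    int arr[5] = {1, 2, 3, 4, 5};
    return sum_range(arr, arr + 5);
}
```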


One past the end is not an invalid value (neither with regular arrays nor with iterators). You can't dereference it but it can be used for comparisons.

std::vector<X>::iterator it;

This is a singular iterator. You can only assign a valid iterator to it.

std::vector<X>::iterator it = vec.end();

This is a perfectly valid iterator. You can't dereference it but you can use it for comparisons and decrement it (assuming the container has a sufficient size).
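
As a rough illustration of both uses (the helper names last_element and is_empty_range are hypothetical), end() is compared in one function and decremented in the other, and it is only dereferenced after the decrement:

```cpp
#include <vector>

// Returns the last element by stepping back from end().
// Precondition (assumed, not checked): vec is not empty.
int last_element(const std::vector<int>& vec) {
    std::vector<int>::const_iterator it = vec.end(); // valid, but not dereferenceable
    --it;                                            // now refers to the last element
    return *it;
}

// Comparing against end() is fine even when the container is empty.
bool is_empty_range(const std::vector<int>& vec) {
    return vec.begin() == vec.end();
}
```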


Huh? There's no rule that says that iterators need to be implemented using nothing but a pointer.

It could have a boolean flag in there, which gets set when the increment operation sees that it passes the end of the valid data, for instance.
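
A sketch of what such a flag-based iterator might look like. CStrIter and length_via_flag are invented for this example and do not reflect how any real standard library implements its iterators. The end iterator here holds no meaningful address at all; equality with it is decided by the flag:

```cpp
#include <cstddef>

// Hypothetical iterator over a NUL-terminated string.
class CStrIter {
public:
    // Begin iterator: flagged as "at end" right away if the string is empty.
    explicit CStrIter(const char* s) : p_(s), at_end_(*s == '\0') {}
    // End iterator: just the flag, no address to compare.
    CStrIter() : p_(nullptr), at_end_(true) {}

    char operator*() const { return *p_; }          // precondition: not at end
    CStrIter& operator++() {
        ++p_;
        if (*p_ == '\0') at_end_ = true;            // increment sets the flag
        return *this;
    }
    friend bool operator==(const CStrIter& a, const CStrIter& b) {
        if (a.at_end_ || b.at_end_) return a.at_end_ == b.at_end_;
        return a.p_ == b.p_;
    }
    friend bool operator!=(const CStrIter& a, const CStrIter& b) { return !(a == b); }

private:
    const char* p_;
    bool at_end_;
};

// Counts characters using only operator!= against the flag-based end.
std::size_t length_via_flag(const char* s) {
    std::size_t n = 0;
    for (CStrIter it(s), end; it != end; ++it) ++n;
    return n;
}
```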


The implementation of a standard library's container's end() iterator is, well, implementation-defined, so the implementation can play tricks it knows the platform to support.
If you implement your own iterators, you can do whatever you want, so long as it is standard-conforming. For example, your iterator, if storing a pointer, could store a NULL pointer to indicate an end iterator. Or it could contain a boolean flag or whatnot.
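
The null-pointer variant could be sketched like this for a hand-rolled singly linked list. Node, NodeIter and sum_demo_list are hypothetical names, not part of any library:

```cpp
// Hypothetical singly linked node.
struct Node {
    int value;
    Node* next;
};

// Iterator whose end state is simply a stored null pointer.
class NodeIter {
public:
    explicit NodeIter(Node* n = nullptr) : node_(n) {}   // default argument gives end
    int& operator*() const { return node_->value; }      // precondition: not end
    NodeIter& operator++() { node_ = node_->next; return *this; }
    friend bool operator==(const NodeIter& a, const NodeIter& b) { return a.node_ == b.node_; }
    friend bool operator!=(const NodeIter& a, const NodeIter& b) { return !(a == b); }

private:
    Node* node_;
};

// Builds the list 1 -> 2 -> 3 on the stack and sums it with the usual loop.
int sum_demo_list() {
    Node c = {3, nullptr};
    Node b = {2, &c};
    Node a = {1, &b};
    int sum = 0;
    for (NodeIter it(&a), end; it != end; ++it) sum += *it;
    return sum;
}
```

Comparing against the end iterator only compares two pointer values; the null pointer is never dereferenced.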


I am answering here because the other answers are now out of date; besides, they did not quite address the question.

First, C++14 has changed the rules mentioned in the question. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function are still undefined, but other operations are now implementation-defined; see "Documentation of 'invalid pointer value' conversion in C++ implementations".

Second, words matter. You can't bypass the definitions while applying the rules. The key point here is the definition of "invalid". For iterators, this is defined in [iterator.requirements]. Though pointers are iterators, "invalid" means subtly different things for the two. The rules for pointers treat "invalid" as "don't indirect through an invalid value", which is a special case of "not dereferenceable" for iterators; however, "not dereferenceable" does not imply "invalid" for iterators. "Invalid" is explicitly defined as "may be singular", while a "singular" value is defined as "not associated with any sequence" (in the same paragraph as the definition of "dereferenceable"). That paragraph even explicitly defines "past-the-end values".

From the text of the standard in [iterator.requirements], it is clear that:

  • Past-the-end values are not assumed to be dereferenceable (at least by the standard library), as the standard states.
  • Dereferenceable values are not singular, since they are associated with a sequence.
  • Past-the-end values are not singular, since they are associated with a sequence.
  • An iterator is not invalid if it is definitely not singular (by negating the definition of "invalid iterator"). In other words, if an iterator is associated with a sequence, it is not invalid.

The value of end() is a past-the-end value, which is associated with a sequence until it is invalidated. So it is actually valid by definition. Even if "invalid" is misread literally, the rules for pointers are not applicable here.

The rules allowing == comparison on such values are in the input iterator requirements, which are inherited by the other iterator categories (forward, bidirectional, etc.). More specifically, valid iterators are required to be comparable with == within the domain of the iterator. Further, the forward iterator requirements specify that the domain is the underlying sequence, and the container requirements specify that the iterator and const_iterator member types meet the forward iterator requirements, whatever their category. Thus, == between end() and an iterator over the same container is required to be well-defined. As a standard container, vector<int> also obeys these requirements. That's the whole story.
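
A small sketch of that guarantee in use, with a container whose iterators are not plain pointers. The function name contains is made up for this example. std::find returns its second argument when nothing matches, so the result is compared against end() of the same container:

```cpp
#include <algorithm>
#include <list>

// Reports whether `wanted` occurs in `values`.
bool contains(const std::list<int>& values, int wanted) {
    std::list<int>::const_iterator pos =
        std::find(values.begin(), values.end(), wanted);
    return pos != values.end();   // well-defined: both iterators belong to `values`
}
```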

Third, even when end() is a pointer value (this is likely with an optimized implementation of the iterator for a vector instantiation), the rules in the question still do not apply. The reason is mentioned above (and in some other answers): "invalid" is concerned with * (indirection), not comparison. A one-past-the-end value is explicitly allowed by the standard to be compared in specified ways. Also note that ISO C++ is not ISO C; the two subtly differ (e.g. for < on pointer values not in the same array, unspecified vs. undefined), though they have similar rules here.


Simple. Iterators aren't (necessarily) pointers.

They have some similarities (e.g. you can dereference them), but that's about it.


Besides what was already said (iterators need not be pointers), I'd like to point out the rule you cite

According to C++ standard (3.7.3.2/4) using (not only dereferencing, but also copying, casting, whatever else) an invalid pointer is undefined behavior

wouldn't apply to the end() iterator anyway. Basically, when you have an array, all the pointers to its elements, plus one pointer past the end, are valid. A pointer before the start of the array is not: merely forming arr-1 is undefined behavior, just like arr+123456. That means:

int arr[5];
int *p=0;
p==arr+4; // OK
p==arr+5; // past-the-end, but OK
p==arr-1; // not OK: there is no valid "one before the start" pointer
p==arr+123456; // not OK, according to your rule