Imagine:
S f(S a) {
    return a;
}
Why is it not allowed to alias a and the return value slot?
S s = f(t);
S s = t; // can't generally transform it to this :(
The spec doesn't allow this transformation if the copy constructor of S has side effects. Instead, it requires at least two copies: one from t to a, one from a to the return value, and another from the return value to s, and only that last one can be elided. (I wrote = t above to represent the copy of t to f's a, the only copy that would still be mandatory in the presence of side effects of the move/copy constructor.)
Why is that?
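To make the question concrete, here is a minimal sketch that counts constructor calls for the two forms. The counter `transfers` and the helpers `transfers_via_function` and `transfers_direct` are hypothetical names added for illustration; they are not part of the original question.

```cpp
// Hypothetical instrumentation: S counts every copy or move construction.
struct S {
    static int transfers;              // copy + move constructions so far
    S() = default;
    S(const S&) { ++transfers; }
    S(S&&) noexcept { ++transfers; }
};
int S::transfers = 0;

S f(S a) {
    return a;                          // a -> return value: may not be elided
}

// Copy/move constructions performed by `S s = f(t);`.
int transfers_via_function() {
    S t;
    S::transfers = 0;
    S s = f(t);                        // t -> a, then a -> return value
    (void)s;
    return S::transfers;
}

// Copy/move constructions performed by `S s = t;`.
int transfers_direct() {
    S t;
    S::transfers = 0;
    S s = t;                           // the single copy the question asks for
    (void)s;
    return S::transfers;
}
```

On a conforming compiler, `transfers_via_function()` should report at least two constructions, while `transfers_direct()` reports exactly one.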
Here's why copy elision doesn't make sense for parameters. It's really about the implementation of the concept at the compiler level.
Copy elision works by essentially constructing the return value in-place. The value isn't copied out; it's created directly in its intended destination. It's the caller who provides the space for the intended output, and thus it's ultimately the caller who provides the possibility for the elision.
All that the function internally needs to do in order to elide the copy is construct the output in the place provided by the caller. If the function can do this, you get copy elision. If the function can't, then it will use one or more temporary variables to store the intermediate results, then copy/move this into the place provided by the caller. It's still constructed in-place, but the construction of the output happens via copy.
So the world outside of a particular function doesn't have to know or care about whether a function does elision. Specifically, the caller of the function doesn't have to know about how the function is implemented. It's not doing anything different; it's the function itself that decides if elision is possible.
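The "constructed in the caller's storage" idea can be observed by recording the address at which the constructor ran. This is only a sketch: the member `built_at` and the helper `constructed_in_callers_storage` are assumptions made for illustration. The identity is guaranteed from C++17 on; in earlier standards it relies on the compiler performing the (permitted, and in practice usual) elision.

```cpp
struct S {
    const S* built_at;                 // where the original object was constructed
    S() : built_at(this) {}
    // Copies and moves keep the source's record, so a copied object
    // would still point at the temporary it came from.
    S(const S& other) : built_at(other.built_at) {}
    S(S&& other) noexcept : built_at(other.built_at) {}
};

S make() {
    return S();                        // built directly in the slot the caller provides
}

bool constructed_in_callers_storage() {
    S s = make();
    return s.built_at == &s;           // true when no temporary was involved
}
```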
Storage for value parameters is also provided by the caller. When you call f(t), it is the caller that creates the copy of t and passes it to f. Similarly, if S is implicitly constructible from an int, then f(5) will construct an S from the 5 and pass it to f.
This is all done by the caller. The callee doesn't know or care that it was a variable or a temporary; it's just given a spot of stack memory (or registers or whatever).
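A small trace can show that the parameter is already fully constructed before the body of f starts running. The global `events` log and the two `trace_*` helpers are hypothetical, added only for this sketch; f returns an int here so that no return-value construction clutters the trace.

```cpp
#include <string>
#include <vector>

std::vector<std::string> events;       // hypothetical trace of what ran, in order

struct S {
    int value;
    S(int v) : value(v) { events.push_back("S(int)"); }
    S(const S& other) : value(other.value) { events.push_back("S(const S&)"); }
};

int f(S a) {
    events.push_back("f body");        // the parameter already exists here
    return a.value;
}

std::vector<std::string> trace_call_with_int() {
    events.clear();
    f(5);                              // the S is built from 5 before f runs
    return events;
}

std::vector<std::string> trace_call_with_lvalue() {
    S t(1);
    events.clear();
    f(t);                              // the copy of t is made before f runs
    return events;
}
```

In both traces the construction entry comes first and "f body" second, which matches the description above: the callee just receives an object that is already in place.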
Now remember: copy elision works because the function being called constructs the variable directly into the output location. So if you're trying to elide the return from a value parameter, then the storage for the value parameter must also be the output storage itself. But remember: it is the caller that provides that storage for both the parameter and the output. And therefore, to elide the output copy, the caller must construct the parameter directly into the output.
To do this, now the caller needs to know that the function it's calling will elide the return value, because it can only stick the parameter directly into the output if the parameter will be returned. That's not going to generally be possible at the compiler level, because the caller doesn't necessarily have the implementation of the function. If the function is inlined, then maybe it can work. But otherwise no.
Therefore, the C++ committee didn't bother to allow for the possibility.
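The difference between returning a local and returning a by-value parameter can be sketched with counters. The names `from_local`, `from_parameter`, `Counts`, and the counting helpers are assumptions for illustration. Note that eliding the local is permitted but not required, so the sketch only relies on it never needing a copy.

```cpp
struct S {
    static int copies;
    static int moves;
    S() = default;
    S(const S&) { ++copies; }
    S(S&&) noexcept { ++moves; }
};
int S::copies = 0;
int S::moves = 0;

struct Counts { int copies; int moves; };

S from_local() {
    S local;                           // storage chosen by the callee
    return local;                      // can be built in the caller's slot (NRVO)
}

S from_parameter(S a) {                // storage chosen by the caller
    return a;                          // not eligible for elision; moved instead
}

Counts count_from_local() {
    S::copies = S::moves = 0;
    S s = from_local();
    (void)s;
    return Counts{S::copies, S::moves};
}

Counts count_from_parameter() {
    S t;
    S::copies = S::moves = 0;
    S s = from_parameter(t);           // one copy (t -> a) plus a move (a -> result)
    (void)s;
    return Counts{S::copies, S::moves};
}
```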
The rationale, as I understand it, for that restriction is that the calling convention might (and will in many cases) demand that the argument to the function and the return object are at different locations (either memory or registers). Consider the following modified example:
X foo();
X bar( X a )
{
    return a;
}
int main() {
    X x = bar( foo() );
}
In theory, the whole set of objects would be the return statement in foo ($tmp1), the argument a of bar, the return statement of bar ($tmp2), and x in main. Compilers can elide two of the four objects by creating $tmp1 at the location of a and $tmp2 at the location of x. When the compiler is processing main, it can note that the return value of foo is the argument to bar and make them coincide. At that point it cannot possibly know (without inlining) that the argument and return value of bar are the same object, and it has to comply with the calling convention, so it will place $tmp1 in the position of the argument to bar.
At the same time, it knows that the purpose of $tmp2 is only to create x, so it can place both at the same address. Inside bar, there is not much that can be done: the argument a is located in the place of the first argument, according to the calling convention, and $tmp2 also has to be located according to the calling convention. In the general case that is a different location: consider that the example can be extended to a bar that takes more arguments, only one of which is used in the return statement.
Now, if the compiler performs inlining, it could detect that the extra copy that would be required if the function were not inlined is not really needed, and it would have a chance to elide it. If the standard allowed that particular copy to be elided, then the same code would behave differently depending on whether the function is inlined or not.
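The foo/bar example can be instrumented to observe both points: the parameter a and the result x live at different addresses, and one construction between them remains. The `Report` struct, the `param_addr` global, and `inspect_chain` are hypothetical helpers for this sketch; foo is given a trivial body so that the snippet links.

```cpp
struct X {
    static int transfers;              // copy + move constructions
    X() = default;
    X(const X&) { ++transfers; }
    X(X&&) noexcept { ++transfers; }
};
int X::transfers = 0;

const X* param_addr = nullptr;         // hypothetical: where bar's argument lived

X foo() {
    return X();                        // $tmp1, normally built in the place of a
}

X bar(X a) {
    param_addr = &a;                   // remember the argument slot
    return a;                          // $tmp2, built in the place of x
}

struct Report { int transfers; bool same_location; };

Report inspect_chain() {
    X::transfers = 0;
    X x = bar(foo());
    return Report{X::transfers, param_addr == &x};
}
```

With typical compilers this reports a single construction (the one from a into the return slot) and distinct locations for a and x.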
David Rodríguez - dribeas's answer to my question 'How to allow copy elision construction for C++ classes' gave me the following idea. The trick is to use lambdas to delay evaluation until inside the function body:
#include <iostream>
struct S
{
    S() {}
    S(const S&) { std::cout << "Copy" << std::endl; }
    S(S&&) { std::cout << "Move" << std::endl; }
};
S f1(S a) {
    return a;
}
S f2(const S& a) {
    return a;
}
#define DELAY(x) [&]{ return x; }
template <class F>
S f3(const F& a) {
    return a();
}
int main()
{
    S t;
    std::cout << "Without delay:" << std::endl;
    S s1 = f1(t);
    std::cout << "With delay:" << std::endl;
    S s2 = f3(DELAY(t));
    std::cout << "Without delay pass by ref:" << std::endl;
    S s3 = f2(t);
    std::cout << "Without delay pass by ref (temporary) (should have 0 copies, will get 1):" << std::endl;
    S s4 = f2(S());
    std::cout << "With delay (temporary) (no copies, best):" << std::endl;
    S s5 = f3(DELAY(S()));
}
This outputs on ideone GCC 4.5.1:
Without delay:
Copy
Copy
With delay:
Copy
Now this is good, but one could suggest that the DELAY version is just like passing by const reference, as below:
Without delay pass by ref:
Copy
But if we pass a temporary by const reference, we still get a copy:
Without delay pass by ref (temporary) (should have 0 copies, will get 1):
Copy
Whereas the delayed version elides the copy:
With delay (temporary) (no copies, best):
As you can see, the delayed version produces one copy in the non-temporary case and no copies in the case of a temporary. I don't know of any way to achieve this other than lambdas, but I'd be interested to hear if there is one.
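For reference, the same delay technique can be written without the macro and checked with counters instead of printed output. This is a sketch under assumptions: `Counts`, `delayed_lvalue`, and `delayed_temporary` are hypothetical helpers, and the zero-move results are guaranteed only from C++17 on (earlier standards rely on the compiler performing the elision).

```cpp
struct S {
    static int copies;
    static int moves;
    S() = default;
    S(const S&) { ++copies; }
    S(S&&) noexcept { ++moves; }
};
int S::copies = 0;
int S::moves = 0;

struct Counts { int copies; int moves; };

// Takes a callable instead of an S, so the S is created inside f3
// directly as its return value.
template <class F>
S f3(const F& make) {
    return make();
}

Counts delayed_lvalue() {
    S t;
    S::copies = S::moves = 0;
    S s = f3([&] { return t; });       // the one unavoidable copy of t
    (void)s;
    return Counts{S::copies, S::moves};
}

Counts delayed_temporary() {
    S::copies = S::moves = 0;
    S s = f3([] { return S(); });      // constructed straight into s
    (void)s;
    return Counts{S::copies, S::moves};
}
```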
From t to a, it is unreasonable to elide the copy. The parameter is declared mutable, so the copy is made because the parameter is expected to be modified in the function.
From a to the return value, I cannot see any reason to copy. Perhaps it is some sort of oversight? By-value parameters feel like locals inside the function body; I see no difference there.
I suspect it is because an alternative that achieves the optimization is always available:
S& f(S& a) { return a; } // pass & return by reference
^^^ ^^^
If f() is coded as in your example, then it's perfectly reasonable to assume that the copy is intended or that side effects are expected; otherwise, why not choose pass/return by reference?
If NRVO applied here (as you ask), there would be no difference between S f(S) and S& f(S&)!
NRVO kicks in for situations like operator+() because there is no worthy alternative.
One supporting observation: all of the functions below have different copying behavior:
S& f(S& a) { return a; } // 0 copy
S f(S& a) { return a; } // 1 copy
S f(S a) { S a1; return (...)? a : a1; } // 2 copies
In the 3rd snippet, if the (...) is known at compile time to be false, then the compiler generates only 1 copy.
This means that the compiler purposefully doesn't perform the optimization when a trivial alternative is available.
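The copy counts for the three signatures can be checked with a counter. In this sketch the function names, the `pick_param` flag standing in for (...), and the `copies_*` helpers are all hypothetical; S deliberately declares no move constructor so that every transfer shows up as a copy. The sketch does not try to reproduce the compile-time-false case.

```cpp
struct S {
    static int copies;
    S() = default;
    S(const S&) { ++copies; }          // no move constructor is declared
};
int S::copies = 0;

S& ref_in_ref_out(S& a) { return a; }          // 0 copies
S ref_in_value_out(S& a) { return a; }         // 1 copy
S value_in_value_out(S a, bool pick_param) {   // 2 copies
    S a1;
    return pick_param ? a : a1;        // a conditional lvalue is always copied
}

int copies_ref_in_ref_out() {
    S t;
    S::copies = 0;
    S& r = ref_in_ref_out(t);
    (void)r;
    return S::copies;
}

int copies_ref_in_value_out() {
    S t;
    S::copies = 0;
    S s = ref_in_value_out(t);
    (void)s;
    return S::copies;
}

int copies_value_in_value_out() {
    S t;
    S::copies = 0;
    S s = value_in_value_out(t, true);
    (void)s;
    return S::copies;
}
```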
I think the issue is that if the copy constructor does something, then the compiler must do that thing a predictable number of times. If you have a class that increments a counter every time it's copied, for example, and there's a way to access that counter, then a standards-compliant compiler must do that operation a well-defined number of times (otherwise, how would one write unit tests?)
Now, it's probably a bad idea to actually write a class like that, but it's not the compiler's job to figure that out, only to make sure that the output is correct and consistent.