Suppose you have a function, and you call it a lot of times, every time the function return a big object. I've optimized the problem using a functor that return void
, and store the returni开发者_StackOverflow中文版ng value in a public member:
#include <vector>
const int N = 100;
std::vector<double> fun(const std::vector<double> & v, const int n)
{
std::vector<double> output = v;
output[n] *= output[n];
return output;
}
class F
{
public:
F() : output(N) {};
std::vector<double> output;
void operator()(const std::vector<double> & v, const int n)
{
output = v;
output[n] *= n;
}
};
int main()
{
std::vector<double> start(N,10.);
std::vector<double> end(N);
double a;
// first solution
for (unsigned long int i = 0; i != 10000000; ++i)
a = fun(start, 2)[3];
// second solution
F f;
for (unsigned long int i = 0; i != 10000000; ++i)
{
f(start, 2);
a = f.output[3];
}
}
Yes, I can use inline or optimize in an other way this problem, but here I want to stress on this problem: with the functor I declare and construct the output variable output
only one time, using the function I do that every time it is called. The second solution is two time faster than the first with g++ -O1
or g++ -O2
. What do you think about it, is it an ugly optimization?
Edit:
to clarify my aim. I have to evaluate the function >10M times, but I need the output only few random times. It's important that the input is not changed, in fact I declared it as a const reference. In this example the input is always the same, but in real world the input change and it is function of the previous output of the function.
More common scenario is to create object with reserved large enough size outside the function and pass large object to the function by pointer or by reference. You could reuse this object on several calls to your function. Thus you could reduce continual memory allocation.
In both cases you are allocating new vector many many times.
What you should do is to pass both input and output objects to your class/function:
void fun(const std::vector<double> & in, const int n, std::vector<double> & out)
{
out[n] *= in[n];
}
this way you separate your logic from the algorithm. You'll have to create a new std::vector once and pass it to the function as many time as you want. Notice that there's unnecessary no copy/allocation made.
p.s. it's been awhile since I did c++. It may not compile right away.
It's not an ugly optimization. It's actually a fairly decent one.
I would, however, hide output and make an operator[] member to access its members. Why? Because you just might be able to perform a lazy evaluation optimization by moving all the math to that function, thus only doing that math when the client requests that value. Until the user asks for it, why do it if you don't need to?
Edit:
Just checked the standard. Behavior of the assignment operator is based on insert(). Notes for that function state that an allocation occurs if new size exceeds current capacity. Of course this does not seem to explicitly disallow an implementation from reallocating even if otherwise...I'm pretty sure you'll find none that do and I'm sure the standard says something about it somewhere else. Thus you've improved speed by removing allocation calls.
You should still hide the internal vector. You'll have more chance to change implementation if you use encapsulation. You could also return a reference (maybe const) to the vector from the function and retain the original syntax.
I played with this a bit, and came up with the code below. I keep thinking there's a better way to do this, but it's escaping me for now.
The key differences:
- I'm allergic to public member variables, so I made
output
private
, and put getters around it. - Having the operator return
void
isn't necessary for the optimization, so I have it return the value as aconst
reference so we can preserve return value semantics. - I took a stab at generalizing the approach into a templated base class, so you can then define derived classes for a particular return type, and not re-define the plumbing. This assumes the object you want to create takes a one-arg constructor, and the function you want to call takes in one additional argument. I think you'd have to define other templates if this varies.
Enjoy...
#include <vector>
template<typename T, typename ConstructArg, typename FuncArg>
class ReturnT
{
public:
ReturnT(ConstructArg arg): output(arg){}
virtual ~ReturnT() {}
const T& operator()(const T& in, FuncArg arg)
{
output = in;
this->doOp(arg);
return this->getOutput();
}
const T& getOutput() const {return output;}
protected:
T& getOutput() {return output;}
private:
virtual void doOp(FuncArg arg) = 0;
T output;
};
class F : public ReturnT<std::vector<double>, std::size_t, const int>
{
public:
F(std::size_t size) : ReturnT<std::vector<double>, std::size_t, const int>(size) {}
private:
virtual void doOp(const int n)
{
this->getOutput()[n] *= n;
}
};
int main()
{
const int N = 100;
std::vector<double> start(N,10.);
double a;
// second solution
F f(N);
for (unsigned long int i = 0; i != 10000000; ++i)
{
a = f(start, 2)[3];
}
}
It seems quite strange(I mean the need for optimization at all) - I think that a decent compiler should perform return value optimization in such cases. Maybe all you need is to enable it.
精彩评论