I have a function,
P(x0, x1, ..., xn)
that takes 100 integers as input and gives an integer as output. P is slow to evaluate (a single call can take anywhere from 30 seconds to a couple of minutes).
I need to know which input values will maximize the value returned by P.
What techniques can I use to accomplish this? I know people generally use genetic algorithms for this, but I'm afraid it would take ages: even with a small population and few generations (say, population = 50, generations = 50), P is so slow that the run would take more than 40 hours.
Is there any cheaper method of doing it? Maybe an iterative process? I don't need it to be really optimal, but I don't have any idea of how it behaves (I've tried linear / quadratic / exponential approaches, but they don't seem to yield any good values; I know P can return values at least 5-10 times better than what I'm getting).
It should also be something that's reasonably easy to implement (i.e., I must implement it myself).
Thanks
edit: P is a stochastic process.
Simulated annealing, which is closely related to Markov Chain Monte Carlo (MCMC). The variant you probably want is Metropolis-Hastings. When you get the hang of it, it's quite nice. Possibly there are ways to optimize it further because your inputs and result are all integers. It is compute-intensive and may require some tuning, but it is quite robust, and I'm not sure other methods could do any better.
Here's some brain-dead code to do it:
#include <cstdlib>   // std::rand, RAND_MAX

const int n = 100;              // length of the vector to optimize
int a[n];                       // the vector to optimize

// Get the probability of vector a. This is the function to optimize
// (user-supplied; assumed to return a positive value).
double P(const int a[n]);

// uniform random double in [0, 1]
double unif01() { return (double)std::rand() / RAND_MAX; }

// an amount by which to change one element. This choice has to be symmetric;
// it is called the Proposal Distribution.
int proposal_step() {
    static const int steps[4] = {-2, -1, 1, 2};
    return steps[std::rand() % 4];
}

// draw a large number of samples of the vector a
void sample(int large_number) {
    double p = P(a);                           // current value of P(a)
    for (int i = 0; i < large_number; i++) {
        // perturb each element of vector a in turn
        for (int j = 0; j < n; j++) {
            int step = proposal_step();
            // make the change to a[j], and get p1, the new value of p
            a[j] += step;
            double p1 = P(a);
            bool bKeepTheStep = true;
            // if p1 is better than p, keep the step;
            // if p1 is worse than p, keep the step p1/p of the time
            if (p1 < p) {
                bKeepTheStep = (unif01() < p1 / p);
            }
            if (bKeepTheStep) p = p1;
            else              a[j] -= step;    // undo the change
        }
        // now a is a sample, and p is its value: record a and p here
    }
    // what you have now is a large random sampling of vectors from distribution P;
    // now you can choose the best one, the average, the variance,
    // any statistic you like
}
Ways to tweak it are to widen or narrow the proposal distribution so it takes larger or smaller steps, or to have it take larger steps initially and then smaller ones. What you're looking for is an acceptance rate (the fraction of steps kept) that is neither too high nor too low. You probably also want a "burn-in" phase of an initial 1k or so samples that you throw away while the sampler hunts for the region of the mode.
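For example, step-size tuning based on the acceptance rate can be automated along these lines (a minimal sketch; the 0.3-0.5 target band and the names are illustrative assumptions, not part of the recipe above):

// Sketch of acceptance-rate-based step-size tuning. Call it every few hundred
// proposals; the 0.3-0.5 target band is an illustrative assumption.
int max_step = 2;                  // proposal half-width: steps drawn from +/-1 .. +/-max_step
int kept = 0, proposed = 0;        // update these inside the sampling loop

void tune_proposal() {
    double rate = (double)kept / proposed;
    if (rate > 0.5)                      max_step++;   // nearly everything kept: take bigger steps
    else if (rate < 0.3 && max_step > 1) max_step--;   // too few kept: take smaller steps
    kept = proposed = 0;                               // start counting the next batch afresh
}

Burn-in is then just a matter of not recording the first 1k or so samples.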
And by all means, profile P. It needs to be as fast as possible. Here's my favorite way to do that.
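Short of full profiling, even a crude timer around each call establishes the per-evaluation cost you're budgeting against. A minimal sketch using std::chrono (this is not the profiling method referred to above, just a baseline measurement):

#include <chrono>
#include <cstdio>

double P(const int* a);   // the expensive function (user-supplied)

// Crude per-call timing; a real profiler inside P is still the thing to use.
double timed_P(const int* a) {
    auto start = std::chrono::steady_clock::now();
    double value = P(a);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    std::printf("P took %.1f s\n", elapsed.count());
    return value;
}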
Maybe a significant portion of your algorithm is parallelizable? If so, have you considered parallelizing your code?
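For example, if P is thread-safe and your search evaluates several candidate vectors that don't depend on one another (a GA population, multiple restarts, the probing steps of a gradient estimate), those calls can run concurrently. A sketch using std::async, assuming such a batch and a thread-safe P:

#include <future>
#include <vector>

double P(const int* a);   // the expensive function (user-supplied)

// Evaluate P for a batch of candidate vectors concurrently.
// Assumes P is thread-safe and the candidates are independent.
std::vector<double> evaluate_batch(const std::vector<std::vector<int>>& candidates) {
    std::vector<std::future<double>> futures;
    for (const auto& c : candidates)
        futures.push_back(std::async(std::launch::async,
                                     [ptr = c.data()] { return P(ptr); }));
    std::vector<double> results;
    for (auto& f : futures)
        results.push_back(f.get());   // blocks until that evaluation finishes
    return results;
}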
Look at the various stochastic optimization techniques listed here. I recommend simulated annealing.
There are plenty of well-known global optimization algorithms (simulated annealing, stochastic tunneling, etc.) that CAN find the global maximum, but none are guaranteed to find it within a reasonable amount of time without making assumptions about the shape of the function.
You're not going to find a fast/easy way to optimize a 100-dimensional, non-trivial function. You'll need a lot of processing power and time. Assuming you don't want to write the optimization code yourself (based on your question), you'll also need some good math software (e.g. Mathematica).
Another not fully serious answer, but food for thought:
This problem looks to be so big that by rights you should need something like a SETI@Home effort to solve it. Thousands of computers make reasonably light work of this kind of thing. But I'm not sure how you'd reach thousands of computer users to obtain the use of their computers.
Actually, I do. Please bear with me for a moment in disregarding the legality of it all.
There are botnets run by some folks hiding behind the former Iron Curtain. I recently saw an offer to rent a botnet for $70 for 24 hours. Just think, thousands of 0wned PCs ready to do your bidding! Instead of having them DDOS Internet sites, you could have them churning on your problem. :)
Two final bits of advice on this, though:
- Don't pay them with your own credit card :)
- Don't take legal advice from strangers on SO :)
Good luck!
As a first-line algorithm for this type of problem, I would recommend Simulated Annealing. SA is a great first choice because you can clearly control your starting point and run time.
If you know something about the structure of your 100-dimensional space, with SA you can choose a good starting point, and that can have a big impact on the quality of your result. Also with SA you can control the 'cooling rate', which impacts both runtime and the quality of your results - naturally in opposite directions. I typically run with a relatively fast cooling rate first to look for good starting vectors, and then slow the cooling rate in subsequent runs to improve the results. It's kind of a meta-SA technique that can be automated.
In the past I've used SA successfully to maximize a very high-dimensional function used in modeling neutron-proton interactions.
Also, I would look to dimensionally reduce P() if possible. For your particular problem, are all 100 variables really needed? If you can fix half of them, you'll speed up any optimizer and end up with better results.
(And SA is easy to implement.)
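For what it's worth, here is a minimal sketch of what that can look like. The starting temperature, the cooling factor, and the single-coordinate neighbour move are illustrative assumptions, not a prescription; a fast cooling factor (say 0.9 per iteration) finishes quickly and suits the scouting runs mentioned above, while a slow one (say 0.999) gives better final results:

#include <cmath>
#include <cstdlib>

const int n = 100;
double P(const int* a);                       // the expensive function (user-supplied)
double unif01() { return (double)std::rand() / RAND_MAX; }

// Minimal simulated-annealing sketch. t0 is the starting temperature and
// cooling is the per-iteration factor (e.g. 0.9 for a fast run, 0.999 for a slow one).
void anneal(int a[n], int iterations, double t0, double cooling) {
    double current = P(a);
    double t = t0;
    for (int i = 0; i < iterations; i++) {
        int j = std::rand() % n;                  // pick one coordinate
        int step = (std::rand() % 2) ? 1 : -1;    // small neighbour move
        a[j] += step;
        double cand = P(a);
        // always accept improvements; accept worse moves with a probability
        // that shrinks as the temperature falls
        if (cand >= current || unif01() < std::exp((cand - current) / t))
            current = cand;
        else
            a[j] -= step;                         // undo the move
        t *= cooling;
    }
}

The temperature scale should be roughly commensurate with typical changes in P, which you can estimate from a few probe evaluations before committing to a long run.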
Assumptions:
First - the variables must be integer.
Second - the objective function P() is non-linear.
Observation:
In general, non-linear integer programming is very difficult to solve. In practice, as recommended above, relaxing the integer restriction and rounding the resulting solution may help.
There are general unconstrained optimization techniques available. One approach that comes from experimental design is called 'response surface methodology'. It is very helpful when the cost of an experiment is significant. The approach is to run a set of experiments by starting from one point and deviating each of your inputs by a set increment. You then calculate the gradient for each input, take a step in that direction for each, and repeat. Fletcher's Practical Methods of Optimization and Box, Hunter & Hunter's Statistics for Experimenters are places to look.
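A rough sketch of one such iteration, under assumed names and with illustrative increment and step sizes:

const int n = 100;
double P(const int* a);                 // the expensive function (user-supplied)

// One iteration of the deviate-each-input-then-step idea.
// increment: how far to deviate each input when probing;
// step: how far to move along the estimated ascent direction.
void response_surface_step(int a[n], int increment, int step) {
    double base = P(a);
    int direction[n];
    // probe each input with a one-sided deviation
    for (int j = 0; j < n; j++) {
        a[j] += increment;
        double deviated = P(a);
        a[j] -= increment;              // restore
        direction[j] = (deviated > base) ? 1 : (deviated < base ? -1 : 0);
    }
    // take a step in the improving direction for each input
    for (int j = 0; j < n; j++)
        a[j] += step * direction[j];
}

Keep in mind each iteration costs n + 1 evaluations of P, which at 30 seconds to a couple of minutes per call is already on the order of an hour to a few hours per iteration for 100 inputs.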
Neural networks :D or Taylor series?
If you have access to MATLAB, you can parallelize your code pretty quickly and pretty easily. Even simple linear for loops can be made parallel with its parfor loop.
If a Microsoft solution is an option, check out Solver Foundation. I heard about it on Scott Hanselman's podcast (#191).