
What is holding genetic programming back?

I have done a fair amount of work with genetic algorithms, quite successfully, and have thus far ignored genetic programming. As far as I know, most programs are still written by programmers, and I'm curious to know what is holding genetic programming back?

Some possible explanations I thought of are:

  1. The search space is just too large to find useful programs among the noise.
  2. Most real applications can't supply sufficient data to allow fitness evaluation over such a space.
  3. It is difficult to reduce the efficacy of many real applications down to a single fitness measure. In other words, writing a suitable fitness function would probably entail the same amount of work as writing the actual program.

Any ideas?


This is something I have been considering in my own research, and I'd say there are many reasons:

  1. The vast majority of research in the GP field has focused on producing formulas rather than the sort of software that gets produced by most programmers. There are plenty of computer scientists in the field, but very few are focused on producing the sort of programs you would expect, so advances have been slow in that area.

  2. There is a significant over-emphasis on using LISP because it can produce nice tree structures which are easy to manipulate. Unfortunately, imperative programs have been neglected because they involve solving some tricky problems. For GP to be taken seriously by programmers it has to produce imperative programs.

  3. Real programs contain looping constructs, but loops are difficult to implement in GP without all sorts of ugly constraints to prevent infinite looping.

  4. Genetic Programming does not scale well. It is fine for small problems, with a small available language, but as you say in your first point, the search space becomes too large very quickly.

  5. Compared to a human programmer, GP can be very slow. It is, however, very parallelisable, so it is likely to benefit substantially as larger numbers of processor cores become the norm.
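
The looping problem in point 3 is usually handled by giving every candidate a hard step budget during evaluation. Here is a minimal sketch of that idea in Python; the mini-language and its opcodes are hypothetical, invented for illustration rather than taken from any particular GP package:

```python
class StepBudgetExceeded(Exception):
    pass

def run_with_budget(program, env, max_steps=1000):
    """Interpret a tiny imperative program (a list of (op, args...) tuples)
    under a hard step budget, so evolved loops cannot run forever."""
    steps = 0
    pc = 0
    while pc < len(program):
        steps += 1
        if steps > max_steps:
            raise StepBudgetExceeded("candidate exceeded its step budget")
        op, *args = program[pc]
        if op == "set":              # set <var> to a constant
            env[args[0]] = args[1]
        elif op == "add":            # add a constant to <var>
            env[args[0]] += args[1]
        elif op == "jump_if_lt":     # jump to <target> while <var> < <bound>
            if env[args[0]] < args[1]:
                pc = args[2]
                continue
        pc += 1
    return env

# A well-behaved candidate: counts x from 0 up to 5, then halts.
result = run_with_budget(
    [("set", "x", 0), ("add", "x", 1), ("jump_if_lt", "x", 5, 1)], {})
print(result["x"])  # 5

# An evolved infinite loop gets cut off instead of hanging the fitness run
# (a real GP system would then assign the candidate the worst possible fitness).
try:
    run_with_budget([("set", "x", 0), ("jump_if_lt", "x", 5, 1)], {})
except StepBudgetExceeded:
    print("aborted")  # prints "aborted"
```

The budget is one of the "ugly constraints" mentioned above: it biases the search toward short-running programs, whether or not that is what you wanted.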

Another valid answer would be that it is difficult to trust code that has been automatically generated. This is true, but in practice I doubt it has much impact, because GP is not able to produce the right sort of programs in the first place.


The hard part about genetic programming is writing a good scoring function:

  • The scoring function must correctly judge whether the algorithm has the desired properties. This is harder than it sounds, and much harder than writing a test suite. The algorithm may adapt to any quirk of your scoring function and optimize against it. Trying to evolve strcmp? Your algorithm may instead discover a mathematical pattern in the lengths of your pass/fail test cases.

  • The scoring function must be capable of judging whether the algorithm under test is partially working. Genetic programming relies on "hill climbing". A tiny beneficial mutation needs to cause a tiny incremental improvement in score. If your scoring function can only output true/false then you're searching randomly, not genetically.
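To make the hill-climbing point concrete, here is a hypothetical sketch (the test cases and candidates are invented for illustration) contrasting a pass/fail fitness function with a graded one for an evolved strcmp-style comparator:

```python
# Each case maps a pair of strings to the expected sign of the comparison.
TEST_CASES = [(("apple", "apple"), 0), (("apple", "apply"), -1), (("b", "a"), 1)]

def sign(x):
    return (x > 0) - (x < 0)

def binary_fitness(candidate):
    """Score 1 only if every case is right. A mutation that fixes two of
    three cases still scores 0, so the search degenerates to a random walk."""
    return int(all(sign(candidate(a, b)) == want for (a, b), want in TEST_CASES))

def graded_fitness(candidate):
    """Fraction of cases whose sign is right: partial credit gives the
    'hill' that hill climbing needs."""
    hits = sum(sign(candidate(a, b)) == want for (a, b), want in TEST_CASES)
    return hits / len(TEST_CASES)

always_zero = lambda a, b: 0        # gets only the "equal" case right
print(binary_fitness(always_zero))  # 0 -- indistinguishable from garbage
print(graded_fitness(always_zero))  # 0.3333... -- visible partial progress
```

Under the binary score, `always_zero` and a completely broken candidate look identical; under the graded score, a mutation that fixes one more case is measurably better, which is what lets selection make incremental progress.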

If you try your hand at it you'll find that genetic programming is the ultimate in lateral thinking: Your program will tend to solve the problem in every way you didn't think of, most of them unexpected and (for serious applications) probably useless. One famous case involved an attempt to evolve an oscillator using basic electronic components. It was judged on the simplicity of the circuit and whether the output oscillated. It produced something so simple the researchers were sure it couldn't work, but it did: It was picking up and amplifying radio waves from the environment.

Edit, to cite:

Graham-Rowe, Duncan. "Radio emerges from the electronic soup." New Scientist, vol. 175, no. 2358, p. 19 (31 August 2002). The original link (http://www.newscientist.com/news/news.jsp?id=ns99992732) is now broken; the article is available at http://www.newscientist.com/article/dn2732-radio-emerges-from-the-electronic-soup.html


I know I'm late to this party, but I'd just like to make a couple of points. I had the incredible good fortune to work with John Koza on his Genetic Programming 4 book.

The kind of programming that most of us do day to day, hooking up GUI events, pushing pixels, working with databases, and so on, is most certainly not the type of program that GP aims to build.

What John Koza does with his cluster of about a hundred machines (if I remember right!) is search for solutions to interesting problems (think NP-hard). It's sad, but most of the problems we programmers work on day to day are not very interesting or difficult; they are mostly just irritating and time-consuming.

What Ben Jackson said about a genetically engineered electronic oscillator is the closest thing in this thread to what Dr. Koza and team actually do. There were many examples of circuits in the GP series of books.

What Tom Castle said here about imperative programs is not exactly true. The seminal counter-example from John and company is the antenna-design run. It is essentially a 3D drawing program, with commands like those of the LOGO drawing language, that designs an antenna: it has moveto/lineto-style commands and pushes and pops matrices on a stack. The GP package I looked at just last week, JGAP, has direct support for this: a container-type nonterminal that can contain void-returning statements, followed by a return statement at the end. I believe it even has something like local variables, though I didn't look too closely.

The original LISP design that early GP runs centered around is kind of a pain from time to time, that is certainly true. It's something of a rite of passage I think to be annoyed about translating the output of a GP run into more usable code.

TL;DR: GP is not really an automatic programming system. It's an automated invention system.


I'd say 1. and 3.

In particular, concerning point 3, I would say that in most cases it is easier to write the actual program than to come up with a suitable target function and to check that it leads to a correct or acceptable solution (how do you know that an algorithm derived from genetic programming works as expected?).

Concerning point 1, I would say that the search space has an infinite number of dimensions.


Three things:

  1. As Andre says, it's very hard to write a fitness function that is sufficient. This is the ultimate version of Test Driven Development. You'd have to write unit tests that provide 100% coverage of an as-yet-unwritten program. Even then, 100% coverage by itself is unlikely to be sufficient. When we've solved the problem of formally specifying the precise behaviour of all aspects of a complex system, and then verifying that a given program satisfies that specification, then we might have a chance.
  2. Genetic Programming is non-deterministic and better suited to generating approximate solutions rather than exact solutions.
  3. Genetic Programming, when applied to any problem of reasonable complexity, is phenomenally computationally expensive. Back in 1999, Genetic Programming Inc was using a 1,000-node cluster for their work in the field.


GP can't quickly solve novel problems that are beyond the knowledge of the person creating the fitness function. So it can currently only be used on problems we already know how to solve but cannot practically solve in full, because of the size of the search space, such as the Travelling Salesman Problem, which can be solved more quickly with a GA.

The reason is actually pretty simple: to solve novel problems, the tools you give the GP need to be simple enough, or complete enough, that the GP search space becomes a true analogue of real genetics.

Genetic Programming, like real genetics, is subject to the same growth pattern as all plateau-growth systems. A GP run will progress to the point where its core utilities hit a plateau, level off, and then take a LONG time before it stumbles onto a new paradigm and builds up toward a new plateau.

This pro-evolution video illustrates these plateau-growth patterns: http://www.youtube.com/watch?v=mcAq9bmCeR0 It takes a long while to develop a new hand; once it does, an additional hand becomes the new thing and quickly advances to an optimal example of more hands. (I should mention that this video is most likely using a GA, not GP, but genetics are genetics.)

This is all about the same logic that goes into Singularity Theory, BTW.

But what this means for GP is that it is pretty much only good for research, not for practical application within a program, with a few simple exceptions where the requirements are slightly more in-depth than a GA can solve, such as some scheduling programs, where the programming search space is fairly small, the tools needed to solve it are well understood, and a well-defined fitness function exists.


It's simply because it has been proven to be theoretically impossible (at least for correct programs).

Let's assume you have infinite computing power, setting aside search-space size, slowness, and other 'speed' issues. Now you face only two problems:

  • Will the generated program stop?
  • How can you be sure that the generated programs behave the way you want them to?

The first problem is a central question in computability theory, called the halting problem. Turing proved in 1936 that this problem is undecidable: there is no automated process that can determine, for all program-input pairs, whether the program halts or not (it is possible in some cases, but not in general). So for this, you can't do much ;)

The second issue relates to program correctness. In genetic programming, validation is usually done through example values, which does not constitute any proof of correctness at all. This is comparable to unit testing: it gives an OK for a number of examples, but no general proof. For example, if I write a prime number checker and test it only with 3, 5, 7 and 11, then a program that returns true for every odd number would pass the test.
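
The prime-checker example can be made concrete with a short (hypothetical) sketch; the candidate program and the test suite are invented for illustration:

```python
def evolved_checker(n):
    """A candidate the evolution might plausibly settle on:
    it answers "is n odd?", not "is n prime?"."""
    return n % 2 == 1

test_cases = [3, 5, 7, 11]  # the only examples the fitness function sees
print(all(evolved_checker(n) for n in test_cases))  # True -- full marks

print(evolved_checker(9))   # True -- but 9 = 3 * 3 is not prime
print(evolved_checker(2))   # False -- but 2 is prime
```

The candidate scores perfectly on the given examples while being wrong almost everywhere else, which is exactly why example-based validation is no proof of correctness.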

Going one step further would be to use automated proofs. Automated proof of correctness for algorithms is in fact deeply related to automated theorem proving in mathematics. You describe your program in an axiomatized system and try to prove the correctness of your statement automatically. Here again you face strong theoretical barriers: the Gödel incompleteness theorems. These theorems state, among other things, that even for very simple axiomatized systems able to perform arithmetic on the natural numbers, there is no algorithm (an 'effective procedure') able to prove all the true theorems about those numbers. Concretely, this means that even for simple programs, you will not always be able to prove correctness automatically.

Even without a proven correctness, using sample test cases to validate the genetic program is very prone to over-specialization, the phenomenon known as overfitting in machine learning. That is, the learnt program will do fine on the provided test cases but may go totally ballistic for some other inputs.


Maybe because most programmers are programmers, and not computer scientists?

Genetic programming requires serious smarts. You need to be able to break the problem down appropriately, and you need an appropriate problem to start with. And you need to know enough to know that genetic algorithms are an option. And the problem needs to not already have a well known solution.

Not everything needs a fancy algorithm. Of all the code that is written in the world, do 'standard' webapps, OSs, device programming, really need genetic algorithms?

And when it comes down to it, most programming effort is devoted to solving simple problems where a complicated approach is not needed.


I think a big part of the reason is risk management. Bear with me: when a programmer sits down to write a program, they have at least some idea of how long it'll take and of what they can do.

When instead a programmer sits down to write a program that will generate a program (using genetic programming), the uncertainty shoots through the roof: it's unclear how long the process will take, and it's unclear how good the end program can be.

There is also uncertainty in other places: how easy will it be to adjust the program later, or to fix bugs? Generated code can be nigh-impossible to debug.


Primordial soup is suspicious and unappetizing. For my production code I prefer Intelligent Design.


My personal view, after a couple of years of GP research at university and following the field since: GP fails to scale.

GP can solve simple problems represented by simple fitness functions just fine. But, as mentioned before, formulating a fitness function that describes the task of evolving strcmp, or a calculator, or even a simple text editor is almost impossible using classic approaches.

I do like the notion of a GP fitness function being TDD in perfection, though :-) Maybe some bright mind will come up with a better way of directing the simulated evolution some day, but that has yet to happen.

I have some hopes for GP in the area of implicit fitness functions, where I'm currently doing some 'private research'. We'll see how far it gets!

Cheers, Jay


There are some projects tackling the problems with GP mentioned above. One example is OpenCog's MOSES.


Nowadays, programming largely means writing a precise specification in a machine-readable way: you tell the program exactly what you want it to do and what the result should look like, and not much more. That means that if you want to generate the result by, say, genetic programming, you still have to produce this machine-readable specification, in the form of a fitness function. This would lead to at least the same, and probably a greater, amount of work. So it is simply the wrong field for genetic programming, which is made for problems with an easy-to-define but hard-to-reach specification.

Edit: You could argue that in most projects this fine-grained specification is done anyway, in the form of tests. I would say that for a genetic programming approach, tests are far too imprecise to specify your needs. They are example-based, not written in a formal specification language the way a program is. Genetic programming would probably produce results that suit the test cases but behave wrongly in real situations with new inputs. So to reach a specification as exact as one written in a programming language, you would need a huge number of test cases covering every eventuality. In the end you would do much more work than programming the thing directly.


GP and GA, and Evolutionary Algorithms in general, are more like hobbies. They are not used for real-life applications, barring exceptions. The reason is that these algorithms are based on randomness: there is no certainty of getting a good answer.

Moreover, this area is flooded with sub-par research work, because it's easy to understand and implement compared to other mathematically intense techniques. So students just find an objective to optimize in some engineering or science application, and there's a new piece of work to get published.

Just compare the sponsors of their conferences with those of other AI/ML/optimization conferences; their "current" importance to industry becomes clear.

But it's an area of research, and it's up to us to make it work. At present it's just a hobby!
