A language in which everything compiles

I'm trying to do some research for a new project, and I need to create objects dynamically from random data. For this to work, I need a language / compiler that doesn't have problems with weird uncompilable code lying around.

Basically, I need the random code to compile (or be interpreted) as much as possible, meaning that the uncompilable parts will be ignored, and only the compilable parts will create the objects (which could then be run).

Object Oriented-ness is not a must, but is a very strong advantage.

I thought of ASM, but it's very messy, and I'd probably need more readable code.

Thanks!

It sounds like you're doing something very much like genetic programming; even if you aren't, GP has to solve some of the same problems—using randomness to generate valid programs. The approach to this that is typically used is to work with a syntax tree: rather than storing x + y * 3 - 2, you store something like the following:

        -
       / \
      +   2
     / \
    x   *
       / \
      y   3

Then, rather than randomly changing the program text, you randomly change nodes in the tree. And if x should randomly change to, say, +, you know statically that you need to insert two children (or not, depending on how you define +).
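To make the tree-mutation idea concrete, here is a minimal Python sketch (Python only for illustration; the operator table, the leaf set, and the function names are all assumptions of mine, not part of any GP library). A tree is either a leaf or a list `[op, child, child]`, and a mutation replaces a whole node, so a new operator always arrives with the right number of children:

```python
import random

# Hypothetical operator table: operator name -> number of children (arity).
ARITY = {"+": 2, "-": 2, "*": 2}
LEAVES = ["x", "y", 2, 3]

def random_tree(rng, depth):
    """Build a random, always well-formed tree: a leaf or [op, child, child]."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(LEAVES)
    op = rng.choice(sorted(ARITY))
    return [op] + [random_tree(rng, depth - 1) for _ in range(ARITY[op])]

def mutate(tree, rng, depth=2):
    """Return a copy of tree in which one randomly chosen node is replaced."""
    if not isinstance(tree, list) or rng.random() < 0.3:
        # Replace this whole node; a new operator brings its own children.
        return random_tree(rng, depth)
    i = rng.randrange(1, len(tree))  # pick one child and descend into it
    return tree[:i] + [mutate(tree[i], rng, depth)] + tree[i + 1:]

def well_formed(tree):
    """Check that every operator node has exactly as many children as its arity."""
    if isinstance(tree, list):
        return (len(tree) >= 1 and isinstance(tree[0], str) and tree[0] in ARITY
                and len(tree) == ARITY[tree[0]] + 1
                and all(well_formed(child) for child in tree[1:]))
    return tree in LEAVES
```

Starting from `["-", ["+", "x", ["*", "y", 3]], 2]` and calling `mutate` repeatedly never produces a malformed tree, which is the whole point of mutating nodes instead of characters.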

A good choice for a language to work with for this would be any Lisp dialect. In a Lisp, the above program would be written (- (+ x (* y 3)) 2), which is just a linearization of the syntax tree using parentheses to show depth. And in fact, Lisps expose this feature: you can just as easily work with the object '(- (+ x (* y 3)) 2) (note the leading quote). This is a three-element list, whose first element is -, second element is another list, and third element is 2. And, though you might or might not want it for your particular application, there's an eval function, such that (eval '(- (+ x (* y 3)) 2)) will take in the given list, treat it as a Lisp syntax tree/program, and evaluate it. This is what makes Lisps so attractive for doing this sort of work; Lisp syntax is basically a reification of the syntax-tree, and if you operate at the syntax-tree level, you can work on code as though it was a value. Lisp won't help you read /dev/random as a program directly, but with a little interpretation layered on top, you should be able to get what you want.
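If you are not working in a Lisp, the same "code as a value" idea can be approximated with nested lists. The following is a rough Python sketch, not a substitute for Lisp's eval; the operator set and the `evaluate` function are assumptions made for this example:

```python
import operator

# Assumed operator set for this sketch; all operators are binary.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, env):
    """Evaluate a nested-list expression, e.g. ["-", ["+", "x", ["*", "y", 3]], 2]."""
    if isinstance(tree, list):
        op, *args = tree
        left, right = (evaluate(arg, env) for arg in args)
        return OPS[op](left, right)
    if isinstance(tree, str):
        return env[tree]   # variable lookup
    return tree            # numeric literal

# Same shape as the Lisp list '(- (+ x (* y 3)) 2)
program = ["-", ["+", "x", ["*", "y", 3]], 2]
result = evaluate(program, {"x": 1, "y": 4})   # 1 + 4*3 - 2
```

Because `program` is an ordinary list, a mutator can slice, copy, and rebuild it like any other data before handing it to `evaluate`.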

I should also mention the existence of linear genetic programming, though I don't know anything about it (not that I know much about ordinary genetic programming). This is sort of like the assembly model that you mentioned: a linear stream of very, very simple instructions. The advantage here would seem to be that if you are working with /dev/random or something like it, the amount of interpretation needed is very small; the disadvantage would be, as you mentioned, the low-level nature of the code.
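As an illustration of how little interpretation a linear encoding can need, here is a small Python sketch of a register machine in which every possible byte is a valid instruction. The instruction layout (2 bits of opcode, 2 bits of destination, 2 bits of source, top 2 bits ignored) is invented for this example and is not taken from any actual linear-GP system:

```python
N_REGS = 4  # fixed by the 2-bit register fields below

def run_linear(code, inputs):
    """Interpret each byte of code as one instruction; no byte is ever invalid."""
    regs = [0] * N_REGS
    for i, value in enumerate(inputs[:N_REGS]):
        regs[i] = value
    for byte in code:
        op = byte & 0b11           # bits 0-1: operation
        dst = (byte >> 2) & 0b11   # bits 2-3: destination register
        src = (byte >> 4) & 0b11   # bits 4-5: source register (bits 6-7 unused)
        if op == 0:
            regs[dst] += regs[src]
        elif op == 1:
            regs[dst] -= regs[src]
        elif op == 2:
            regs[dst] *= regs[src]
        else:
            regs[dst] = regs[src]
        regs[dst] %= 2**32         # keep values bounded, like a 32-bit register
    return regs
```

There is no division and no jump, so this particular machine can neither fail nor loop forever; adding either would require deciding what a "bad" instruction should do.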

I'm not sure if this is what you're looking for, but any programming language can be made to function this way. For any programming language P, define the language P_always as follows:

If p is a valid program in P, then p is a valid program in P_always whose meaning is the same as its meaning in P.
If p is not a valid program in P, then p is a valid program in P_always whose meaning is the same as a program that immediately terminates.

For example, I could make the language C++_always so that this program:

#include <iostream>
using namespace std;

int main() {
    cout << "Hello, world!" << endl;
}

would compile to a program that prints "Hello, world!", while this program:

Hahaha!  This isn't legal C++ code!

would be a legal program that just does absolutely nothing.

To solve your original problem, just take any OOP language like Java, Smalltalk, etc. and construct the appropriate Java_always, Smalltalk_always, etc. language from it. Again, I'm not sure if this is at all what you're looking for, but it could be done very easily.
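For a language that ships with its own compiler at run time, the wrapper is only a few lines. Here is a rough sketch of a "Python_always" (the name `run_always` is made up for this example). Note that `exec` on untrusted or random code is unsafe outside a sandbox, and that a program which compiles but fails at run time still raises, since the definition above only covers invalid programs:

```python
def run_always(source, env=None):
    """Run source if it compiles as Python; otherwise act like an empty program."""
    env = {} if env is None else env
    try:
        code = compile(source, "<random>", "exec")
    except (SyntaxError, ValueError):
        # Not a valid program: terminate immediately and do nothing.
        return env
    exec(code, env)   # valid program: same meaning as in plain Python
    return env
```

For example, `run_always("x = 1 + 2")` returns a namespace in which `x` is bound, while `run_always("Hahaha!  This isn't legal C++ code!")` returns an empty one.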

Alternatively, consider finding a grammar for any OOP language and then using that grammar to produce random syntactically valid programs. You could then filter those programs down by using the P_always programming language for that language to eliminate syntactically but not semantically valid programs.
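A toy version of this generate-then-filter pipeline might look as follows in Python. The grammar is a small arithmetic grammar I made up for illustration, and "semantically invalid" is approximated here by "fails when evaluated" (division by zero), which is only a stand-in for a real compiler's semantic checks:

```python
import random

# Toy grammar (an assumption for illustration): nonterminal -> alternatives.
GRAMMAR = {
    "expr": [["term"], ["expr", " + ", "term"], ["expr", " - ", "term"]],
    "term": [["atom"], ["term", " * ", "atom"], ["term", " / ", "atom"]],
    "atom": [["x"], ["0"], ["1"], ["2"], ["(", "expr", ")"]],
}

def generate(symbol, rng, depth=6):
    """Expand symbol into a random string that is always syntactically valid."""
    if symbol not in GRAMMAR:
        return symbol                      # terminal: emit as-is
    alternatives = GRAMMAR[symbol]
    if depth <= 0:
        alternatives = alternatives[:1]    # first alternative is non-recursive
    choice = rng.choice(alternatives)
    return "".join(generate(s, rng, depth - 1) for s in choice)

def is_semantically_valid(source, x=3):
    """Filter step: keep only expressions that evaluate without error."""
    try:
        eval(source, {"__builtins__": {}}, {"x": x})
    except ZeroDivisionError:
        return False
    return True

rng = random.Random(1)
candidates = [generate("expr", rng) for _ in range(50)]
kept = [c for c in candidates if is_semantically_valid(c)]
```

Every string in `candidates` parses, but something like `x / 0` only gets caught by the second stage.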

Divide the ASCII byte values into 9 classes (taking each value modulo 9 would do). Then assign them to Brainfuck commands (see http://en.wikipedia.org/wiki/Brainfuck); Brainfuck has eight commands, so the ninth class can presumably act as a no-op. Then interpret the result as Brainfuck.

There you go, any sequence of ASCII characters is a program. Not that it's going to do anything sensible... Compared to templatetypedef's answer, this approach has a much better chance of yielding a nontrivial program from a random byte sequence.
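Here is a hedged Python sketch of that mapping plus a small interpreter. Two details are my own assumptions rather than part of the answer above: unmatched brackets are simply ignored (otherwise a random sequence could still be rejected), and execution stops after a fixed number of steps, since random programs may loop forever:

```python
COMMANDS = "><+-.,[]"   # the eight Brainfuck commands; class 8 maps to a no-op

def to_brainfuck(data):
    """Map each byte to a command via its value modulo 9 (class 8 is dropped)."""
    return "".join(COMMANDS[b % 9] for b in data if b % 9 < 8)

def run(program, stdin=b"", max_steps=10_000, tape_len=30_000):
    # Pair up brackets; unmatched ones stay out of the table and are ignored.
    jump, stack = {}, []
    for i, c in enumerate(program):
        if c == "[":
            stack.append(i)
        elif c == "]" and stack:
            j = stack.pop()
            jump[i], jump[j] = j, i
    tape, ptr, pc, steps = [0] * tape_len, 0, 0, 0
    out, inp = bytearray(), list(stdin)
    while pc < len(program) and steps < max_steps:
        c = program[pc]
        if c == ">":
            ptr = (ptr + 1) % tape_len
        elif c == "<":
            ptr = (ptr - 1) % tape_len
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(tape[ptr])
        elif c == ",":
            tape[ptr] = inp.pop(0) if inp else 0   # end of input reads as 0
        elif c == "[" and tape[ptr] == 0 and pc in jump:
            pc = jump[pc]                          # skip the loop body
        elif c == "]" and tape[ptr] != 0 and pc in jump:
            pc = jump[pc]                          # go back to the loop start
        pc += 1
        steps += 1
    return bytes(out)
```

With this, `run(to_brainfuck(data))` accepts any byte string at all, although most random inputs will output little or nothing.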

Text Editors

You could try feeding random character strings to an editor like Emacs or vi. Many (most?) characters will perform an editing action but some will do nothing (other than beep, perhaps). You would have to ensure that the random code mutator never generates the character sequence that exits the editor. However, this experience would be much like programming a Turing machine -- the code is not very readable.

Mathematica

In Mathematica, undefined symbols and other expressions evaluate to themselves, without error. So, that language might be a viable choice if you can arrange for the random code mutator to always generate well-formed expressions. This would be readily achievable since the basic Mathematica syntax is trivial, making it easy to operate on syntactic units rather than at the character level. It would be even easier if the mutator were written in Mathematica itself since expression-munging is Mathematica's forte. You could define a mini-language of valid operations within a Mathematica package that does not import the system-defined symbols. This would allow you to generate well-formed expressions to your heart's content without fear of generating a dangerous expression, like DeleteFile[FileNames["*.*", "/", Infinity]].

I believe Common Lisp should suit your needs. I always have some code in my SLIME/Emacs session that wouldn't compile. You can always tweak things and redefine functions at run time. It is actually very good for prototyping.

A few years ago it took me quite a while to learn. But nowadays we have quicklisp and everything is so much easier.

Here I describe my development environment: Install lisp on my linux machine

PS: I want to give an example where Common Lisp was useful for me. Up to maybe 2004 I used to write small programs in C (the keep-it-simple Unix way).

Over the last 3 years I had to get lots of different hardware running: motorized stages, scientific cameras, IO cards.

The cameras turned out to be quite annoying. Usually you have to cool them down to -50 degrees Celsius or so, and (in some SDKs) they don't like it when you close them. But closing and reopening the device on every run is exactly what my C development cycle did: write (30s), compile (1s), run (0.1s), repeat.

Eventually I decided to just use Common Lisp. Often it is straightforward to define the foreign function interfaces to talk to the SDKs, and I can do this without ever leaving the running Lisp image. I start the editor in the morning, define the open-device function to talk to the device, and after 3 hours I have enough of the functions implemented to set gain, temperature, and region of interest and to obtain the video.

Then I can often put the SDK manual away and just use the camera.

I use the same interactive programming approach when I have to parse some webpage or some weird XML.