Is there a C-like mini-syntax/language that can be both translated to native C/C++ and Java?

I would like to allow my application to be "scripted". The script language should be typed and C-like. It should have the usual control statements, primitive types, arrays, and operators that can be found in both C and Java. The scripts should be able to define functions, and call pre-defined API functions, like sin() ... that I choose to offer. If the script passes the syntax check, it would then be translated by the application to Java and compiled on the fly, but it should also be possible to translate it to C/C++, and have it natively compiled. The syntax check and the translation to Java/C should run in the JVM. I don't need an interpreter for it, since it would always be translated to Java/C. Only a syntax checker and translator.

Is there such a language out there? If not, what is the easiest way of doing this in the JVM, taking into consideration that I'm not knowledgeable in compiler/interpreter programming? (If I was, I would not need to ask this question ...)

If there is a "Scala" solution, it would also be fine, since I'm actually moving my Java code to Scala.

[EDIT] The only reason I want C/C++ translation is performance. I expect a lot of "bit manipulation" over arrays, which Java isn't really suited for, in particular due to range-checking at every array index operation. I also expect many objects, which cost indirectly in GC cycles.

[EDIT#2] OK, I see I have to get concrete. I am considering programming a Minecraft clone, as an exercise, to get my mind off "Business Computing". I'm talking about the engine, not the gameplay. And I'm more interested in the server side than in the 3D, because I'm a "server guy". We're talking about using the flyweight pattern to represent millions of objects (blocks/voxels), and accessing them all many times per second. This isn't what the JVM was made for. Look at this article for an example of what I mean: Why we chose CPP over Java

I don't want to program everything in C/C++, but I suspect this is the only way to get good performance. Another example of what I mean is VoltDB, which is arguably the fastest SQL database out there. They wrote it in Java and C/C++, using Java for I/O and network, and C for the heavy memory manipulation and bit fumbling. The user-written stored procedures are in Java, but I don't think they need to be. I think it should be possible to compile to Java on the client and in test builds, and compile to C on the server, where a full development environment can be configured.

Maybe Haxe will suit your needs. It is an intermediate high-level language that can be compiled into C++ source code. Java targets are in development.

Two words: premature optimization.

You are concerned about performance. But considering that you want to make a Minecraft clone, the game world can very well be represented by a three-dimensional array. Accessing those is reasonably fast in all of the mentioned programming languages; the game logic should take much more time to execute than accessing millions of array entries. So why optimize a part that will not take the majority of the computation time anyway, before you have even written a minimally working version?

You might want to create a Java interface or a Scala trait that represents the game world. It offers methods to get and store the contents of game world blocks. Later on, you can also add bulk methods to further optimize performance; for example, one that checks whether all blocks in a given cube are empty, or one that counts the number of wood blocks, something along those lines. But in the beginning, better leave out those methods, or make trivial implementations that rely on calling the abstract methods repeatedly. You can optimize them later.
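As a rough sketch of what such an interface could look like in Java (the names `GameWorld`, `getBlock`, `setBlock`, `countBlocks` and `isRegionEmpty`, and the use of `0` for "empty", are assumptions made up for illustration), the bulk methods start out as trivial default implementations that only call the two abstract methods:

```java
// Hypothetical sketch: GameWorld and its method names are illustrative only.
interface GameWorld {
    int EMPTY = 0; // assumed id meaning "no block here"

    // The two abstract methods every storage backend must provide.
    int getBlock(int x, int y, int z);

    void setBlock(int x, int y, int z, int blockId);

    // Trivial bulk method: counts blocks of one type in the cube whose
    // lowest corner is (x, y, z) and whose edge length is size.
    default long countBlocks(int x, int y, int z, int size, int blockId) {
        long count = 0;
        for (int dx = 0; dx < size; dx++) {
            for (int dy = 0; dy < size; dy++) {
                for (int dz = 0; dz < size; dz++) {
                    if (getBlock(x + dx, y + dy, z + dz) == blockId) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    // Trivial bulk method: true if the whole cube contains only EMPTY.
    default boolean isRegionEmpty(int x, int y, int z, int size) {
        return countBlocks(x, y, z, size, EMPTY) == (long) size * size * size;
    }
}
```

A smarter backend can override the default methods later without any change to the calling code.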

Then you can provide a very simple Java/Scala implementation of that interface, which actually uses a three-dimensional array. An alternative would be a map whose keys are coordinates, and the values are block states. The advantage would be that there would be no real limit to the size of the game world, and empty blocks would not take up any memory (for coordinates with empty blocks, there is no entry in the map). The disadvantage can obviously be the performance.
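A minimal sketch of both storage variants, assuming block states are plain `int` ids and `0` means empty. The class names `ArrayWorld` and `MapWorld` are invented here, and the coordinate packing in `key` assumes each coordinate fits into 21 signed bits:

```java
import java.util.HashMap;
import java.util.Map;

// Dense variant: fixed world size, one int per block, empty or not.
final class ArrayWorld {
    private final int[][][] blocks;

    ArrayWorld(int sizeX, int sizeY, int sizeZ) {
        blocks = new int[sizeX][sizeY][sizeZ];
    }

    int getBlock(int x, int y, int z) {
        return blocks[x][y][z];
    }

    void setBlock(int x, int y, int z, int blockId) {
        blocks[x][y][z] = blockId;
    }
}

// Sparse variant: no fixed size, and empty blocks have no entry at all.
final class MapWorld {
    static final int EMPTY = 0; // assumed id for an empty block
    private final Map<Long, Integer> blocks = new HashMap<>();

    // Packs three coordinates into one long key.
    // Assumption: every coordinate lies in [-2^20, 2^20 - 1].
    private static long key(int x, int y, int z) {
        return ((long) (x & 0x1FFFFF) << 42)
                | ((long) (y & 0x1FFFFF) << 21)
                | (z & 0x1FFFFF);
    }

    int getBlock(int x, int y, int z) {
        return blocks.getOrDefault(key(x, y, z), EMPTY);
    }

    void setBlock(int x, int y, int z, int blockId) {
        if (blockId == EMPTY) {
            blocks.remove(key(x, y, z)); // empty blocks take no memory
        } else {
            blocks.put(key(x, y, z), blockId);
        }
    }

    int storedBlocks() {
        return blocks.size();
    }
}
```

The map variant boxes every key and value, which is one concrete reason for the performance disadvantage mentioned above.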

At that point, you might want to consider packing the data more tightly, if it consumes too much memory. You can use bit sets. When you reach that stage, it actually makes sense to use JNI to inject some code written in C or C++ into the JVM. So you keep the game logic in Java/Scala, and do the memory packing and lookup in C.
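Before reaching for JNI, it can be worth seeing how far packing gets in plain Java. The following is only a sketch under the assumption that 4 bits per block are enough (at most 16 block types); `PackedBlocks` is a made-up name, and it stores sixteen block ids in each `long`:

```java
// Hypothetical sketch: packs 4-bit block ids into a long[].
final class PackedBlocks {
    private static final int BITS = 4;              // assumed bits per block
    private static final int PER_WORD = 64 / BITS;  // 16 blocks per long
    private static final long MASK = (1L << BITS) - 1;

    private final long[] words;
    private final int size;

    PackedBlocks(int size) {
        this.size = size;
        this.words = new long[(size + PER_WORD - 1) / PER_WORD];
    }

    int get(int index) {
        checkIndex(index);
        int shift = (index % PER_WORD) * BITS;
        return (int) ((words[index / PER_WORD] >>> shift) & MASK);
    }

    void set(int index, int value) {
        checkIndex(index);
        if ((value & ~MASK) != 0) {
            throw new IllegalArgumentException("value does not fit in " + BITS + " bits");
        }
        int word = index / PER_WORD;
        int shift = (index % PER_WORD) * BITS;
        // Clear the old 4 bits, then write the new value in their place.
        words[word] = (words[word] & ~(MASK << shift)) | ((long) value << shift);
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

A three-dimensional coordinate would be flattened to `index` first, for example as `(x * sizeY + y) * sizeZ + z`.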

There is no real point in creating a common "script" source that can create a Java/Scala and a C/C++ version of the native part of the code; the native C/C++ functions would rely heavily on optimizations that cannot be directly translated into Java. When you want to start the server in "pure Java/Scala mode", i.e. without the JNI functions, simply use the other classes that you created in the step before. They might be a little slower, but they are pure JVM byte code. And since you kept them simple, there is no danger that you have to extend them wildly or introduce new bugs into them. At least, the overhead of creating or adapting a cross-programming-language code generator is far, far bigger than keeping two separate code bases, especially when the Java/Scala implementation is really simple.

Of course, bit packing only gets you so far. You might notice that some parts of the game world are almost completely empty (especially those above the surface), and others contain huge areas filled with blocks of the same type (like underground areas that consist almost solely of stone). Maintaining a huge memory structure with that much redundancy is really a waste of memory. So you will probably consider packing the game world in a tree, where each node stands for a big cubical area of the game world, and the children subdivide it further, down to leaves that describe the contents of just one specific game world coordinate. When one node has only children of the same content type, you do not need to store the children. Simply cut the tree at this point and have the node say, "You don't need to look further. I am full of water, so every coordinate that is inside of me points to a water tile." This will greatly reduce the memory usage. Only the parts of the game world that are actually complex will consume a lot of memory, and rightly so. The more boring parts of the world occupy only a few nodes in the tree. This is good. And since it is a tree, traversing it from the top to a leaf takes logarithmic time on average. This is very good! Of course, you have to keep the tree mutable. If a boring part of the world that is represented by only one node changes, you need to break open that node and split it into two or more children. Should it become simple again afterwards, you can join the children again and cut the tree.
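One way to build such a tree is an octree, where every split produces eight children. This is my choice for the sketch, not the only option, and `OctreeWorld` and `OctreeNode` are invented names. The sketch assumes a cubic world whose edge length is a power of two, and shows both the splitting and the re-joining of nodes:

```java
// Hypothetical sketch of a mutable octree over a cubic world.
final class OctreeNode {
    private int value;             // block id while this node is uniform
    private OctreeNode[] children; // null while this node is uniform

    OctreeNode(int value) {
        this.value = value;
    }

    // x, y, z are local to this node's cube; size is its edge length.
    int get(int x, int y, int z, int size) {
        if (children == null) {
            return value; // "I am full of one block type, look no further."
        }
        int half = size / 2;
        return children[childIndex(x, y, z, half)].get(x % half, y % half, z % half, half);
    }

    void set(int x, int y, int z, int size, int newValue) {
        if (children == null) {
            if (value == newValue) {
                return; // nothing changes
            }
            if (size == 1) {
                value = newValue; // single-block leaf
                return;
            }
            // Break open the uniform node into eight uniform children.
            children = new OctreeNode[8];
            for (int i = 0; i < 8; i++) {
                children[i] = new OctreeNode(value);
            }
        }
        int half = size / 2;
        children[childIndex(x, y, z, half)].set(x % half, y % half, z % half, half, newValue);
        mergeIfUniform();
    }

    // Joins the children again if they all hold the same single value.
    private void mergeIfUniform() {
        int first = children[0].value;
        for (OctreeNode child : children) {
            if (child.children != null || child.value != first) {
                return;
            }
        }
        value = first;
        children = null;
    }

    private static int childIndex(int x, int y, int z, int half) {
        return (x >= half ? 1 : 0) | (y >= half ? 2 : 0) | (z >= half ? 4 : 0);
    }

    int nodeCount() {
        int count = 1;
        if (children != null) {
            for (OctreeNode child : children) {
                count += child.nodeCount();
            }
        }
        return count;
    }
}

final class OctreeWorld {
    private final int size; // edge length, assumed to be a power of two
    private final OctreeNode root;

    OctreeWorld(int size, int fill) {
        this.size = size;
        this.root = new OctreeNode(fill);
    }

    int getBlock(int x, int y, int z) { return root.get(x, y, z, size); }

    void setBlock(int x, int y, int z, int blockId) { root.set(x, y, z, size, blockId); }

    int nodeCount() { return root.nodeCount(); }
}
```

A world filled with a single block type occupies one node regardless of its size; only the regions that actually differ grow the tree.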

One thing you might notice at this point is: Memory packing and access optimization is not really an issue here any more. A tree like this cannot be reasonably optimized by using native functions for the storage and lookup methods. If you can gain more than, say, 10% out of such an optimization, this would be highly improbable and hugely impressive. (More probably though, this might mean that the Java/Scala counterparts were badly optimized.) Such a minimal speed gain does not justify the huge extra effort that needs to be put into it. Rather put a better CPU into the machine and enjoy the time you saved by eating ice cream, watching Dr. House or continuing to enhance the game further and make it more interesting and attractive for the players. By creating something valuable that will really improve the product.

But this still is not it. If I remember correctly, the initial state of a Minecraft world is procedurally generated. Using fractal algorithms, you can really create endless territories that feel complex and natural in the blink of an eye. So instead of pre-computing the contents of the game world and storing it in a huge data structure, you might want to use the world generation procedure as a lookup method: Instead of looking up the contents of a coordinate from memory, simply calculate it using the algorithm. In this way, the initial state of the world can be fully stored in four bytes: the seed value of the algorithm.
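To illustrate the idea of "lookup by calculation", here is a deliberately crude sketch. It is not a fractal or noise algorithm; the integer mixer merely stands in for one, and `ProceduralWorld`, `heightAt` and the height range of 60 to 67 are all assumptions for the example. The point is only that the same `int` seed always yields the same block for the same coordinate, with nothing stored:

```java
// Hypothetical sketch: the whole initial world is a pure function of a seed.
final class ProceduralWorld {
    static final int AIR = 0;
    static final int STONE = 1;

    private final int seed; // the four bytes that describe the initial world

    ProceduralWorld(int seed) {
        this.seed = seed;
    }

    // Simple integer bit mixer; a stand-in for a real noise function.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        h *= 0xC2B2AE35;
        h ^= h >>> 16;
        return h;
    }

    // Terrain height of the column (x, z), always in the range 60..67.
    int heightAt(int x, int z) {
        int h = mix(seed ^ mix(x * 0x9E3779B9 + z));
        return 60 + Math.floorMod(h, 8);
    }

    // Computes the block instead of looking it up in memory.
    int getBlock(int x, int y, int z) {
        return y <= heightAt(x, z) ? STONE : AIR;
    }
}
```

Two instances created with the same seed agree on every coordinate, which is what makes it safe to discard generated data and recompute it later.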

Of course, the world will not always stay in this state. When the player (or something else) changes the world, then this is something that you need to store. So you store only the world's seed value and the changes made to it. Whenever you look up the contents of a coordinate, try to find it in the changed tile storage. When it's there, use that information. When it's not there, default to the procedural world generation algorithm. This will make your memory consumption decrease enormously. And since the changes to the world are relatively small and contain huge empty areas, it should be relatively easy for you to write a data structure that stores those changes quickly and efficiently. Again, writing native code for this would not yield a significant performance gain and is not worth the effort.
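The lookup order described here can be sketched in a few lines. `BlockGenerator` and `OverlayWorld` are hypothetical names, the generator stands for whatever procedural algorithm is used, and the key packing again assumes coordinates that fit into 21 signed bits:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the procedural world generation algorithm.
interface BlockGenerator {
    int generate(int x, int y, int z);
}

// Hypothetical sketch: generated world plus a sparse store of changes.
final class OverlayWorld {
    private final BlockGenerator generator;
    private final Map<Long, Integer> changes = new HashMap<>();

    OverlayWorld(BlockGenerator generator) {
        this.generator = generator;
    }

    // Assumption: every coordinate lies in [-2^20, 2^20 - 1].
    private static long key(int x, int y, int z) {
        return ((long) (x & 0x1FFFFF) << 42)
                | ((long) (y & 0x1FFFFF) << 21)
                | (z & 0x1FFFFF);
    }

    int getBlock(int x, int y, int z) {
        Integer changed = changes.get(key(x, y, z));
        // Changed blocks win; everything else falls back to the generator.
        return changed != null ? changed : generator.generate(x, y, z);
    }

    void setBlock(int x, int y, int z, int blockId) {
        if (blockId == generator.generate(x, y, z)) {
            // Back to the generated state: no need to remember anything.
            changes.remove(key(x, y, z));
        } else {
            changes.put(key(x, y, z), blockId);
        }
    }

    int changeCount() {
        return changes.size();
    }
}
```

Persisting the world then means saving the seed and the contents of `changes`, nothing more.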

Something else can be optimized though: the procedural world generation algorithm. This is the one key component that you might want to write in C or C++. It should be relatively small and not much code, but it is math intensive and will be called very often. So optimize it well and make a small JNI library out of it. This will give you a huge performance boost that is worth the effort. (Of course, you might want to do a Java/Scala implementation first. If that's already fast enough, then there is no need to get into the JNI trouble.)

If your world generation procedure should still be too slow, then you can implement a cache for it. The cache can even preemptively generate some of the player's surroundings when the JVM has some lazy time.
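A possible shape for such a cache, assuming the world is generated in chunks and that a least-recently-used policy is acceptable. Both are my assumptions, as are the names `ChunkGenerator`, `ChunkCache` and `prefetchAround`. The sketch relies on `LinkedHashMap` in access order with `removeEldestEntry` and is not thread-safe, so calling the prefetch from a background thread would need extra synchronization:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the (slow) procedural generator, producing one chunk at a time.
interface ChunkGenerator {
    int[] generate(int chunkX, int chunkZ);
}

// Hypothetical sketch of an LRU cache in front of the generator.
final class ChunkCache {
    private final ChunkGenerator generator;
    private final Map<Long, int[]> cache;

    ChunkCache(final int maxChunks, ChunkGenerator generator) {
        this.generator = generator;
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<Long, int[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, int[]> eldest) {
                return size() > maxChunks;
            }
        };
    }

    private static long key(int chunkX, int chunkZ) {
        return ((long) chunkX << 32) | (chunkZ & 0xFFFFFFFFL);
    }

    int[] getChunk(int chunkX, int chunkZ) {
        long key = key(chunkX, chunkZ);
        int[] chunk = cache.get(key);
        if (chunk == null) {
            chunk = generator.generate(chunkX, chunkZ); // cache miss
            cache.put(key, chunk);
        }
        return chunk;
    }

    // Generates the chunks around a player ahead of time,
    // for example when the server has nothing better to do.
    void prefetchAround(int chunkX, int chunkZ, int radius) {
        for (int dx = -radius; dx <= radius; dx++) {
            for (int dz = -radius; dz <= radius; dz++) {
                getChunk(chunkX + dx, chunkZ + dz);
            }
        }
    }
}
```

The capacity should be at least as large as the area being prefetched, otherwise the prefetch evicts its own results.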

I laid out this development process for you as an iteration of several ideas, each one better than the previous one. Imagine you had started writing libraries of optimized C/C++ code already at the first stage. It would have been a waste of time; you could have thrown it all away in the later stages. An efficient array storage that employs bit packing, written in C, is a nice thing, but it is of absolutely no use when you reorganize your world into a binary space partitioned tree.

So, don't overdo it. If you cannot create a minimally working (yet slow and unoptimized) version in Java/Scala alone, then you cannot create an optimized version in C/C++ or some cross-compiling scripting language either. Do simple versions first, then do performance tests, and only optimize when there is a real need. Don't start off your project by making concepts for optimizations first. Optimizations of this kind should be the last things that you will be working on.

JavaScript syntax is similar to C/C++ and Java. Various JavaScript engines exist; one that is written in Java is Rhino.

There is also LLVM which is a compiler engine that compiles code to its own bytecode and from there to machine code. It also has a JIT integrated and a number of language frontends exist. I don't know too much about this project, but it looks interesting.

Is there such a language out there? ... What is the easiest way of doing this in the JVM,

I would use plain Java and let the JVM compile the code to native machine code. You can make this compilation more aggressive if you need to.

Perhaps you could clarify what you hope to gain which the JVM doesn't give you already.

If you seriously want to develop your own mini-language, you want to have a realistic idea of what it will take. If you don't put in the commitment, you are likely to come up with something which is not as good.

http://www.ohloh.net/p/openjdk/estimated_cost

OpenJDK: Project Cost Calculator

Codebase: 4,782,885 lines
Effort (est.): 1451 person-years
Estimated cost: $79,805,125

As Scala appears to be an option, just stick with it.

You can invoke the interpreter (as used by the REPL) directly from your own code; it'll compile the script down to Java bytecode, which it then runs. You'll be very hard pressed to find a solution that matches this in terms of power and flexibility, especially given the requirement for static typing.

As for performance, the JVM is then responsible for compiling the bytecode further down to native code, and it's pretty good at this job too. I doubt you'd see any significant performance boost with C/C++ (the compilation time, in particular, will be much worse).

I've found Groovy to work pretty well as a C-like scripting language on the JVM.

Not sure how well you could translate it to C/C++, but if you have C/C++ code you need to call then you can easily link to it with JNI.

I don't think there is value otherwise in translating JVM-based code to C/C++. The JVM JIT is a really good compiler already for the obvious compilation path (JVM language -> JVM bytecode -> native) and will almost certainly beat the performance of anything that tries to do the much more complex path (JVM language -> C/C++ -> native).

What about JavaScript? It does not meet all of your requirements (it's only weakly typed), but it seems to meet quite a few of them: C-like, usual control statements and operators, arrays, functions, you can offer your own API... It can even be compiled into Java, but modern interpreters are quite good at just-in-time compiling it anyway. If you are writing in Java, I strongly suggest you take a look at Rhino.

This is full C, but possibly useful for your purpose: Cibyl.

After reading all the answers, this is what I have decided to do. To get a head-start on my project, I will not use scripting at the beginning. What I will do instead is program in Java, but without using any problematic constructs.

First I will create an API base class, full of any utility methods I might need. 95% of that code will just delegate to some library that does the work for me. This API class will have factory methods for primitive collection classes. There will also be a few utility classes, like Point, Event, ...

Then, I will create my game code by extending the API object. I will not use access modifiers, so everything is package-private. I will not use "final" or "static" or "this" or "super" or "abstract" or "interface" or arrays or general purpose generics or class-cast or "instanceof" or "new" or exception handling. I will not import anything or access the standard Java API, except the String class. Anything I need from the standard Java API will have to be wrapped in my API base class. There are also some restrictions on the operators, in particular the bit-wise ones.

To get over the limitations caused by not using generics or class-cast or "instanceof", I will use inversion of control, dependency injection and a bit of reflection.

This should be relatively easy and quick to program, but will not perform. Later, when my engine has stabilized and been debugged, I can go back to my original idea, and create a parser that will translate either to optimized Java or to C++. Removing all those special Java constructs will make it much easier to build the parser, and since I will have used standard Java syntax, I can just use a predefined parser, and customize it to my needs.