I've been looking at compiler design. I did a one-semester course on it at university and have been reading Modern Compiler Design by Grune et al. The book seems to advocate an annotated Abstract Syntax Tree (AST) as the intermediate code, and this is what we used in the course.
My question is: what are the benefits of this approach versus producing some kind of stack-machine language or low-level pseudo-code, particularly with regard to having a compiler that can target many machines?
Is it a good idea to simply target an existing low-level representation such as LLVM and use that as the intermediate representation?
If your language is complicated enough, you'll end up with a sequence of slightly different intermediate representations anyway. And it does not really matter which representation is your final target: LLVM, C, native code, CLR, JVM, whatever. It should not affect the design and architecture of your compiler.
And, from my personal experience, the more intermediate steps you have, with transforms in between as trivial as possible, the better your compiler's architecture is.
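To illustrate the "many trivial steps" idea, here is a minimal sketch in Python. The tuple-based node format and the pass names `desugar` and `fold` are assumptions made up for this example; they are not taken from any particular compiler.

```python
# Hypothetical node shapes: ("num", n), ("var", name), ("neg", e),
# ("add", a, b), ("sub", a, b), ("mul", a, b)

def desugar(node):
    """Pass 1: rewrite unary negation ("neg", e) as ("sub", ("num", 0), e)."""
    kind = node[0]
    if kind in ("num", "var"):
        return node
    if kind == "neg":
        return ("sub", ("num", 0), desugar(node[1]))
    return (kind, desugar(node[1]), desugar(node[2]))

def fold(node):
    """Pass 2: constant folding. Assumes "neg" was already removed by desugar."""
    kind = node[0]
    if kind in ("num", "var"):
        return node
    left, right = fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        ops = {"add": lambda a, b: a + b,
               "sub": lambda a, b: a - b,
               "mul": lambda a, b: a * b}
        return ("num", ops[kind](left[1], right[1]))
    return (kind, left, right)

# The pipeline is just an ordered list of small tree-to-tree transforms.
PASSES = [desugar, fold]

def run_passes(tree):
    for compiler_pass in PASSES:
        tree = compiler_pass(tree)
    return tree

# x + -(2 * 3)
tree = ("add", ("var", "x"), ("neg", ("mul", ("num", 2), ("num", 3))))
result = run_passes(tree)
```

Because `desugar` runs first, `fold` never has to know that negation exists, which is the kind of simplification that small, ordered passes buy you.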
An AST and low-level pseudo-code are two different abstractions of a program in the journey a compiler takes from a high-level language to object code.
As with any complete data representation, you can do everything you need to with either representation. Some things are just easier to do with one than the other.
For example, it's easier to do semantic and syntax analysis on an AST. It's easier to do instruction scheduling on pseudo-code.
Compiler front-end developers tend to like ASTs. Back-end developers tend to like pseudo-code.
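As a rough illustration of why semantic analysis suits a tree, here is a small Python sketch of a bottom-up type annotator. The `Node` class, its `ty` field, and the node kinds are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    kind: str                                   # "int", "bool", "add", or "and"
    children: List["Node"] = field(default_factory=list)
    ty: Optional[str] = None                    # annotation filled in by annotate()

def annotate(node):
    """Annotate each node with a type derived from its children's types."""
    for child in node.children:
        annotate(child)
    if node.kind in ("int", "bool"):
        node.ty = node.kind
    elif node.kind == "add":
        if any(child.ty != "int" for child in node.children):
            raise TypeError("'add' expects int operands")
        node.ty = "int"
    elif node.kind == "and":
        if any(child.ty != "bool" for child in node.children):
            raise TypeError("'and' expects bool operands")
        node.ty = "bool"
    else:
        raise ValueError(f"unknown node kind: {node.kind}")
    return node

good = annotate(Node("add", [Node("int"), Node("int")]))

try:
    annotate(Node("add", [Node("int"), Node("bool")]))
    error = None
except TypeError as exc:
    error = str(exc)
```

The check is a single recursive walk because the operands of each operator are directly available as child nodes; in flat pseudo-code the same information would first have to be recovered from the instruction stream.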
I haven't heard the term "annotated syntax tree" in discussions of compilers, so I'm going to go with the usual term, AST (Abstract Syntax Tree).
Normally you can have your parser create an AST, which will be, wait for it, an abstract representation of your code. It doesn't contain any spacing or syntactic detail such as brackets, parens, etc. Its structure also resolves any ambiguity in your code, such as operator precedence and grouping.
An AST will make it very easy to produce icode from it. This icode is basically the instruction code in your language. It will contain rudimentary operations like move, goto, etc.
The process would go Code -> AST -> ICode. The ICode could then be run through a VM.
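A minimal sketch of the AST -> ICode -> VM part of that pipeline, in Python. The node classes, the opcode names (PUSH, ADD, SUB, MUL) and the functions `lower` and `run` are all assumptions invented for this illustration, and the ICode here is a simple stack-machine form rather than the move/goto style mentioned above.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str            # "+", "-", or "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def lower(node):
    """AST -> ICode: a post-order walk emits a flat list of (opcode, arg) pairs."""
    if isinstance(node, Num):
        return [("PUSH", node.value)]
    if isinstance(node, BinOp):
        return lower(node.left) + lower(node.right) + [(OPCODES[node.op], None)]
    raise TypeError(f"unexpected node: {node!r}")

def run(code):
    """ICode -> result: a tiny stack-based VM."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
            continue
        right = stack.pop()
        left = stack.pop()
        if opcode == "ADD":
            stack.append(left + right)
        elif opcode == "SUB":
            stack.append(left - right)
        elif opcode == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack.pop()

# 1 + 2 * 3, with precedence already encoded in the tree shape
ast = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
icode = lower(ast)
result = run(icode)
```

Note that the VM loop knows nothing about trees or precedence; all of that was settled when the AST was built and flattened.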
I don't see anything wrong with producing ICode that is targeted at another platform.
Update
I reread the question and I understand what is being asked now. He is saying that instead of creating an icode representation, you leave it as an annotated syntax tree. I'm curious, though: did you build your own machine to process the annotated syntax tree, or was the tree then converted into another well-known intermediate code?
I would imagine the engine design for processing a syntax tree would be more complicated than for an intermediate format that represents basic operations such as mov, goto, etc.
I'll need to pick this book up. I learned everything from the dragon book and from digging through ANTLR, yacc, bison and custom tokenizers and parsers.