I believe the main task is to parse the input and generate the corresponding assembly language instructions (in both approaches). Do these compilers use any C features other than this? I mean, I could write a program that takes my program in language X, turns it into a C-like program and then compiles it with gcc, with everything happening in the backend too, but is that approach sensible? A graphical representation of my question is:
Language X -> compiler written in C, using C's string handling and parsing features to create ASM -> run on machine
Features: uses C's basic mechanisms to generate assembly code, nothing more; uses its own assembly logic in the end.
Language X -> compiler written in C that recodes it into C-like syntax -> handed to a GCC-like compiler -> ASM -> machine code
Features: a dumb system, as it uses C's facilities in the end.
You are hugely mistaken that the two main tasks of a compiler are "writing a parser" and "writing the output to the assembler". Most of the interesting work happens in the middle: verification passes (type checking), analysis passes (collecting various information for later optimizations) and transformation passes (from a high-level language to a less high-level one, until at some stage you get down to something that looks like assembly).
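To make the "middle" a bit more concrete, here is a minimal sketch of one verification pass and one transformation pass. Everything in it is made up for illustration: the tuple-based tree, the node names ("num", "bool", "add", "less") and the functions `type_of` and `lower` are assumptions, not something taken from a real compiler.

```python
import itertools

# Hypothetical mini-language: expressions are nested tuples, e.g.
#   ("num", 3)   ("bool", True)   ("add", e1, e2)   ("less", e1, e2)

def type_of(expr):
    """Verification pass: return 'int' or 'bool', or raise TypeError."""
    kind = expr[0]
    if kind == "num":
        return "int"
    if kind == "bool":
        return "bool"
    if kind in ("add", "less"):
        left, right = type_of(expr[1]), type_of(expr[2])
        if left != "int" or right != "int":
            raise TypeError(f"{kind} expects int operands, got {left} and {right}")
        return "int" if kind == "add" else "bool"
    raise ValueError(f"unknown node kind: {kind}")

def lower(expr):
    """Transformation pass: flatten the tree into three-address instructions."""
    code = []
    temps = itertools.count()

    def walk(node):
        kind = node[0]
        if kind in ("num", "bool"):
            return repr(node[1])          # literals are used directly as operands
        left, right = walk(node[1]), walk(node[2])
        dest = f"t{next(temps)}"          # fresh temporary for every operation
        code.append((dest, kind, left, right))
        return dest

    walk(expr)
    return code

program = ("less", ("add", ("num", 1), ("num", 2)), ("num", 10))
checked_type = type_of(program)   # runs before any code is generated
instructions = lower(program)     # flat list, much closer to assembly than the tree
```

A real compiler has many such passes, each working on a slightly lower-level representation than the previous one; the flat instruction list here is just the first step in that direction.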
Even if you design a simple compiler (you don't need to compete with GCC the first time), the parser should not be the "main task". Parsing is nowadays considered a fairly routine problem, at least if your syntax is rather conventional (I'm not talking about crazy syntax-extensibility things): there are parser generators that work relatively well, and you may also write a parser by hand for more flexibility, but all in all it definitely shouldn't be the hard part.
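As a rough idea of how routine a hand-written parser for a conventional syntax can be, here is a small recursive-descent sketch for integer arithmetic. The grammar, the token pattern and the names `tokenize` and `parse` are my own assumptions for the example.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the input into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(text):
    """Return a nested-tuple syntax tree for an arithmetic expression."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():   # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():   # term := atom (('*' | '/') atom)*
        node = atom()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, atom())
        return node

    def atom():   # atom := NUMBER | '(' expr ')'
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isdigit():
            return ("num", int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing token {peek()!r}")
    return tree

tree = parse("1 + 2 * (3 - 4)")
```

One function per grammar rule is the whole trick; operator precedence falls out of which rule calls which.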
It is perfectly sensible to write a compiler that outputs C, or any other language. Lots of compilers (for example, for Haskell and various Schemes) have used C as their target language. But usually (for interesting languages anyway) there is a lot of work up front to compile the abstractions of the source language into something lower-level that can be translated to C.
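For the last, easy step of such a compiler (once the abstractions have already been compiled away), emitting C can look roughly like the sketch below. The tree shape and the names `emit_expr` and `emit_function` are hypothetical, and the choice of `long` for every value is an arbitrary assumption of the example.

```python
def emit_expr(node):
    """Turn a nested-tuple expression into C source text."""
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind == "var":
        return node[1]
    if kind in ("+", "-", "*", "/"):
        # Parenthesize everything so C precedence rules never matter.
        return f"({emit_expr(node[1])} {kind} {emit_expr(node[2])})"
    raise ValueError(f"unknown node kind: {kind}")

def emit_function(name, params, body):
    """Wrap one expression in a C function definition."""
    args = ", ".join(f"long {p}" for p in params)
    return f"long {name}({args}) {{\n    return {emit_expr(body)};\n}}\n"

c_source = emit_function("area", ["w", "h"], ("*", ("var", "w"), ("var", "h")))
print(c_source)
```

The generated text can then be written to a file and handed to gcc or any other C compiler, which takes care of register allocation, instruction selection and so on.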
Nowadays there are also other ways to abstract yourself from the low-level assembly part: you may target a virtual machine (JVM, CLR, Erlang's VM, Parrot...), or produce LLVM IR, etc.
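To give a feel for what "targeting a virtual machine" means, here is a toy sketch. The stack machine below, with its `PUSH`/`ADD`/`SUB`/`MUL` instructions, is invented for the example and only stands in for a real VM such as the JVM; real instruction sets are far richer.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def compile_expr(node, out):
    """Append stack-machine instructions for a nested-tuple expression."""
    kind = node[0]
    if kind == "num":
        out.append(("PUSH", node[1]))
    else:
        compile_expr(node[1], out)        # operands first ...
        compile_expr(node[2], out)
        out.append((OPCODES[kind], None)) # ... then the operation (postfix order)
    return out

def run(code):
    """A tiny interpreter playing the role of the virtual machine."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
            continue
        right = stack.pop()
        left = stack.pop()
        if op == "ADD":
            stack.append(left + right)
        elif op == "SUB":
            stack.append(left - right)
        elif op == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

bytecode = compile_expr(("+", ("num", 2), ("*", ("num", 3), ("num", 4))), [])
result = run(bytecode)
```

The compiler never has to know about registers or calling conventions; that is exactly the part the VM abstracts away.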
You mentioned ML in your question. Statically typed functional languages with algebraic datatypes (that is, SML, OCaml, Haskell, etc.) are very good languages to write a compiler in; the best suited ones, I would claim. You may be interested in the book Modern Compiler Implementation in ML (there are variants for C and Java, but the ML book is the best one). It's a bit specialized in some places, but it's probably a good choice for getting a good overall view of compilation techniques. Of course, if you want to become a compilation guru you should also use other references such as the Dragon Book, and possibly references on compiling languages similar to yours (compiling a purely functional language can be very different from compiling an imperative procedural language).
Each compiler is different
Compiler writers can do (and have done!) just about anything you can think of. The old "translator" f2c was in fact a Fortran compiler that targeted (i.e. produced output in) C.
There is nothing wrong with that, though it can make the compilation process slower (there is an extra lex and parse stage, after all).
Another point: for serious compilers, it is the manipulation of the abstract syntax tree to optimize the output that takes most of the code and most of the time. There is a huge difference between the kind of immediate code generation done in the Crenshaw tutorial and a full-featured compiler.
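As a very small taste of that kind of tree manipulation, here is a sketch of constant folding with two algebraic simplifications. The tuple-based tree and the `fold` function are assumptions made for the example; a production optimizer runs many passes like this, usually on more elaborate intermediate representations.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    kind = node[0]
    if kind in ("num", "var"):
        return node
    left, right = fold(node[1]), fold(node[2])    # optimize children first
    if left[0] == "num" and right[0] == "num":
        return ("num", FOLDABLE[kind](left[1], right[1]))
    if kind == "+" and right == ("num", 0):       # x + 0  ->  x
        return left
    if kind == "*" and right == ("num", 1):       # x * 1  ->  x
        return left
    return (kind, left, right)

before = ("+", ("var", "x"), ("*", ("num", 2), ("num", 3)))
after = fold(before)
```

A one-pass code generator in the style of the Crenshaw tutorial would emit a multiplication for `2 * 3` at run time; with a tree in hand, the compiler can do that work once, at compile time.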