I've recently become quite curious about the development and execution of programs, and have been trying to learn more about them. Compilers, in particular, baffle and interest me.
How are compilers developed and created? Wouldn't you need to use another compiler to compile the program you made for your new compiler? What steps are taken to produce a compiler?
And more than that, how "deeply" do compilers translate high-level code? I've read all about the different "generations" and "levels" of programming languages, but what range do compilers cover? Do they translate directly into the lowest level of programming, machine code? Or do they compile into some other language which is then reworked further before it can execute?
But just to clarify, I'm not asking whether constructing a compiler is a reasonable idea, or what I should be trying to compile. I'm merely curious as to how compilers are developed, what steps are taken to construct them, and how deeply they usually compile.
Many thanks for reading through!!
Historically, compilers work off a "lexer" that performs lexical analysis, breaking the source code into parts. For example, this sentence is composed of words, and the lexer would identify the words based on whitespace. The lexer has no idea what the words "mean", but it can tell what comprises a word. For a programming language, lexical analysis takes in source code and outputs a stream of tokens: identifiers, operators, literals, or whatever atomic units the language defines as fundamental.
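To make that concrete, here is a minimal sketch of a lexer for a made-up little expression language; the token names and regular expressions are invented for the example, not taken from any real compiler:

```python
import re

# Hypothetical token set for a tiny expression language -- invented for this example.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),   # whitespace separates tokens but is not itself a token
]

def tokenize(source):
    """Break the source string into (kind, text) pairs. The lexer has no idea
    what the tokens *mean*; it only knows what comprises each one."""
    pattern = "|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC)
    for match in re.finditer(pattern, source):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()

print(list(tokenize("x = 2 + 40")))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '2'), ('OP', '+'), ('NUMBER', '40')]
```

Note that the output is just a flat stream of labeled pieces; nothing here knows whether `x = 2 + 40` is a sensible statement. That judgment belongs to the next stage.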
Then, the lexical components (words) are fed into another system to perform syntax analysis (often called parsing) and semantic analysis. For example, the English language has an understood sentence structure of "NOUN-VERB", like "Jack runs". There are many other logical sentence structures that are acceptable, including the use of adverbs, prepositional phrases, etc., for English or any other language (including source code languages). If the lexical components "match" an accepted pattern in the grammar, the compiler "understands" what was said (or what the sentence is supposed to "mean").
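Here is a hedged sketch of that pattern-matching step, continuing the toy example: a tiny recursive-descent routine that accepts token streams matching an invented rule `expr := NUMBER (OP NUMBER)*` and nests them into a tree, rejecting anything that does not fit.

```python
# A minimal recursive-descent check for a toy token stream, assuming an
# invented grammar of the form:  expr := NUMBER (OP NUMBER)*
# Token sequences that don't fit this pattern are rejected, which is the
# "does this sentence match an accepted structure?" step described above.
def parse_expr(tokens):
    tokens = list(tokens)
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    node = expect("NUMBER")
    while pos < len(tokens):
        op = expect("OP")
        right = expect("NUMBER")
        node = (op[1], node, right)   # nest into ('+', left, right) structures
    return node

print(parse_expr([("NUMBER", "1"), ("OP", "+"), ("NUMBER", "2")]))
# ('+', ('NUMBER', '1'), ('NUMBER', '2'))
```

The nested tuples are a stand-in for the tree structure a real compiler builds at this point; real grammars have many rules and many node kinds, but the idea is the same.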
The final stage is to "translate" that understanding into executable instructions for the target platform (a particular CPU architecture and operating system: Windows, Mac, POSIX systems, etc.). So, while the lexing and analysis stages are INDEPENDENT of any platform, the final translation to executable instructions is very platform dependent.
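As a rough illustration of that translation step, here is a sketch that walks the nested tree from the parsing sketch and emits instructions for a made-up stack machine; the `PUSH`/`ADD` mnemonics are invented, whereas a real back end would emit x86-64, ARM, and so on.

```python
# Toy translation stage: walk a nested ('+', left, right) structure and emit
# instructions for a made-up stack machine. Real compilers emit instructions
# for a concrete CPU here, which is why this stage is platform dependent
# while lexing and parsing are not.
def emit(node, out):
    kind = node[0]
    if kind == "NUMBER":
        out.append(f"PUSH {node[1]}")   # push a literal onto the stack
    elif kind == "+":
        emit(node[1], out)              # code for the left operand
        emit(node[2], out)              # code for the right operand
        out.append("ADD")               # pop two values, push their sum
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return out

print(emit(("+", ("NUMBER", "1"), ("NUMBER", "2")), []))
# ['PUSH 1', 'PUSH 2', 'ADD']
```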
Finally, compilers historically perform this "compiling" through those three steps into platform-specific executable instructions, and the results are fed into a final stage that "links" all the parts together. For example, function calls must "connect to" the definitions of those functions. That's done by the linker.
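If it helps, here is a very loose sketch of what a linker does conceptually, with an invented object-file layout; real object formats (ELF, Mach-O, PE) are far more involved.

```python
# Toy "linker": each object file defines some symbols and leaves call sites
# as unresolved names; the linker builds one symbol table and patches every
# call site with the final address of its definition. The file layout here
# is invented purely to illustrate the idea.
def link(objects):
    symbol_table = {}   # name -> final address in the combined image
    image = []          # the combined "executable"

    # First pass: lay out the code and record where each symbol ends up.
    for obj in objects:
        for name in obj["defines"]:
            symbol_table[name] = len(image)
        image.extend(obj["code"])

    # Second pass: replace symbolic calls ("CALL name") with real addresses.
    resolved = []
    for instr in image:
        if instr.startswith("CALL "):
            name = instr.split()[1]
            if name not in symbol_table:
                raise LookupError(f"undefined reference to {name}")
            instr = f"CALL {symbol_table[name]}"
        resolved.append(instr)
    return resolved

main_obj = {"defines": ["main"], "code": ["CALL helper", "RET"]}
lib_obj  = {"defines": ["helper"], "code": ["NOP", "RET"]}
print(link([main_obj, lib_obj]))
# ['CALL 2', 'RET', 'NOP', 'RET']
```

The familiar "undefined reference" error from real toolchains is exactly the second-pass failure above: a call site whose name never showed up in any object file.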
I say "historically", because it can be slightly different for interpreted languages that merge the above steps somewhat. Rather than call them "compilers" (which they "kind-of" are, and sometimes actually are), we tend to call them "interpreters". Further, some technologies have assorted hybrids. For example, C++ is one of the most difficult languages to handle (the language itself is very complicated in how to interpret what is being found in the source code), so some approaches "merge" the lexical analysis and semantic analysis to "help determine" the context for what is being said.
This is a big world/domain, but understanding it can be very satisfying, and it helps you know how to leverage languages and troubleshoot problems.
Here's my understanding of the steps in their simplest form (a minimal end-to-end sketch follows the list):
- Parse the source using a grammar and a lexer/parser.
- Create an abstract syntax tree (AST).
- Walk the AST to generate compiled code. This can be assembly instructions for a particular platform or bytecode for the JVM. This is where compile-time optimizations, if any, are built in.
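Here is a minimal end-to-end sketch of those three bullets, leaning on Python's built-in `ast` module to stand in for the grammar/lexer/parser step; the `PUSH`/`LOAD`/`ADD` "bytecode" names and the constant-folding shortcut are invented for the example.

```python
# Steps 1-2: parse the source into an AST (Python's own parser does the work
# here). Step 3: walk the AST and emit bytecode-like instructions, folding
# "constant + constant" subexpressions at compile time as a tiny example of a
# compile-time optimization.
import ast

def compile_expr(source):
    tree = ast.parse(source, mode="eval").body   # parse -> AST
    return emit(tree, [])

def emit(node, out):                             # walk the AST
    if isinstance(node, ast.Constant):
        out.append(f"PUSH {node.value}")
    elif isinstance(node, ast.Name):
        out.append(f"LOAD {node.id}")
    elif isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            # constant folding: compute the sum now instead of at run time
            out.append(f"PUSH {node.left.value + node.right.value}")
        else:
            emit(node.left, out)
            emit(node.right, out)
            out.append("ADD")
    else:
        raise NotImplementedError("only + over constants and names in this sketch")
    return out

print(compile_expr("1 + 2"))       # ['PUSH 3']  -- folded at compile time
print(compile_expr("1 + 2 + x"))   # ['PUSH 3', 'LOAD x', 'ADD']
```

Real compilers do the same kind of walk, just with a much richer instruction set and many more optimization passes between the AST and the final output.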