I am writing a compiler for a simple language.
I made a lexer/tokenizer that takes a file and prints the tokens to stdout.
Now I want to do the syntax analysis, but I don't know how to modify my lexer so that the parser can take the tokens as input.
I have considered two approaches:
- A linked list of all the tokens is extremely inefficient for large files (source files around 80 MB take about 1.3 GB of RAM).
- I could modify my lexer to return the next token each time it is called (an idea taken from the Dragon Book), but I don't know what to do if somewhere in the process I have to go back and read a previous token.
What is the right way to do these things?
Implementing a nextToken() method in the lexical analyser is the standard way. This method is called by the parser (or syntax analyser) until the entire input has been consumed.
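A minimal sketch of that pull model, where the parser drives the lexer one token at a time (the Token, TokenType, and Lexer names here are illustrative, not from the question):

```java
// Pull-style lexer: the parser asks for one token at a time instead of
// the lexer printing everything up front, so only one token needs to
// live in memory at once.
enum TokenType { IDENT, NUMBER, PLUS, EOF }

class Token {
    final TokenType type;
    final String text;
    Token(TokenType type, String text) { this.type = type; this.text = text; }
}

class Lexer {
    private final String input;
    private int pos = 0;

    Lexer(String input) { this.input = input; }

    // Returns the next token from the input, or an EOF token at the end.
    Token nextToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) return new Token(TokenType.EOF, "");
        char c = input.charAt(pos);
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            return new Token(TokenType.NUMBER, input.substring(start, pos));
        }
        if (Character.isLetter(c)) {
            int start = pos;
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
            return new Token(TokenType.IDENT, input.substring(start, pos));
        }
        if (c == '+') { pos++; return new Token(TokenType.PLUS, "+"); }
        throw new IllegalStateException("unexpected character: " + c);
    }
}
```

The parser then simply loops, calling nextToken() until it sees the EOF token.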
> but I don't know what to do if somewhere in the process I have to go back and read a previous token
This is not usually the case. What the parser may need to do, though, is 'push back' a token (or a number of tokens, depending on the parser's lookahead) that has already been seen. In this case the lexer provides a pushBack(Token) method, which ensures that the next call to nextToken() will return the supplied token rather than the next token in the input.
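A minimal way to layer this on top of the lexer sketched above, using a stack for pushed-back tokens (again, the names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Wraps a token source with a push-back buffer. Pushed-back tokens are
// returned (most recent first) before the underlying lexer is consulted.
class PushBackLexer {
    private final Lexer lexer;
    private final Deque<Token> pushedBack = new ArrayDeque<>();

    PushBackLexer(Lexer lexer) { this.lexer = lexer; }

    Token nextToken() {
        if (!pushedBack.isEmpty()) return pushedBack.pop();
        return lexer.nextToken();
    }

    // The parser hands back a token it has already seen; the next call
    // to nextToken() will return it.
    void pushBack(Token token) {
        pushedBack.push(token);
    }
}
```

With this scheme the parser can push back as many tokens as its lookahead requires, and memory use is bounded by the lookahead rather than by the file size.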
> but I don't know what to do if somewhere in the process I have to go back and read a previous token
It sounds like your matches are too greedy.
You might look into backtracking.
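If you do need arbitrary backtracking, one common approach (not the only one) is to buffer tokens as they are read and let the parser save and restore an index into that buffer. A sketch building on the lexer above, with all names illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Buffers tokens on demand so the parser can rewind to a saved position.
class BufferedTokenStream {
    private final Lexer lexer;
    private final List<Token> buffer = new ArrayList<>();
    private int index = 0;

    BufferedTokenStream(Lexer lexer) { this.lexer = lexer; }

    Token nextToken() {
        // Lex lazily: only pull a new token when the buffer runs out.
        if (index == buffer.size()) buffer.add(lexer.nextToken());
        return buffer.get(index++);
    }

    int mark() { return index; }                 // save the current position
    void rewind(int marked) { index = marked; }  // backtrack to a saved position
}
```

A production implementation would discard buffered tokens older than the oldest live mark, so memory stays proportional to how far back the parser can actually rewind.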