I'm trying to lex (then parse) a C-like language. In C there are preprocessor directives, where line breaks are significant, and then the actual code, where they are just whitespace.
One way of doing this would be a two-pass process like early C compilers: have a separate preprocessor for the # directives, then lex the output of that.
However, I wondered if it was possible to do it in a single lexer. I'm pretty happy with writing the Scala parser-combinator code, but I'm not so sure how StdLexical handles whitespace.
Could someone write some simple sample code which could, say, lex a #include line (using the newline) and some trivial code (ignoring the newline)? Or is this not possible, and is it better to go with the two-pass approach?
OK, solved this myself, answer here for posterity.
In StdLexical you already have the ability to specify whitespace in your lexer. All you have to do is override your token method appropriately, so that a directive is matched as a single token that includes its own terminating newline. Here is some sample code (with irrelevant bits removed):
override def token: CeeLexer.Parser[Token] = controlLine
  // | ... (where ... is whatever you want to keep of the original method)

def controlLine = hashInclude

// word and nonEolws are helper parsers from the removed bits, not part of StdLexical:
// word matches a literal string, nonEolws matches whitespace other than a newline.
def hashInclude: CeeLexer.Parser[HashInclude] =
  ( '#' ~ word("include") ~ rep(nonEolws) ~ '\"' ~ rep(chrExcept('\"', '\n', EofCh)) ~ '\"' ~ '\n'
  | '#' ~ word("include") ~ rep(nonEolws) ~ '<' ~ rep(chrExcept('>', '\n', EofCh)) ~ '>' ~ '\n'
  ) ^^ {
    case hash ~ include ~ whs ~ openQ ~ fname ~ closeQ ~ eol => // code to handle #include
  }
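For anyone who wants something they can paste and run, here is a minimal self-contained sketch of the same idea. It is only an illustration under some assumptions: the shape of the HashInclude token, the word and nonEolws helpers, the tokenize function and the tiny keyword/delimiter set are all made up for this example and are not the removed bits of the code above. It needs the scala-parser-combinators module on the classpath. The key point is that the Scanner skips whitespace (newlines included) before each token, while the #include token consumes its own trailing newline.

```scala
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.input.CharArrayReader.EofCh
import scala.util.parsing.input.Reader

object CeeLexer extends StdLexical {
  // Hypothetical token for an #include line (not part of StdLexical).
  case class HashInclude(fileName: String, system: Boolean) extends Token {
    def chars: String = fileName
  }

  // Tiny illustrative vocabulary, just enough for the demo input.
  reserved += "int"
  delimiters ++= List("=", ";")

  // Horizontal whitespace only: a newline must NOT be skipped inside a directive.
  def nonEolws: Parser[Char] =
    elem("horizontal whitespace", c => c == ' ' || c == '\t')

  // Matches the literal characters of s, one after another.
  def word(s: String): Parser[String] = accept(s.toList) ^^ (_.mkString)

  // '#include "file"' or '#include <file>', terminated by a newline.
  // Limitation of this sketch: only '\n' line endings are recognised, and a
  // directive on the last line with no trailing newline will not match.
  def hashInclude: Parser[HashInclude] =
    '#' ~> word("include") ~> rep(nonEolws) ~> (
        '"' ~> rep(chrExcept('"', '\n', EofCh)) <~ '"' ^^
          (cs => HashInclude(cs.mkString, system = false))
      | '<' ~> rep(chrExcept('>', '\n', EofCh)) <~ '>' ^^
          (cs => HashInclude(cs.mkString, system = true))
    ) <~ rep(nonEolws) <~ '\n'

  // Try the directive first, then fall back to the standard tokens.
  override def token: Parser[Token] = hashInclude | super.token

  // Hypothetical helper: run the Scanner over a string and collect the tokens.
  def tokenize(src: String): List[Token] = {
    @annotation.tailrec
    def loop(in: Reader[Token], acc: List[Token]): List[Token] =
      if (in.atEnd) acc.reverse else loop(in.rest, in.first :: acc)
    loop(new Scanner(src), Nil)
  }
}
```

Calling CeeLexer.tokenize("#include <stdio.h>\nint x\n= 1;\n") should give one HashInclude token followed by the ordinary tokens for int, x, =, 1 and ;, with the newline between x and = simply skipped as whitespace. So a single lexer looks workable for this case; whether it scales to the rest of the preprocessor (macros, conditionals) is a separate question.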