Using Ocamllex for lexing strings (The Tiger Compiler)

Developer https://www.devze.com 2023-02-28 16:39 Source: Internet
I'm trying to follow Appel's "Modern Compiler Implementation in ML" and am writing the lexer using Ocamllex.

The specification asks for the lexer to return strings after translating escape sequences. The following code is an excerpt from the ocamllex input file:

 rule tiger = parse
 ...
 | '"'
     { let buffer = Buffer.create 1 in
       STRING (stringl buffer lexbuf)
     }
 and  stringl buffer = parse
 | '"' { Buffer.contents buffer }
 | "\\t" { Buffer.add_char buffer '\t'; stringl buffer lexbuf }
 | "\\n" { Buffer.add_char buffer '\n'; stringl buffer lexbuf }
 | "\\n" { Buffer.add_char buffer '\n'; stringl buffer lexbuf }
 | '\\' '"' { Buffer.add_char buffer '"'; stringl buffer lexbuf }
 | '\\' '\\' { Buffer.add_char buffer '\\'; stringl buffer lexbuf }
 | eof { raise End_of_file }
 | _ as char { Buffer.add_char buffer char; stringl buffer lexbuf }

Is there a better way?


You may be interested in how the OCaml lexer does this (search its source for "and string"). In essence it uses the same method as yours, but without the local buffer. I find your code nicer on this point, though it is slightly less efficient. The OCaml lexer is also a bit more complex because it supports more escape sequences, and it uses an escape table (char_for_backslash) to factor similar rules together.
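To illustrate the escape-table idea in isolation, here is a minimal sketch in plain OCaml. The name char_for_backslash follows the OCaml lexer, but the table below covers only the four escapes from the question, and unescape is a hypothetical helper added purely to exercise the table outside a lexer.

```ocaml
(* Map the character that follows a backslash to the character it denotes.
   Only the escapes from the question are handled (an assumption);
   '"' and '\\' fall through and stand for themselves. *)
let char_for_backslash = function
  | 'n' -> '\n'
  | 't' -> '\t'
  | c -> c

(* Hypothetical helper, for illustration only: translate escapes in an
   ordinary string using the table above. *)
let unescape (s : string) : string =
  let buffer = Buffer.create (String.length s) in
  let rec go i =
    if i < String.length s then
      if s.[i] = '\\' && i + 1 < String.length s then begin
        Buffer.add_char buffer (char_for_backslash s.[i + 1]);
        go (i + 2)
      end else begin
        Buffer.add_char buffer s.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents buffer
```

With a table like this, one lexer rule can match a backslash followed by any supported escape character, so there is no need for a separate rule per escape.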

Also, the rule "\\n" appears twice in your code. I also think 1 is a very pessimistic estimate of your string length; I would use something like 20 instead, to avoid needless resizing.
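Putting these suggestions together, the string rule might look roughly like the ocamllex sketch below. It is only a sketch: the token type is a stand-in assumed for illustration (in a real project it would come from the parser), the other Tiger rules are left out, and the initial buffer size of 20 is just a guess at a typical string length.

```ocaml
{
  (* Stand-in token type, assumed for illustration. *)
  type token = STRING of string

  (* Escape table: maps the character after a backslash to its meaning.
     Covers only the escapes used in the question. *)
  let char_for_backslash = function
    | 'n' -> '\n'
    | 't' -> '\t'
    | c -> c
}

rule tiger = parse
  (* other Tiger token rules elided *)
  | '"'
      { let buffer = Buffer.create 20 in
        STRING (stringl buffer lexbuf) }

and stringl buffer = parse
  | '"' { Buffer.contents buffer }
  (* One rule handles every supported escape via the table. *)
  | '\\' (['n' 't' '"' '\\'] as c)
      { Buffer.add_char buffer (char_for_backslash c);
        stringl buffer lexbuf }
  | eof { raise End_of_file }
  | _ as c
      { Buffer.add_char buffer c;
        stringl buffer lexbuf }
```

The buffer still grows automatically when a string is longer than the initial size, so the larger starting value only saves a few early reallocations.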
