Lexer that recognizes indented blocks [duplicate]

Developer https://www.devze.com 2023-03-24 04:14 Source: Internet
This question already has answers here: How to use indentation as block delimiters with bison and flex (4 answers) Closed 1 year ago.

I want to write a compiler for a language that delimits program blocks with whitespace, as Python does. I would prefer to do this in Python, but C++ is also an option. Is there an open-source lexer that can help me do this easily, for example by generating INDENT and DEDENT tokens properly, like the Python lexer does? A corresponding parser generator would be a plus.


LEPL is pure Python and supports offside parsing.


If you're using something like lex, you can do it this way:

^[ \t]+              { int new_indent = count_indent(yytext);
                       if (new_indent > current_indent) {
                          current_indent = new_indent;
                          return INDENT;
                       } else if (new_indent < current_indent) {
                          current_indent = new_indent;
                          return DEDENT;
                       }
                       /* Else do nothing, and this way
                          you can essentially treat INDENT and DEDENT
                          as opening and closing braces. */
                     }

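You may need a little additional logic: for example, to ignore blank lines, to notice a line that returns to column zero (the ^[ \t]+ pattern does not match a line with no leading whitespace), to emit more than one DEDENT when a single line closes several blocks, and to automatically add any outstanding DEDENTs at the end of the file.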
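
The extra bookkeeping could be sketched roughly as below. This is not part of the rule above, only one possible shape for it: it keeps a stack of open indentation widths, much as Python's tokenizer is documented to do. The names indent_state, indent_update, is_blank, MAX_DEPTH and INDENT_ERROR are all made up for this illustration.

```c
#include <limits.h>

enum { MAX_DEPTH = 64 };            /* assumed nesting limit for this sketch */
enum { INDENT_ERROR = INT_MIN };    /* hypothetical error code */

struct indent_state {
    int stack[MAX_DEPTH];  /* widths of the open blocks; stack[0] stays 0 */
    int depth;             /* index of the innermost open block */
};

/* Returns 1 if the line contains only whitespace, so the caller can skip it
   without touching the indentation state. */
int is_blank(const char *line)
{
    while (*line == ' ' || *line == '\t' || *line == '\r' || *line == '\n')
        ++line;
    return *line == '\0';
}

/* Feed the indentation width of the next non-blank line.
   Returns +1 for one INDENT, -n for n DEDENTs, 0 for no change, and
   INDENT_ERROR if the width matches no enclosing block or nesting is too deep.
   Calling it with width 0 at end of file closes every open block. */
int indent_update(struct indent_state *s, int width)
{
    if (width > s->stack[s->depth]) {
        if (s->depth + 1 >= MAX_DEPTH)
            return INDENT_ERROR;
        s->stack[++s->depth] = width;   /* open a new block */
        return 1;
    }

    int dedents = 0;
    while (s->depth > 0 && width < s->stack[s->depth]) {
        --s->depth;                     /* close one block per stack entry */
        ++dedents;
    }
    if (width != s->stack[s->depth])
        return INDENT_ERROR;            /* dedent to a width never opened */
    return -dedents;
}
```

In a flex scanner the negative count would presumably be saved in a variable and handed to the parser one DEDENT per yylex() call, since an action can only return a single token at a time.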

Presumably count_indent would take into account converting tabs to spaces according to a tab-stop value.
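
A minimal sketch of what count_indent might look like, assuming a fixed tab stop of 8 columns (TAB_STOP is a name and value picked for this example; use whatever your language defines):

```c
enum { TAB_STOP = 8 };  /* assumed tab width, not taken from the rule above */

/* Returns the column reached after the leading spaces and tabs in text.
   A space advances one column; a tab advances to the next multiple of
   TAB_STOP. Scanning stops at the first other character. */
int count_indent(const char *text)
{
    int column = 0;
    for (const char *p = text; *p == ' ' || *p == '\t'; ++p) {
        if (*p == ' ')
            column += 1;
        else
            column += TAB_STOP - (column % TAB_STOP);
    }
    return column;
}
```

With this definition two spaces followed by a tab count the same as a single tab (both reach column 8), which is the usual reason for expanding tabs to tab stops rather than counting characters.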

I don't know about lexer/parser generators for Python, but what I posted should work with lex/flex, and you can hook it up to yacc/bison to create a parser. You could use C or C++ with those.
