
Which tool to use to parse programming languages in Python?

开发者 (Developer) https://www.devze.com 2023-03-17 07:49 Source: web

Which Python tool can you recommend to parse programming languages? It should allow for a readable representation of the language grammar inside the source, and it should be able to scale to complicated languages (something with a grammar as complex as e.g. Python itself).

When I search, I mostly find pyparsing, which I will be evaluating, but of course I'm interested in other alternatives.

Edit: Bonus points if it comes with good error reporting and source code locations attached to syntax tree elements.


I really like pyPEG. Its error reporting isn't very friendly, but it can add source code locations to the AST.

pyPEG doesn't have a separate lexer, which would make parsing Python itself hard (I think CPython recognises indent and dedent in the lexer), but I've used pyPEG to build a parser for a subset of C# with surprisingly little work.
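To illustrate the point about indentation being handled at the lexer level, here is a minimal sketch using the standard library's tokenize module, which reports INDENT and DEDENT as ordinary tokens. The sample source and the helper name layout_tokens are made up for this illustration.

```python
import io
import tokenize

# Hypothetical sample input, chosen only for illustration.
SOURCE = "if x:\n    y = 1\nz = 2\n"

def layout_tokens(source):
    """Return the names of the INDENT/DEDENT tokens in `source`, in order."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            names.append(tokenize.tok_name[tok.type])
    return names

print(layout_tokens(SOURCE))  # ['INDENT', 'DEDENT']
```

A grammar that works on such a token stream can treat INDENT and DEDENT like opening and closing braces, which is hard to reproduce in a tool that has no separate lexing stage.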

An example adapted from fdik.org/pyPEG/: A simple language like this:

function fak(n) {
    if (n==0) { // 0! is 1 by definition
        return 1;
    } else {
        return n * fak(n - 1);
    };
}

A pyPEG parser for that language:

import re
from pyPEG import keyword   # keyword() is provided by pyPEG

def comment():          return [re.compile(r"//.*"),
                                re.compile(r"/\*.*?\*/", re.S)]
def literal():          return re.compile(r'\d*\.\d*|\d+|".*?"')
def symbol():           return re.compile(r"\w+")
def operator():         return re.compile(r"\+|\-|\*|\/|\=\=")
def operation():        return symbol, operator, [literal, functioncall]
def expression():       return [literal, operation, functioncall]
def expressionlist():   return expression, -1, (",", expression)
def returnstatement():  return keyword("return"), expression
def ifstatement():      return (keyword("if"), "(", expression, ")", block,
                                keyword("else"), block)
def statement():        return [ifstatement, returnstatement], ";"
def block():            return "{", -2, statement, "}"
def parameterlist():    return "(", symbol, -1, (",", symbol), ")"
def functioncall():     return symbol, "(", expressionlist, ")"
def function():         return keyword("function"), symbol, parameterlist, block
def simpleLanguage():   return function


I would recommend that you check out my library: https://github.com/erezsh/lark

It can parse ALL context-free grammars, automatically builds an AST (with line & column numbers), and accepts the grammar in EBNF format, which is considered the standard.

It can easily parse a language like Python, and it can do so faster than any other parsing library written in Python.


pyPEG (a tool I authored) has a tracing facility for error reporting.

Just set pyPEG.print_trace = True and pyPEG will give you a full trace of what's happening inside.


ANTLR is what you should look at: http://www.antlr.org

Also take a look at the ANTLR 3 Python target: http://www.antlr.org/wiki/display/ANTLR3/Antlr3PythonTarget


For a more complicated parser I would use pyparsing.

Here is the example from their home page:

from pyparsing import Word, alphas

greet = Word(alphas) + "," + Word(alphas) + "!"  # <-- grammar defined here

hello = "Hello, World!"
print(hello, "->", greet.parseString(hello))


Ned Batchelder did a survey of Python parsing tools, which he apparently keeps updated (last updated July 2010):

http://nedbatchelder.com/text/python-parsers.html

If I needed a parser today, I would either roll my own recursive descent parser or possibly use PLY or LEPL, depending on my needs and on whether I was willing to introduce an external dependency. I wouldn't personally use PyParsing for anything very complicated.
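As a rough idea of what rolling your own recursive descent parser involves, here is a minimal sketch for arithmetic expressions that attaches line and column numbers to every tree node and to error messages. The grammar and all names (Token, Node, lex, Parser) are invented for this illustration and use only the standard library.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    kind: str   # "NUM", "OP", or "EOF"
    text: str
    line: int   # 1-based
    col: int    # 1-based

class Node(NamedTuple):
    kind: str        # "num" or the operator text
    children: tuple
    line: int
    col: int

TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<OP>[-+*/()])|(?P<WS>\s+)|(?P<BAD>.)")

def lex(text):
    """Split text into tokens, tracking the line and column of each one."""
    tokens, line, line_start = [], 1, 0
    for m in TOKEN_RE.finditer(text):
        kind, col = m.lastgroup, m.start() - line_start + 1
        if kind == "BAD":
            raise SyntaxError(f"unexpected {m.group()!r} at line {line}, column {col}")
        if kind == "WS":
            if "\n" in m.group():
                line += m.group().count("\n")
                line_start = m.start() + m.group().rfind("\n") + 1
            continue
        tokens.append(Token(kind, m.group(), line, col))
    tokens.append(Token("EOF", "", line, len(text) - line_start + 1))
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = lex(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def fail(self, expected):
        tok = self.peek()
        raise SyntaxError(f"expected {expected} at line {tok.line}, column {tok.col}")

    def parse(self):
        node = self.expr()
        if self.peek().kind != "EOF":
            self.fail("end of input")
        return node

    def expr(self):    # expr := term (("+" | "-") term)*
        node = self.term()
        while self.peek().text in ("+", "-"):
            op = self.advance()
            node = Node(op.text, (node, self.term()), op.line, op.col)
        return node

    def term(self):    # term := atom (("*" | "/") atom)*
        node = self.atom()
        while self.peek().text in ("*", "/"):
            op = self.advance()
            node = Node(op.text, (node, self.atom()), op.line, op.col)
        return node

    def atom(self):    # atom := NUM | "(" expr ")"
        tok = self.peek()
        if tok.kind == "NUM":
            self.advance()
            return Node("num", (int(tok.text),), tok.line, tok.col)
        if tok.text == "(":
            self.advance()
            node = self.expr()
            if self.peek().text != ")":
                self.fail("')'")
            self.advance()
            return node
        self.fail("a number or '('")

tree = Parser("1 +\n  2 * 3").parse()
print(tree.kind, tree.line, tree.col)  # + 1 3
```

Each grammar rule becomes one method, so the grammar stays readable in the source, and the position carried by every token is what makes located error messages cheap to produce.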


If you're evaluating PyParsing, I think you should look at funcparserlib: http://pypi.python.org/pypi/funcparserlib

It's a bit similar, but in my experience the resulting code is much cleaner.


For simple tasks I tend to use the shlex module.

See http://wiki.python.org/moin/LanguageParsing for an evaluation of language parsing tools in Python.
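For a sense of what the shlex suggestion covers, here is a small sketch that splits a shell-like line into words while honouring quotes and dropping a trailing comment. The input line is made up for this illustration.

```python
import shlex

# Hypothetical command line, used only to illustrate quoting rules.
line = 'copy "my file.txt" /tmp/backup  # trailing comment'

# shlex.split applies POSIX shell quoting rules; comments=True drops "# ...".
words = shlex.split(line, comments=True)
print(words)  # ['copy', 'my file.txt', '/tmp/backup']
```

This is tokenizing rather than parsing: shlex gives you a flat list of words, so it fits small command-style languages but not nested grammars.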


ANTLR generates LL(*) parsers. That can be good, but removing all left recursion can sometimes be cumbersome.

If you are LALR(1)-savvy, you can use PyBison. Its syntax is similar to Yacc's, if you are familiar with it, and a lot of people out there already know how Yacc works.
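To make the left-recursion remark concrete: a rule such as expr : expr '-' NUM | NUM is natural for an LALR(1) tool, but a top-down parser has to rewrite it as expr : NUM ('-' NUM)* and rebuild left associativity by hand. The sketch below shows that rewrite in plain Python. It is not ANTLR or PyBison code, the function names are invented for this illustration, and it assumes well-formed input.

```python
import re

def parse_subtraction(text):
    """Parse NUM ('-' NUM)* and return a left-nested tuple tree."""
    tokens = re.findall(r"\d+|-", text)
    pos = 0
    tree = int(tokens[pos])
    pos += 1
    # The loop replaces the left-recursive rule; folding each new operand
    # into the tree built so far keeps the operator left-associative.
    while pos < len(tokens) and tokens[pos] == "-":
        right = int(tokens[pos + 1])
        tree = ("-", tree, right)
        pos += 2
    return tree

def evaluate(tree):
    """Evaluate a tree produced by parse_subtraction."""
    if isinstance(tree, int):
        return tree
    _, left, right = tree
    return evaluate(left) - right

tree = parse_subtraction("10 - 3 - 2")
print(tree)            # ('-', ('-', 10, 3), 2)
print(evaluate(tree))  # 5
```

With a single operator the rewrite is trivial. It becomes tedious when many mutually left-recursive rules are involved, which is the cumbersome case described above.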
