Excel-like toy-formula parsing

Developer https://www.devze.com 2023-03-11 10:05 Source: Internet
I would like to create a grammar for parsing a toy-like formula language that resembles S-expression syntax.

I read through the "Getting Started with PyParsing" book and it included a very nice section that sort of covers a similar grammar.

Two examples of data to parse are:

sum(5,10,avg(15,20))+10
stdev(5,10)*2

Now, I have come up with a grammar that sort-of parses the formula but disregards expanding the functions and operator precedence.

What would be the best practice for continuing with it? Should I add parseActions for words that match oneOf the function names (sum, avg, ...)? If I build a nested list, could I then walk the parse results depth-first and evaluate the functions?


It's a little difficult to advise without seeing more of your code. Still, from what you describe, it sounds like you are mostly tokenizing: recognizing the various bits of punctuation and distinguishing variable names from numeric constants and algebraic operators. nestedExpr will impart some structure, but only basic parenthetical nesting - this still leaves operator precedence handling for your post-parsing work.
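To make "mostly tokenizing" concrete, here is a minimal sketch using only Python's standard library rather than pyparsing. The token pattern and the tokenize helper are assumptions made up for this illustration; they are not taken from the question's grammar. The point is that the result is a flat list with no nesting and no notion of precedence.

```python
import re

# Hypothetical token pattern: numbers, names, and single-character
# operators/punctuation. Leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/(),])")

def tokenize(text):
    """Split a formula into a flat list of token strings."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}: {text[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

print(tokenize("sum(5,10,avg(15,20))+10"))
# ['sum', '(', '5', ',', '10', ',', 'avg', '(', '15', ',', '20', ')', ')', '+', '10']
```

Everything after this step (grouping arguments, deciding which operator binds first) still has to be done by some other piece of code.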

If you are learning about parsing infix notation, there is a succession of pyparsing examples to look through and study (at the pyparsing wiki Examples page). Start with fourFn.py, which is actually a five function infix notation parser. Look through its BNF() method, and get an understanding of how the recursive definitions work (don't worry about the pushFirst parse actions just yet). By structuring the parser this way, operator precedence gets built right into the parsed results. If you parse 4 + 2 * 3, a mere tokenizer just gives you ['4','+','2','*','3'], and then you have to figure out how to do the 2*3 before adding the 4 to get 10, and not just brute force add 4 and 2, then multiply by 3 (which gives the wrong answer of 18). The parser in fourFn.py will give you ['4','+',['2','*','3']], which is enough structure for you to know to evaluate the 2*3 part before adding it to 4.
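The idea of building precedence into the grammar through recursive definitions can be sketched without pyparsing at all. The following is a hand-written recursive-descent parser, not the code from fourFn.py; the function names (tokenize, parse, expr, term, atom) are chosen for this illustration only. Each precedence level gets its own function, and the lower-precedence level calls the higher-precedence one, so multiplication ends up nested inside addition.

```python
import re

def tokenize(text):
    # Numbers, the four arithmetic operators, and parentheses.
    # Simplification: re.findall silently skips any other characters.
    return re.findall(r"\d+(?:\.\d+)?|[-+*/()]", text)

def parse(text):
    """Parse an infix expression into nested [left, op, right] lists."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def atom():
        # atom := number | '(' expr ')'
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise ValueError("expected ')'")
            return node
        if token in {"+", "-", "*", "/", ")"}:
            raise ValueError(f"unexpected token {token!r}")
        return token

    def term():
        # term := atom (('*' | '/') atom)*
        node = atom()
        while peek() in ("*", "/"):
            op = advance()
            node = [node, op, atom()]
        return node

    def expr():
        # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = [node, op, term()]
        return node

    tree = expr()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return tree

print(parse("4 + 2 * 3"))    # ['4', '+', ['2', '*', '3']]
print(parse("(4 + 2) * 3"))  # [['4', '+', '2'], '*', '3']
```

Because expr only ever sees whole terms, the 2*3 group is already formed by the time the + is handled, which is the structure an evaluator needs.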

This whole concept of parsing infix notation with precedence of operations is so common that I wrote a helper function, called operatorPrecedence, that does most of the hard work (later pyparsing releases rename it to infixNotation). You can see how this works in the example simpleArith.py, and then move on to eval_arith.py to see the extensions needed to create an evaluator of the parsed structure. simpleBool.py is another good example, showing precedence for logical terms AND'ed and OR'ed together.
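Once the parsed result is nested, evaluating it is a depth-first walk, which is what the question proposes. The sketch below is not eval_arith.py and does not use pyparsing. It assumes a hypothetical tree shape: binary operations as [left, op, right] lists, function calls as (name, [args]) tuples, and numbers as strings. The trees for the question's two sample formulas are written out by hand, and the choice of sample standard deviation for stdev is likewise an assumption.

```python
import operator
import statistics

BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

# Hypothetical function table; stdev here is the sample standard deviation.
FUNCTIONS = {
    "sum": lambda *args: sum(args),
    "avg": lambda *args: statistics.mean(args),
    "stdev": lambda *args: statistics.stdev(args),
}

def evaluate(node):
    """Depth-first evaluation: children are evaluated before their parent."""
    if isinstance(node, tuple):
        # Function call: (name, [argument nodes])
        name, args = node
        return FUNCTIONS[name](*[evaluate(arg) for arg in args])
    if isinstance(node, list):
        # Binary operation: [left, op, right]
        left, op, right = node
        return BINARY_OPS[op](evaluate(left), evaluate(right))
    # Leaf: a numeric literal kept as a string
    return float(node)

# Hand-built trees for sum(5,10,avg(15,20))+10 and stdev(5,10)*2
tree1 = [("sum", ["5", "10", ("avg", ["15", "20"])]), "+", "10"]
tree2 = [("stdev", ["5", "10"]), "*", "2"]

print(evaluate(tree1))  # 42.5
print(evaluate(tree2))
```

Keeping the functions in a dictionary means a new function such as max can be added without touching the walker.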

Finally, since you are doing something Excel-like, take a look at excelExpr.py. It tries to handle some of the crazy corner cases you get when trying to evaluate Excel cell references, including references to other sheets and other workbooks.

Good luck!
