
Recovering error tokens in parsing (Lemon)

Developer (https://www.devze.com), 2023-01-08 12:08, source: web

I'm using Lemon as a parser generator. If you don't know Lemon, its error handling works the same way as yacc's and bison's.

Lemon lets you use the error token in grammar rules to catch parse errors. By default, the generated parser destroys the token that caused the error. Is there any way to override this behavior so that I can keep the token?

Here's an example of what's happening. I'm appending the tokens matched by each rule together to re-form the input string. Here's the grammar:

```
input ::= string(A). { printf("%s", A); }              // Print the result
string(A) ::= string(B) part(C). { A = append(B, C); }
string(A) ::= part(B). { A = B; }
part(A) ::= NUMBER(B) NAME(C). { A = append(C, B); }   // Rearrange the number and name
part(A) ::= error(B). { A = B; }                        // On error, keep the token anyway
```
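The grammar relies on an append() helper that isn't shown. Below is a minimal C sketch of what it might look like, assuming token and nonterminal values are NUL-terminated char * strings; the original doesn't say how append() is implemented or who owns its arguments, so the body and the ownership rules here are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical implementation of the append() used in the grammar actions.
 * Returns a newly allocated string holding a followed by b, or NULL if
 * allocation fails. The caller frees the result; a and b are not freed. */
char *append(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out = malloc(la + lb + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1); /* also copies b's terminating NUL */
    return out;
}
```

With this helper, the NUMBER NAME rule turns "1234" and "Joseph" into append("Joseph", "1234"), i.e. "Joseph1234", which is the part of the output that does survive.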

On input:

"Username 1234Joseph"

I get output:

"Joseph1234"

This is because the parser throws away the text "Username " when it handles the part(A) ::= error(B) rule, but what I really want is:

"Username Joseph1234"

as output.

If you can solve this problem in bison or another parser generator, I would accept that as an answer :)


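With yacc/bison, a syntax error drops the parser into error-recovery mode if the grammar has rules that use error. It then discards input on its way back to a "clean" state: it pops states off its stack until it can shift the error token, then throws away lookahead tokens until it finds one that can legally follow it.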

I couldn't find a reference for Lemon, so I can't show Lemon code to fix this, but with yacc/bison one would use the rules here.

Namely, call yyerrok in your error rule to tell the parser that recovery is complete, which keeps it from silently dropping further tokens. Next, the parser will try to reread the "bad" lookahead token, so clear it with yyclearin. Finally, since the action attached to your error rule holds the contents of the offending token, you need a function that adjusts your input stack: it takes the current token's contents and creates a new, valid token with the same contents.

For example, if the grammar expects MyOther MyOther but sees MyTok MyOther, the stack should go from:

```
stack (before)
MyTok:   "the text"
MyOther: "new text"
```

to:

```
stack (after)
MyOther: "the text"
MyOther: "new text"
```
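The before/after picture above can be approximated on the lexer side with a one-slot pushback buffer: when the error action fires, re-queue the rejected token's text under a type the grammar accepts. This is a minimal sketch under stated assumptions, not bison or Lemon API: struct token, struct token_source, next_token(), push_back_token(), reinject_as() and the MY_TOK/MY_OTHER codes are all hypothetical, and an array stands in for the real lexer.

```c
#include <stddef.h>

/* Hypothetical token type codes; a real build would use the generated ones. */
enum { TOK_EOF = 0, MY_TOK = 1, MY_OTHER = 2 };

struct token {
    int type;
    const char *text;
};

/* An array standing in for the real lexer, plus a one-slot pushback buffer. */
struct token_source {
    const struct token *tokens;
    size_t count;
    size_t pos;
    struct token pushed;
    int has_pushed;
};

/* Queue a token so that the next call to next_token() returns it first. */
void push_back_token(struct token_source *src, struct token tok)
{
    src->pushed = tok;
    src->has_pushed = 1;
}

/* Return the pushed-back token if there is one, otherwise the next input token. */
struct token next_token(struct token_source *src)
{
    if (src->has_pushed) {
        src->has_pushed = 0;
        return src->pushed;
    }
    if (src->pos < src->count)
        return src->tokens[src->pos++];
    struct token eof = { TOK_EOF, NULL };
    return eof;
}

/* Re-inject a rejected token: same text, but a type the grammar accepts. */
void reinject_as(struct token_source *src, struct token bad, int new_type)
{
    struct token fixed = { new_type, bad.text };
    push_back_token(src, fixed);
}
```

Reading MyTok "the text" and then calling reinject_as(&src, bad, MY_OTHER) makes the next two reads return MyOther "the text" and MyOther "new text", matching the "after" stack. This is similar in spirit to installing a new lookahead token from inside the parser, but it keeps the re-typing in your own code.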

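To accomplish this, look into bison's YYBACKUP macro. I couldn't find an alternative method, though YYBACKUP is frowned upon, and bison only allows it in a rule that reduces a single value and only when there is no lookahead token.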


It's an old one, but why not...

The grammar must include spaces. At the moment it only accepts a sequence of NUMBER NAME token pairs, with no space between the tokens.
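To make "include spaces" concrete, here is a minimal C sketch of a tokenizer that returns whitespace as its own SPACE token instead of skipping it, so that grammar rules can match it and copy it to the output. The token codes T_NAME, T_NUMBER and T_SPACE, the struct lexeme type and tokenize() are hypothetical; a real Lemon build would use the token codes from the generated header.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes. */
enum { T_NAME = 1, T_NUMBER = 2, T_SPACE = 3 };

struct lexeme {
    int type;
    const char *start; /* points into the input string */
    size_t len;
};

/* Split s into NAME (letters), NUMBER (digits) and SPACE (whitespace) tokens.
 * Whitespace is emitted as a token rather than skipped. Tokenization stops at
 * any other character. Returns the number of tokens written (at most max). */
size_t tokenize(const char *s, struct lexeme *out, size_t max)
{
    size_t n = 0;
    while (*s != '\0' && n < max) {
        const char *start = s;
        int type;
        if (isalpha((unsigned char)*s)) {
            type = T_NAME;
            while (isalpha((unsigned char)*s)) s++;
        } else if (isdigit((unsigned char)*s)) {
            type = T_NUMBER;
            while (isdigit((unsigned char)*s)) s++;
        } else if (isspace((unsigned char)*s)) {
            type = T_SPACE;
            while (isspace((unsigned char)*s)) s++;
        } else {
            break; /* unexpected character */
        }
        out[n].type = type;
        out[n].start = start;
        out[n].len = (size_t)(s - start);
        n++;
    }
    return n;
}
```

On "Username 1234Joseph" this yields NAME, SPACE, NUMBER, NAME. With those tokens available, rules along the lines of part(A) ::= NAME(B). and part(A) ::= SPACE(B). (an untested suggestion) would let "Username " be matched by ordinary rules instead of falling into the error rule.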

