I'm using PLY to parse sentences like:
"CS 2310 or equivalent experience"
The desired output:
[[("CS", 2310)], ["equivalent experience"]]
YACC tokenizer symbols:
tokens = [
'DEPT_CODE',
'COURSE_NUMBER',
'OR_CONJ',
'MISC_TEXT',
]
t_DEPT_CODE = r'[A-Z]{2,}'
t_COURSE_NUMBER = r'[0-9]{4}'
t_OR_CONJ = r'or'
t_ignore = ' \t'
terms = {'DEPT_CODE': t_DEPT_CODE,
         'COURSE_NUMBER': t_COURSE_NUMBER,
         'OR_CONJ': t_OR_CONJ}
for name, regex in terms.items():
    terms[name] = "^%s$" % regex
def t_MISC_TEXT(t):
    r'\S+'
    for name, regex in terms.items():
        # print "trying to match %s with regex %s" % (t.value, regex)
        if re.match(regex, t.value):
            t.type = name
            return t
    return t
(MISC_TEXT is meant to match anything not caught by the other terms.)
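To make the reclassification idea concrete, here is a minimal standalone sketch that uses only the re module, not PLY. The names TERMS, classify and classify_all are hypothetical helpers made up for illustration; they mirror what t_MISC_TEXT does, which is to grab each whitespace-delimited word and then try the anchored specific patterns against it.

```python
import re

# Anchored copies of the specific token patterns from the lexer above.
TERMS = {
    'DEPT_CODE': r'^[A-Z]{2,}$',
    'COURSE_NUMBER': r'^[0-9]{4}$',
    'OR_CONJ': r'^or$',
}

def classify(word):
    """Return the first specific token type that matches, else MISC_TEXT."""
    for name, regex in TERMS.items():
        if re.match(regex, word):
            return name
    return 'MISC_TEXT'

def classify_all(text):
    # Mirror t_MISC_TEXT: take every run of non-whitespace, then reclassify it.
    return [(classify(word), word) for word in re.findall(r'\S+', text)]

print(classify_all("CS 2110 or equivalent experience"))
```

Because the patterns are anchored, a word such as "organic" stays MISC_TEXT instead of being split into an OR_CONJ followed by leftover text.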
Some relevant rules from the parser:
precedence = (
('left', 'MISC_TEXT'),
)
def p_statement_course_data(p):
    'statement : course_data'
    p[0] = p[1]
def p_course_data(p):
    'course_data : course'
    p[0] = p[1]
def p_course(p):
    'course : DEPT_CODE COURSE_NUMBER'
    p[0] = make_course(p[1], int(p[2]))
def p_or_phrase(p):
    'or_phrase : statement OR_CONJ statement'
    p[0] = [[p[1]], [p[3]]]
def p_misc_text(p):
    '''text_aggregate : MISC_TEXT MISC_TEXT
                      | MISC_TEXT text_aggregate
                      | text_aggregate MISC_TEXT '''
    # Join the two matched symbols; every alternative has exactly two.
    p[0] = "%s %s" % (p[1], p[2])
def p_text_aggregate_statement(p):
    'statement : text_aggregate'
    p[0] = p[1]
Unfortunately, this fails:
# works as it should
>>> token_list("CS 2110 or equivalent experience")
[LexToken(DEPT_CODE,'CS',1,0), LexToken(COURSE_NUMBER,'2110',1,3), LexToken(OR_CONJ,'or',1,8), LexToken(MISC_TEXT,'equivalent',1,11), LexToken(MISC_TEXT,'experience',1,22)]
# fails. bummer.
>>> parser.parse("CS 2110 or equivalent experience")
Syntax error in input: LexToken(MISC_TEXT,'equivalent',1,11)
What am I doing wrong? I don't fully understand how to set precedence rules.
Also, this is my error function:
def p_error(p):
    print "Syntax error in input: %s" % p
Is there a way to see which rule the parser was trying when it failed? Or some other way to make the parser print which rules it's trying?
UPDATE: token_list() is just a helper function:
def token_list(string):
    lexer.input(string)
    result = []
    for tok in lexer:
        result.append(tok)
    return result
UPDATE 2: Here is the parsing that I want to happen:
Symbol Stack                               | Input Tokens                                        | Action
(empty)                                    | DEPT_CODE COURSE_NUMBER OR_CONJ MISC_TEXT MISC_TEXT |
DEPT_CODE                                  | COURSE_NUMBER OR_CONJ MISC_TEXT MISC_TEXT           | Shift DEPT_CODE
DEPT_CODE COURSE_NUMBER                    | OR_CONJ MISC_TEXT MISC_TEXT                         | Shift COURSE_NUMBER
course                                     | OR_CONJ MISC_TEXT MISC_TEXT                         | Reduce course : DEPT_CODE COURSE_NUMBER
course_data                                | OR_CONJ MISC_TEXT MISC_TEXT                         | Reduce course_data : course
statement                                  | OR_CONJ MISC_TEXT MISC_TEXT                         | Reduce statement : course_data
statement OR_CONJ                          | MISC_TEXT MISC_TEXT                                 | Shift OR_CONJ
statement OR_CONJ MISC_TEXT                | MISC_TEXT                                           | Shift MISC_TEXT
statement OR_CONJ text_aggregate           | MISC_TEXT                                           | Reduce text_aggregate : MISC_TEXT
statement OR_CONJ text_aggregate MISC_TEXT | (empty)                                             | Shift MISC_TEXT
statement OR_CONJ text_aggregate           | (empty)                                             | Reduce text_aggregate : text_aggregate MISC_TEXT
statement OR_CONJ statement                | (empty)                                             | Reduce statement : text_aggregate
or_phrase                                  | (empty)                                             | Reduce or_phrase : statement OR_CONJ statement
statement                                  | (empty)                                             | Reduce statement : or_phrase
I added this parsing action:
def p_misc_text_singleton(p):
    'text_aggregate : MISC_TEXT'
    p[0] = p[1]
When I try to build the parser, I get this output:
Generating LALR tables
WARNING: 2 shift/reduce conflicts
WARNING: 3 reduce/reduce conflicts
WARNING: reduce/reduce conflict in state 8 resolved using rule (text_aggregate -> MISC_TEXT MISC_TEXT)
WARNING: rejected rule (text_aggregate -> MISC_TEXT) in state 8
Parsing still fails on a syntax error, as above.
I can't reproduce your error; instead, I get a syntax error on "or". You did not include a rule that uses or_phrase. When I add one, I get no errors.
I don't think it's a precedence issue. It would help to set up logging so you can see the steps PLY is taking and compare them with what you want to happen. To do this, pass debug=1 to the parse function (you might also have to pass it to yacc). Look at PLY's yacc.py if you can't get the debugging working.
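As a rough sketch of that debugging setup, assuming PLY 3.x is installed: parse() accepts either a true value for debug (the trace goes to stderr) or a logging.Logger. The tiny one-rule grammar, the logger name "ply_trace" and the in-memory log_buffer are stand-ins chosen for illustration, not part of the grammar in the question.

```python
import io
import logging

import ply.lex as lex
import ply.yacc as yacc

tokens = ['DEPT_CODE', 'COURSE_NUMBER']

t_DEPT_CODE = r'[A-Z]{2,}'
t_COURSE_NUMBER = r'[0-9]{4}'
t_ignore = ' \t'

def t_error(t):
    # Skip any character the lexer cannot match.
    t.lexer.skip(1)

def p_course(p):
    'course : DEPT_CODE COURSE_NUMBER'
    p[0] = (p[1], int(p[2]))

def p_error(p):
    print("Syntax error in input: %s" % p)

lexer = lex.lex()
# debug=False and write_tables=False keep PLY 3.x from writing
# parser.out and parsetab.py next to this script.
parser = yacc.yacc(debug=False, write_tables=False)

# Collect the parser trace in memory (hypothetical logger name).
log_buffer = io.StringIO()
log = logging.getLogger("ply_trace")
log.setLevel(logging.DEBUG)
log.addHandler(logging.StreamHandler(log_buffer))
log.propagate = False

result = parser.parse("CS 2110", lexer=lexer, debug=log)
print(log_buffer.getvalue())  # state, stack, and shift/reduce actions
print(result)
```

The trace lists each state, the symbol stack and the shift or reduce action taken, so it can be compared line by line with the hand-written table in the question. Building with yacc.yacc(debug=True) instead makes PLY 3.x write a parser.out file that describes every state and any conflicts.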
The reduce/reduce conflict happens because it is ambiguous whether the parser should reduce MISC_TEXT MISC_TEXT to text_aggregate MISC_TEXT or to text_aggregate.
Without being able to reproduce the problem, my best guess at what would fix your error is to change the p_misc_text rule to:
'''text_aggregate : MISC_TEXT
                  | text_aggregate MISC_TEXT'''
I think you can also delete the precedence tuple.
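Putting the suggestions together, a minimal end-to-end sketch might look like the following, assuming PLY 3.x. Several details are my own assumptions rather than anything from the question: the tuple returned in p_course stands in for make_course(), the course_data level is collapsed into statement for brevity, SPECIFIC is a hypothetical name for the list of anchored patterns, and the ('left', 'OR_CONJ') entry is only there so that a chained "a or b or c" is resolved without a shift/reduce warning. It is unrelated to the MISC_TEXT precedence entry, which is gone.

```python
import re

import ply.lex as lex
import ply.yacc as yacc

tokens = ['DEPT_CODE', 'COURSE_NUMBER', 'OR_CONJ', 'MISC_TEXT']
t_ignore = ' \t'

# Specific patterns tried against each whitespace-delimited word.
SPECIFIC = [
    ('DEPT_CODE', r'[A-Z]{2,}'),
    ('COURSE_NUMBER', r'[0-9]{4}'),
    ('OR_CONJ', r'or'),
]

def t_MISC_TEXT(t):
    r'\S+'
    for name, regex in SPECIFIC:
        if re.fullmatch(regex, t.value):
            t.type = name
            break
    return t

def t_error(t):
    t.lexer.skip(1)

# Assumption: treat chained "or" as left-associative.
precedence = (('left', 'OR_CONJ'),)

def p_statement(p):
    '''statement : course
                 | text_aggregate
                 | or_phrase'''
    p[0] = p[1]

def p_course(p):
    'course : DEPT_CODE COURSE_NUMBER'
    p[0] = (p[1], int(p[2]))  # stand-in for make_course()

def p_or_phrase(p):
    'or_phrase : statement OR_CONJ statement'
    p[0] = [[p[1]], [p[3]]]

def p_text_aggregate(p):
    '''text_aggregate : MISC_TEXT
                      | text_aggregate MISC_TEXT'''
    # One symbol: start the aggregate; two symbols: append the next word.
    p[0] = p[1] if len(p) == 2 else "%s %s" % (p[1], p[2])

def p_error(p):
    print("Syntax error in input: %s" % p)

lexer = lex.lex()
parser = yacc.yacc(start='statement', debug=False, write_tables=False)

result = parser.parse("CS 2110 or equivalent experience", lexer=lexer)
print(result)
```

The left-recursive text_aggregate rule gives the parser exactly one way to build a run of MISC_TEXT tokens, which is what removes the reduce/reduce conflicts.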