I have strings like this:
"MSE 2110, 3030, 4102"
I would like to output:
[("MSE", 2110), ("MSE", 3030), ("MSE", 4102)]
This is my way of going about it, although I haven't quite gotten it yet:
def makeCourseList(str, location, tokens):
print "before: %s" % tokens
for index, course_number in enumerate(tokens[1:]):
tokens[index + 1] = (tokens[0][0], course_number)
print "after: %s" % tokens
course = Group(DEPT_CODE + COURSE_NUMBER) # .setResultsName("Course")
course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)).setParseAction(makeCourseList)
This outputs:
>>> course.parseString("CS 2110")
([(['CS', 2110], {})], {})
>>> course_data.parseString("CS 2110, 4301, 2123, 1110")
before: [['CS', 2110], 4301, 2123, 1110]
after: [['CS', 2110], ('CS', 4301), ('CS', 2123), ('CS', 1110)]
([(['CS', 2110], {}), ('CS', 4301), ('CS', 2123), ('CS', 1110)]开发者_开发百科, {})
Is this the right way to do it, or am I totally off?
Also, the output of isn't quite correct - I want course_data
to emit a list of course
symbols that are in the same format as each other. Right now, the first course is different from the others. (It has a {}
, whereas the others don't.)
This solution memorizes the department when parsed, and emits a (dept,coursenum) tuple when a number is found.
from pyparsing import Suppress,Word,ZeroOrMore,alphas,nums,delimitedList
data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000
'''
def memorize(t):
memorize.dept = t[0]
def token(t):
return (memorize.dept,int(t[0]))
course = Suppress(Word(alphas).setParseAction(memorize))
number = Word(nums).setParseAction(token)
line = course + delimitedList(number)
lines = ZeroOrMore(line)
print lines.parseString(data)
Output:
[('MSE', 2110), ('MSE', 3030), ('MSE', 4102), ('CSE', 1000), ('CSE', 2000), ('CSE', 3000)]
Is this the right way to do it, or am I totally off?
It's one way to do it, though of course there are others (e.g. use as parse actions two bound method -- so the instance the method belongs to can keep state -- one for the dept code and another for the course number).
The return value of the parseString
call is harder to bend to your will (though I'm sure sufficiently dark magic will do it and I look forward to Paul McGuire explaining how;-), so why not go the bound-method route as in...:
from pyparsing import *
DEPT_CODE = Regex(r'[A-Z]{2,}').setResultsName("DeptCode")
COURSE_NUMBER = Regex(r'[0-9]{4}').setResultsName("CourseNumber")
class MyParse(object):
def __init__(self):
self.result = None
def makeCourseList(self, str, location, tokens):
print "before: %s" % tokens
dept = tokens[0][0]
newtokens = [(dept, tokens[0][1])]
newtokens.extend((dept, tok) for tok in tokens[1:])
print "after: %s" % newtokens
self.result = newtokens
course = Group(DEPT_CODE + COURSE_NUMBER).setResultsName("Course")
inst = MyParse()
course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)
).setParseAction(inst.makeCourseList)
ignore = course_data.parseString("CS 2110, 4301, 2123, 1110")
print inst.result
this emits:
before: [['CS', '2110'], '4301', '2123', '1110']
after: [('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]
[('CS', '2110'), ('CS', '4301'), ('CS', '2123'), ('CS', '1110')]
which seems to be what you require, if I read your specs correctly.
data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''
def get_courses(data):
for row in data.splitlines():
department, *numbers = row.replace(",", "").split()
for number in numbers:
yield department, number
This would give a generator for the course codes. A list can be made with list()
if need be, or you can iterate over it directly.
Sure, everybody loves PyParsing
. For easy stuff like this split is sooo much easier to grok:
data = '''\
MSE 2110, 3030, 4102
CSE 1000, 2000, 3000'''
all = []
for row in data.split('\n'):
klass,num_l = row.split(' ',1)
all.extend((klass,int(num)) for num in num_l.split(','))
精彩评论