Does anyone see why the first part of my regex isn't working in Python?

Developer https://www.devze.com 2023-01-23 04:59 Source: the web

Related topics: python regex

I tested this regex out in RegexBuddy

,[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?

and it seems to be able to do what I need it to do - capture a piece of data that looks like one of the following:

,POWDER,RO,ML,8/19/2002

,POWDER,RO,,,

,POWDER,RO,,8/19/2002

,POWDER,RO,ML,,

When I use it in a python string:

r",[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"

It misses the first part of the match, and my resulting matches look like RO,ML,8/19/2002 or RO,ML, or just RO,

The first token is a word that is stored in all caps and may have spaces in it (and possibly punctuation, which I will need to address shortly). If I remove the \s from the character class, it still doesn't capture the one-word names that it should. Did I miss something obvious?

Yes. You did not capture the first group.

r",([A-Z\s]+),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
#  ^        ^

BTW, it seems that you are parsing a CSV file with regex. In Python, there is already a csv module.
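To illustrate the csv suggestion, here is a minimal sketch that reads the four sample rows from the question with csv.reader. It assumes the real lines have the same layout as the samples (an empty field before the leading comma, then four fields of interest). The names parse_rows, code1 and code2 are made up for this example, since the question does not say what the columns mean.

```python
import csv
import io

# The four sample rows from the question, joined into one text blob.
SAMPLE = (
    ",POWDER,RO,ML,8/19/2002\n"
    ",POWDER,RO,,,\n"
    ",POWDER,RO,,8/19/2002\n"
    ",POWDER,RO,ML,,\n"
)

def parse_rows(text):
    """Return (name, code1, code2, date) tuples; empty fields become None."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        # row[0] is the empty field before the leading comma, so the
        # interesting fields are row[1] through row[4].
        name, code1, code2, date = row[1:5]
        records.append((name, code1, code2 or None, date or None))
    return records

records = parse_rows(SAMPLE)
for record in records:
    print(record)
```

Unlike the regex, this does not check that the codes come from the allowed lists (LA, RO, ... and ML, FE, ...); that validation would have to be added separately if it matters.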

The first part of your regex doesn't have capturing parentheses around it. Try the regex:

,([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?
 #^^ This was [A-Z\s]+?; needs to be ([A-Z\s]+?)

which would be this in python:

r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"

Example from the interpreter:

>>> import re
>>> r = re.compile(r",[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")
>>> r.match(",POWDER,RO,ML,8/19/2002").groups()
('RO', 'ML', '8/19/2002')
>>> r = re.compile(r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?")
>>> r.match(",POWDER,RO,ML,8/19/2002").groups()
('POWDER', 'RO', 'ML', '8/19/2002')
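It may help to see that the original pattern does match the whole string; the first field just isn't reported as a group. A small sketch (the variable names are only for illustration) comparing group(0), groups() and findall() on the pattern from the question:

```python
import re

text = ",POWDER,RO,ML,8/19/2002"

# Original pattern from the question: no parentheses around the first field.
no_group = re.compile(
    r",[A-Z\s]+?,(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
)

m = no_group.match(text)
whole = m.group(0)              # the entire matched text, groups or not
groups = m.groups()             # only the parenthesized parts
found = no_group.findall(text)  # findall also returns only the groups

print(whole)   # ,POWDER,RO,ML,8/19/2002
print(groups)  # ('RO', 'ML', '8/19/2002')
print(found)   # [('RO', 'ML', '8/19/2002')]
```

So if the matches came from groups() or findall(), the missing first token is expected behaviour rather than a difference between RegexBuddy and Python.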

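I'm not into Python, but you just forgot to use parentheses to indicate that you want to capture that part:

,([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?

should do what you want. Note that the ? stays inside the parentheses: written after them, as in ([A-Z\s]+)?, it would make the whole first field optional instead.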

Yes, you missed the grouping parentheses:

>>> import re
>>> s = ",POWDER,RO,ML,8/19/2002"
>>> pat = r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
>>> re.match(pat, s).groups()
('POWDER', 'RO', 'ML', '8/19/2002')
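For completeness, here is a quick check of the corrected pattern against all four sample strings from the question. This is only a sketch; the list name samples and the results variable are made up for the example. Optional groups that did not take part in the match come back as None.

```python
import re

pat = re.compile(
    r",([A-Z\s]+?),(LA|RO|MU|FE|AV|CA),(ML|FE|MN|FS|UN)?,(\d+/\d+/\d{4})?"
)

# The four sample strings from the question.
samples = [
    ",POWDER,RO,ML,8/19/2002",
    ",POWDER,RO,,,",
    ",POWDER,RO,,8/19/2002",
    ",POWDER,RO,ML,,",
]

results = [pat.match(s).groups() for s in samples]
for s, groups in zip(samples, results):
    print(s, "->", groups)
```

When a field is missing, the corresponding entry is None rather than an empty string, so downstream code should test for None.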