python re: r'\b \$ \d+ \b' won't match 'aug 12, 2010 abc $123'

Developer https://www.devze.com 2023-01-16 16:12 Source: Internet
So I'm just making a script to collect $ values from a transaction-log-type file:

import re
import sys

for line in sys.stdin:
    match = re.match(r'\b \$ (\d+) \b', line)
    if match is not None:
        for value in match.groups():
            print(value)

Right now I'm just trying to print those values. It would match a line containing $12323, but not when there are other things in the line. From what I read it should work, but it looks like I could be missing something.


re.match:

If zero or more characters at the beginning of string match this regular expression, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
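
A minimal sketch of that anchoring, assuming a shortened pattern without the spaces. The first sample string is made up for illustration; the second is the one from the question title.

```python
import re

pattern = r'\$(\d+)'

# re.match only tries the pattern at position 0 of the string.
at_start = re.match(pattern, '$123 paid on aug 12, 2010')
in_middle = re.match(pattern, 'aug 12, 2010 abc $123')

print(at_start.group(1) if at_start else None)  # amount is at the very start
print(in_middle)                                # amount is further into the line
```

Only the first call returns a match object, because only there does the string begin with the amount.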

What you are looking for is either re.search or re.findall:

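#!/usr/bin/env python

import re

s = 'aug 12, 2010 abc $123'

# findall returns the captured group of every match in the string
print(re.findall(r'\$(\d+)', s))
# => ['123']

# group() with no argument returns the whole match
print(re.search(r'\$(\d+)', s).group())
# => $123

# group(1) returns only the first captured group
print(re.search(r'\$(\d+)', s).group(1))
# => 123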
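
Applied to the loop from the question, this could look roughly like the sketch below. The helper name collect_amounts and the sample lines are hypothetical, and the pattern assumes amounts are whole numbers written directly after the $ sign.

```python
import re

# Assumption: an amount is a "$" immediately followed by one or more digits.
AMOUNT = re.compile(r'\$(\d+)')

def collect_amounts(lines):
    """Return every $-prefixed integer found in the given lines, as strings."""
    found = []
    for line in lines:
        # findall scans the whole line, so the amount may appear anywhere
        found.extend(AMOUNT.findall(line))
    return found

sample = ['aug 12, 2010 abc $123', 'no amount here', 'paid $5 and $70']
print(collect_amounts(sample))
```

In the actual script you would probably pass sys.stdin instead of the sample list, since iterating over it yields one line at a time.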


By having a space between \$ and (\d+), the regex expects a space in your string between them. Is there such a space?
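
A small sketch of the difference, using the string from the question title. If the spaces were only meant to make the pattern easier to read (that is a guess on my part), the re.VERBOSE flag makes the regex engine ignore unescaped whitespace in the pattern.

```python
import re

s = 'aug 12, 2010 abc $123'

# The literal space in the pattern must appear in the string: "$ 123"
with_space = re.search(r'\$ (\d+)', s)

# No space in the pattern: matches "$123"
without_space = re.search(r'\$(\d+)', s)

# Same spaced pattern, but re.VERBOSE ignores the whitespace inside it
verbose = re.search(r'\$ (\d+)', s, re.VERBOSE)

print(with_space)
print(without_space.group(1))
print(verbose.group(1))
```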


I'm not sure exactly what you want to accept, but from the statement

a line containing $12323 but not when there are other things in the line

I would take it that

'aug 12, 2010 abc $123'

is not supposed to match, as it has other text before the amount.

If you want to match an amount at the end of the line, here is the customary anti-regexp answer (even though I am not against using regexps in easy cases):

loglines = ['aug 12, 2010 abc $123', " $1 ", "a $1 amount", "exactly $1 - no less"]

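# Match a $amount at the end of the line, with no other text after it
for line in loglines:
    if '$' in line:
        # Keep only the text after the last '$'
        _, _, amount = line.rpartition('$')
        try:
            amount = float(amount)
        except ValueError:
            # The text after '$' is not a plain number: skip this line
            pass
        else:
            print("$%.2f" % amount)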


Others have already pointed out some shortcomings of your regex (especially the mandatory spaces and re.match vs. re.search).

There is another thing, though: \b word anchors match between alphanumeric and non-alphanumeric characters. In other words, \b \$ will fail (even when doing a search instead of a match operation) unless the string has an alphanumeric character directly before the space. Likewise, the trailing space followed by \b requires an alphanumeric character directly after that space.

Example (admittedly contrived) to work with your regex:

>>> import re
>>> test = [" $1 ", "a $1 amount", "exactly $1 - no less"]
>>> for string in test:
...     print(re.search(r"\b \$\d+ \b", string))
...
None
<_sre.SRE_Match object at 0x0000000001DD4370>
None
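
If the goal is to accept an amount whenever it is surrounded by whitespace or the edges of the string, one possible alternative to \b is a pair of lookarounds. This is only a sketch of a different technique, not a fix anyone in this thread has confirmed; the name AMOUNT and the two extra test strings are made up for illustration.

```python
import re

# Assumption: an amount counts only if it is delimited by whitespace
# or by the start/end of the string.
# (?<!\S) = not preceded by a non-whitespace character
# (?!\S)  = not followed by a non-whitespace character
AMOUNT = re.compile(r'(?<!\S)\$(\d+)(?!\S)')

test = [" $1 ", "a $1 amount", "exactly $1 - no less", "abc$1", "$12323"]
for string in test:
    print(repr(string), AMOUNT.findall(string))
```

With this pattern all three strings from the example above yield an amount, while "abc$1" does not because the $ is glued to preceding text.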