I am trying to capture / extract numeric values from some strings.
Here is a sample string:
s='The shipping company had 93,999,888.5685 gallons of fuel on hand'
I want to pull out the 93,999,888.5685 value. I have gotten my regex this far:
> mine=re.compile("(\d{1,3}([,\d{3}])*[.\d+]*)")
However, when I do a findall I get the following:
mine.findall(s)
[('93,999,888.5685', '8')]
I have tried a number of different strategies to keep it from matching on the 8,
but I am now realizing that I am not sure why it matched on the 8 in the first place.
Any illumination would be appreciated.
The 8 is captured because your pattern has two capturing groups, and a repeated capturing group keeps only the last thing it matched. Mark the second group as non-capturing using ?:
with this pattern: (\d{1,3}(?:[,\d{3}])*[.\d+]*)
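A minimal sketch of that behaviour, using the sample string from the question. The names `with_group` and `without_group` are only illustrative.

```python
import re

s = 'The shipping company had 93,999,888.5685 gallons of fuel on hand'

# Two capturing groups: findall returns one tuple per match.
# The inner group is repeated, so it only remembers its last iteration.
with_group = re.compile(r"(\d{1,3}([,\d{3}])*[.\d+]*)")

# Inner group made non-capturing: only one group left, so findall
# returns plain strings instead of tuples.
without_group = re.compile(r"(\d{1,3}(?:[,\d{3}])*[.\d+]*)")

print(with_group.findall(s))     # [('93,999,888.5685', '8')]
print(without_group.findall(s))  # ['93,999,888.5685']
```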
Your second group, ([,\d{3}]), is responsible for the additional match.
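Your pattern broken up:
(
\d{1,3} This will match any run of 1-3 digits (`8`, `12`, `000`, etc)
(
[,\d{3}] This is a character class, so it matches a single character: a ",", a digit, "{", "3" or "}". It does not match a "," followed by 3 digits; that would be ,\d{3} without the square brackets
)* **from zero to infinity times**; the group keeps only the last character it matched, here `8`
[.\d+]* Also a character class: this matches any run of "." , digit and "+" characters, from zero to infinity times
)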
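To check that the square brackets form a character class rather than a group, here is a small sketch. The `strict` pattern is one possible rewrite, not the only one; it assumes commas are thousands separators and that a number has at most one decimal point.

```python
import re

# [,\d{3}] matches exactly one character out of: ',', a digit, '{', '3', '}'.
char_class = re.compile(r"[,\d{3}]")
print(bool(char_class.fullmatch("{")))     # True
print(bool(char_class.fullmatch(",123")))  # False: a class matches one character

# As a result, the original pattern accepts strings that are not numbers.
original = re.compile(r"(\d{1,3}([,\d{3}])*[.\d+]*)")
print(bool(original.fullmatch("1{3}+")))   # True

# Hypothetical stricter pattern: non-capturing groups instead of classes.
strict = re.compile(r"\d{1,3}(?:,\d{3})*(?:\.\d+)?")
print(bool(strict.fullmatch("1{3}+")))            # False
print(bool(strict.fullmatch("93,999,888.5685")))  # True
```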
findall returns a tuple for each match when the pattern contains more than one group. The tuple contains each group (delimited by parentheses in the regex) of the match. You want the first group only. Below I've used a list comprehension to pull out the first group.
>>> mine=re.compile(r"(\d{1,3}(,\d{3})*(\.?\d+)*)")
>>> s='blah 93,999,888.5685 blah blah blah 988,122.3.'
>>> [m[0] for m in mine.findall(s)]
['93,999,888.5685', '988,122.3']
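An alternative sketch that avoids indexing into tuples: finditer yields match objects, and group(0) is always the whole match no matter how many groups the pattern has. The conversion to float at the end is an assumed next step, not something the question asked for.

```python
import re

mine = re.compile(r"(\d{1,3}(,\d{3})*(\.?\d+)*)")
s = 'blah 93,999,888.5685 blah blah blah 988,122.3.'

# group(0) is the full text of each match, independent of the groups.
matches = [m.group(0) for m in mine.finditer(s)]
print(matches)  # ['93,999,888.5685', '988,122.3']

# Assumed follow-up: drop the thousands separators and convert to float.
values = [float(text.replace(",", "")) for text in matches]
print(values)   # [93999888.5685, 988122.3]
```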
Why not wrap it in \D? mine=re.compile("\D(\d{1,3}([,\d{3}])*[.\d+]*)\D")
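One caveat with \D, sketched below: it has to consume a real character, so a number at the very start or end of the string is cut short or missed. Lookarounds are a common alternative because they check the neighbouring character without consuming it. The `number` sub-pattern is an assumption chosen for illustration, not the pattern from the question.

```python
import re

# Assumed number pattern: comma-separated thousands, optional decimals.
number = r"\d{1,3}(?:,\d{3})*(?:\.\d+)?"

# \D on both sides needs an actual non-digit character before and after.
wrapped = re.compile(r"\D(" + number + r")\D")

# Lookarounds only assert that no digit is adjacent.
guarded = re.compile(r"(?<!\d)(" + number + r")(?!\d)")

s = "93,999 gallons"
print(wrapped.findall(s))  # ['999']
print(guarded.findall(s))  # ['93,999']
```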