
Adjust Python Regex to not include a single digit in the findall results

Developer https://www.devze.com 2023-01-13 17:57 Source: Internet

I am trying to capture / extract numeric values from some strings.

Here is a sample string:

s='The shipping company had 93,999,888.5685 gallons of fuel on hand'

I want to pull out the value 93,999,888.5685. I have gotten my regex this far:

> mine=re.compile("(\d{1,3}([,\d{3}])*[.\d+]*)")

However, when I do a findall I get the following:

mine.findall(s)

[('93,999,888.5685', '8')]

I have tried a number of different strategies to keep it from matching on the 8, but I am now realizing that I am not sure why it matched on the 8 in the first place.

Any illumination would be appreciated.


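The 8 is captured because your pattern has two capturing groups, and findall returns every group for each match. Mark the second group as non-capturing using ?: with this pattern: (\d{1,3}(?:[,\d{3}])*[.\d+]*)

Your second group, ([,\d{3}]), is responsible for the additional item. Because it is repeated with *, it keeps only the text from its last repetition, which here is the final 8 of 888.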
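
A minimal sketch of the difference, using the sample string from the question (the names `two_groups` and `one_group` are only illustrative):

```python
import re

s = 'The shipping company had 93,999,888.5685 gallons of fuel on hand'

# Original pattern: two capturing groups, so findall returns tuples.
two_groups = re.compile(r"(\d{1,3}([,\d{3}])*[.\d+]*)")
print(two_groups.findall(s))   # [('93,999,888.5685', '8')]

# Inner group made non-capturing with ?:, so only one group remains
# and findall returns plain strings.
one_group = re.compile(r"(\d{1,3}(?:[,\d{3}])*[.\d+]*)")
print(one_group.findall(s))    # ['93,999,888.5685']
```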


Your pattern broken up:

(
\d{1,3}       This will match any group of 1-3 digits (`8`, `12`, `000`, etc)
  (
     [,\d{3}] Intended to match a "," and 3 digits (`,123`, `,000`, etc); written as a character class, it actually matches one character: ",", a digit, "{", "3" or "}"
  )*            **from zero to infinity times**
  [.\d+]*     Also a character class: matches any run of ".", digits and "+", from 0 to infinity characters long
)
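
Square brackets create a character class, which matches a single character from a set rather than a sequence. A short sketch of what that means here; the stricter pattern `strict` is an assumption about what was intended, not something taken from the question:

```python
import re

# Inside [...] the items are alternatives: ",", any digit, "{", "3", "}".
char_class = re.compile(r"[,\d{3}]")
print(char_class.findall(",1{}"))   # [',', '1', '{', '}']

# Parentheses instead of brackets give a "comma plus three digits" unit.
# (?:...) keeps the groups non-capturing, so findall returns whole matches.
strict = re.compile(r"\d{1,3}(?:,\d{3})*(?:\.\d+)?")
s = 'The shipping company had 93,999,888.5685 gallons of fuel on hand'
print(strict.findall(s))            # ['93,999,888.5685']
```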


When the pattern contains more than one group, findall returns a tuple for each match. The tuple contains each group (delineated by parentheses in the regex) of the match. You want the first group only. Below I've used a list comprehension to pull out the first group.

>>> import re
>>> mine=re.compile(r"(\d{1,3}(,\d{3})*(\.?\d+)*)")
>>> s='blah 93,999,888.5685 blah blah blah 988,122.3.'
>>> [m[0] for m in mine.findall(s)]
['93,999,888.5685', '988,122.3']
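
An alternative sketch that avoids indexing into tuples: `re.finditer` yields match objects, and `group(0)` is always the whole match no matter how many groups the pattern contains. The variable name `numbers` is only illustrative.

```python
import re

mine = re.compile(r"(\d{1,3}(,\d{3})*(\.?\d+)*)")
s = 'blah 93,999,888.5685 blah blah blah 988,122.3.'

# group(0) is the full match, so the extra groups do not matter.
numbers = [m.group(0) for m in mine.finditer(s)]
print(numbers)   # ['93,999,888.5685', '988,122.3']
```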


Why not wrap it in \D? mine=re.compile("\D(\d{1,3}([,\d{3}])*[.\d+]*)\D")
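
One caveat with wrapping in \D, as a rough sketch: \D has to consume an actual non-digit character on each side, and a comma counts as one. The `number` pattern and the sample string below are assumptions for illustration, not from the question, and the lookaround version is just one possible alternative.

```python
import re

# Hypothetical sample: one number at the very start, one at the very end.
s = '93,999,888.5685 gallons, then 12'

# Assumed stricter number pattern (not from the question).
number = r"\d{1,3}(?:,\d{3})*(?:\.\d+)?"

# \D must consume a real character, so the match can only begin after the
# first comma, and the trailing 12 has nothing after it to satisfy \D.
wrapped = re.compile(r"\D(" + number + r")\D")
print(wrapped.findall(s))   # ['999,888.5685']

# Lookarounds only assert "no digit here" and also succeed at string edges.
guarded = re.compile(r"(?<!\d)(" + number + r")(?!\d)")
print(guarded.findall(s))   # ['93,999,888.5685', '12']
```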
