How to extract data (strings) in PHP using regex?

开发者 https://www.devze.com 2023-04-11 14:35 Source: web

I have tried to extract data from this string:

$str = "Instant Oatmeal - Corn Flavour 175g (35g x 5)";
preg_match('/(?P<name>.*) (?P<total_weight>\d+)(?P<total_weight_unit>.*) \((?P<unitWeight>\d+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/', $str, $m);

The result is correct:

Instant Oatmeal - Corn Flavour 175g (35g x 5)
name : Instant Oatmeal - Corn Flavour
total_weight : 175 g
#portion : 5
unit weight : 35 g

However, if I apply the same pattern to

$str = "Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)";

the result is incorrect, because the fractional part of each weight ends up in the unit:

Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)
name : Cholcolate Sandwich Cookies (Tray)
total_weight : 264 .6g
#portion : 9
unit weight : 29 .4g

How can I solve this?


Use free-spacing mode for non-trivial regexes!

When dealing with non-trivial regexes like this one, you can dramatically improve readability (and maintainability) by writing them in free-spacing format with lots of comments (and indentation for any nested parentheses). Here is your original regex in free-spacing format with comments:

$re_orig = '/# Original regex with added comments.
    (?P<name>.*)               # $name:
    [ ]                        # Space separates name from weight.
    (?P<total_weight>\d+)      # $total_weight:
    (?P<total_weight_unit>.*)  # $total_weight_unit:
    [ ]                        # Space separates total units from portions.
    \(                         # Literal parens enclosing portions data.
    (?P<unitWeight>\d+)        # $unitWeight:
    (?P<unitWeight_unit>.*)    # $unitWeight_unit:
    [ ]x[ ]                    # "space-X-space" separates portions data.
    (?P<portion_no>\d+)        # $portion_no:
    \)                         # Literal parens enclosing portions data.
    /x';
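
To check that the free-spacing rewrite really is the same regex, the two forms can be run side by side. The following is a minimal sketch, not part of the original answer; the variable names $compact and $spaced are made up for illustration, and the free-spacing copy is condensed with its comments removed.

```php
<?php
// Compact form of the original pattern from the question.
$compact = '/(?P<name>.*) (?P<total_weight>\d+)(?P<total_weight_unit>.*) \((?P<unitWeight>\d+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/';

// Same pattern with the x flag: unescaped whitespace outside character
// classes is ignored, so each literal space is written as [ ].
$spaced = '/
    (?P<name>.*) [ ]
    (?P<total_weight>\d+) (?P<total_weight_unit>.*) [ ]
    \( (?P<unitWeight>\d+) (?P<unitWeight_unit>.*)
    [ ]x[ ] (?P<portion_no>\d+) \)
    /x';

$str = 'Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)';
preg_match($compact, $str, $a);
preg_match($spaced, $str, $b);

var_dump($a === $b);                // bool(true): identical captures
echo $b['total_weight'], "\n";      // 264
echo $b['total_weight_unit'], "\n"; // .6g
```

Both forms reproduce the problem from the question: \d+ stops at the decimal point, so ".6" is captured as part of the unit.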

Here is an improved version:

$re_improved = '/# Match Name, total weight, units and portions data.
    ^                       # Anchor to start of string.
    (?P<name>.*?)           # $name:
    [ ]+                    # Space(s) separate name from weight.
    (?P<total_weight>       # $total_weight:
      \d+                   # Required integer portion.
      (?:\.\d*)?            # Optional fractional portion.
    )
    (?P<total_weight_unit>  # $total_weight_unit:
      .+?                   # Units consist of any chars.
    )
    [ ]+                    # Space(s) separate total from portions.
    \(                      # Literal parens enclosing portions data.
    (?P<unitWeight>         # $unitWeight:
      \d+                   # Required integer portion.
      (?:\.\d*)?            # Optional fractional portion.
    )
    (?P<unitWeight_unit>    # $unitWeight_unit:
      .+?                   # Units consist of any chars.
    )
    [ ]+x[ ]+               # "space-X-space" separates portions data.
    (?P<portion_no>         # $portion_no:
      \d+                   # Required integer portion.
      (?:\.\d*)?            # Optional fractional portion.
    )
    \)                      # Literal parens enclosing portions data.
    $                       # Anchor to end of string.
    /xi';

Notes:

  • The expressions for all the numerical quantities have been improved to allow an optional fractional portion.
  • Added start-of-string and end-of-string anchors.
  • Added the i (ignore case) modifier in case the X in the portions data is uppercase.

I'm not sure how you are applying this regex, but this improved regex should solve your immediate problem.
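
One possible way to apply the improved pattern is sketched below. It assumes the pattern is used with preg_match() on one product string at a time; extract_product() is a hypothetical helper name, and the pattern is a condensed copy of the improved regex above with the same named groups.

```php
<?php
// Condensed copy of the improved pattern (same groups, fewer comments).
$re = '/^
    (?P<name>.*?) [ ]+
    (?P<total_weight> \d+ (?:\.\d*)? ) (?P<total_weight_unit> .+? ) [ ]+
    \( (?P<unitWeight> \d+ (?:\.\d*)? ) (?P<unitWeight_unit> .+? )
    [ ]+x[ ]+ (?P<portion_no> \d+ (?:\.\d*)? ) \)
    $/xi';

// Hypothetical helper: returns the named groups, or null if no match.
function extract_product(string $re, string $str): ?array
{
    if (preg_match($re, $str, $m) !== 1) {
        return null; // no match, or a regex error
    }
    // Keep only the named (string) keys; drop the numeric duplicates.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$rows = [
    'Instant Oatmeal - Corn Flavour 175g (35g x 5)',
    'Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)',
];
foreach ($rows as $str) {
    $p = extract_product($re, $str);
    printf(
        "%s | %s %s | %s %s x %s\n",
        $p['name'],
        $p['total_weight'],
        $p['total_weight_unit'],
        $p['unitWeight'],
        $p['unitWeight_unit'],
        $p['portion_no']
    );
}
// Instant Oatmeal - Corn Flavour | 175 g | 35 g x 5
// Cholcolate Sandwich Cookies (Tray) | 264.6 g | 29.4 g x 9
```

Because the pattern is anchored at both ends, a string with trailing text after the closing parenthesis does not match at all, and the helper returns null for it.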

Edit: 2011-10-09 11:17 MDT Changed expression for units to be more lax to allow for cases pointed out by Ilmari Karonen.


Use this:

/(?P<name>.*) (?P<total_weight>\b[0-9]*\.?[0-9]+)(?P<total_weight_unit>.*) \((?P<unitWeight>\b[0-9]*\.?[0-9]+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/

Your problem is that your pattern does not account for floating-point numbers; I corrected this. Note that the portion count is still an integer, which I think is logical :)
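
As a quick check, this pattern can be run against the failing string from the question. This is a minimal sketch; the casts to float and int are an assumption about how the values would be used afterwards, not something the answer specifies.

```php
<?php
$re = '/(?P<name>.*) (?P<total_weight>\b[0-9]*\.?[0-9]+)(?P<total_weight_unit>.*) \((?P<unitWeight>\b[0-9]*\.?[0-9]+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/';
$str = 'Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)';

if (preg_match($re, $str, $m)) {
    // Captures are strings; cast them before doing arithmetic.
    $total    = (float) $m['total_weight'];
    $unit     = (float) $m['unitWeight'];
    $portions = (int) $m['portion_no'];

    printf("name        : %s\n", $m['name']);
    printf("total_weight: %.1f %s\n", $total, $m['total_weight_unit']);
    printf("unit weight : %.1f %s\n", $unit, $m['unitWeight_unit']);
    printf("#portion    : %d\n", $portions);
}
// name        : Cholcolate Sandwich Cookies (Tray)
// total_weight: 264.6 g
// unit weight : 29.4 g
// #portion    : 9
```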
