
interpreting braces in regexp syntax

Developer https://www.devze.com 2023-02-09 06:29 Source: Internet
I'm trying to decipher this regexp formal definition of floating point numbers (from php.net)

LNUM          [0-9]+
DNUM          ([0-9]*[\.]{LNUM}) | ({LNUM}[\.][0-9]*)
EXPONENT_DNUM [+-]?(({LNUM} | {DNUM}) [eE][+-]? {LNUM})

LNUM means one or more occurrences of the digits 0 to 9. DNUM means zero or more occurrences of the digits 0 to 9, followed by a decimal point. I don't know how to interpret {LNUM}. From what I've read, the braces mean repetition, but then wouldn't

[\.]{LNUM}

mean LNUM occurrences of the decimal point (which wouldn't make sense)? And then in the second part of the alternation (after the | character), {LNUM} occurs at the beginning, and I can't find a definition for that usage of braces in regexp syntax (POSIX or Perl). Can someone clear this up for me?

Thank you, Bill


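This is not strict regular expression syntax. It is the named-definition notation used by lexer generators such as lex and flex, where {LNUM} is a placeholder for the definition of LNUM rather than a repetition count. For example, the second line in strict regexp syntax is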

([0-9]*[\.][0-9]+) | ([0-9]+[\.][0-9]*)
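
To make the substitution concrete, here is a minimal Python sketch that expands the {NAME} placeholders by plain text replacement. The `definitions` dictionary and the `expand` helper are hypothetical names introduced for illustration; this is not how PHP itself processes the definition, and a real lexer generator may wrap each expansion in parentheses.

```python
import re

# The three named definitions, copied from the question.
definitions = {
    "LNUM": r"[0-9]+",
    "DNUM": r"([0-9]*[\.]{LNUM}) | ({LNUM}[\.][0-9]*)",
    "EXPONENT_DNUM": r"[+-]?(({LNUM} | {DNUM}) [eE][+-]? {LNUM})",
}

def expand(pattern, defs):
    """Recursively replace every {NAME} with the text of its definition.

    Hypothetical helper: plain textual substitution, no parentheses added.
    """
    return re.sub(
        r"\{([A-Z_]+)\}",
        lambda m: expand(defs[m.group(1)], defs),
        pattern,
    )

print(expand(definitions["DNUM"], definitions))
print(expand(definitions["EXPONENT_DNUM"], definitions))
```

The first line printed is the expanded DNUM shown above. The second is the fully substituted EXPONENT_DNUM.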


Yeah, this has nothing to do with regex quantifiers; it looks like variable substitution.
You say this is the formal definition? After substituting and looking at the exponent notation, it looks like the whole thing could be trimmed down. Also, the quantifiers put no upper bound on the number of digits. And spaces aren't accounted for anywhere, so maybe it's a strict parse for something.

[+-]?(([0-9]+ | ([0-9]*[\.][0-9]+) | ([0-9]+[\.][0-9]*)) [eE][+-]? [0-9]+)

[+-]?                # '+' or '-'  0 or 1 time
(                    # group 1, not needed
   (                    # group 2
        [0-9]+             # a digit, 1 or more times
      |                      # OR
        (                  # group 3
          [0-9]*              # a digit, 0 or more times
          [\.]                # a '.' exactly 1 time, character class not needed
          [0-9]+              # a digit, 1 or more times
        )                  # end group 3
      |                      # OR
        (                  # group 4
          [0-9]+             # a digit, 1 or more times
          [\.]               # a '.' exactly 1 time, character class not needed
          [0-9]*             # a digit, 0 or more times
        )                  # end group 4
   )                    # end group 2
   [eE]                 # 'e' or 'E' exactly 1 time
   [+-]?                # '+' or '-'  0 or 1 time
   [0-9]+               # a digit, 1 or more times
)                 # end group 1, not needed
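
As a quick check of the expanded pattern, the following Python sketch compiles it with `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern. It assumes the spaces in the original definition are layout only, not literal characters to match. The name `is_exponent_dnum` and the sample strings are made up for illustration.

```python
import re

# Expanded EXPONENT_DNUM; re.VERBOSE ignores the whitespace and comments.
EXPONENT_DNUM = re.compile(r"""
    [+-]?                            # optional sign
    (
        (
            [0-9]+                   # LNUM
          | ( [0-9]* [\.] [0-9]+ )   # DNUM, first alternative
          | ( [0-9]+ [\.] [0-9]* )   # DNUM, second alternative
        )
        [eE] [+-]? [0-9]+            # exponent with optional sign
    )
""", re.VERBOSE)

def is_exponent_dnum(text):
    """Return True if the whole string matches the expanded pattern."""
    return EXPONENT_DNUM.fullmatch(text) is not None

for sample in ["1e5", "-1.5E-10", ".5e3", "5.e3", "1.5", "e5", "1 e5"]:
    print(repr(sample), is_exponent_dnum(sample))
```

The first four samples are accepted. "1.5" is rejected because EXPONENT_DNUM requires an exponent part, "e5" because it has no leading digits, and "1 e5" because the pattern matches no literal spaces.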