One of the HTML input fields in an app I'm working on is being validated with the following regex pattern:
.{5,}+
What is this checking for?
Other fields are being checked with this pattern which I also don't understand:
.+
We can break your first pattern down into three parts:
- `.` is a wildcard: it matches any character (except newlines by default, unless the /s modifier is set).
- `{5,}` specifies repetition of the dot. It says that the dot must match at least 5 times. If there were a number after the comma, the dot would have to match between 5 and that number of times, but since there is no number, there is no upper limit.
- `+` is, in your first pattern, a possessive quantifier (see below for how `+` can mean different things in different situations). It tells the regular expression engine that once it has satisfied the previous condition (i.e. `.{5,}`), it should not try to backtrack.
Your second pattern is simpler. The dot still means the same thing as above (it works as a wildcard). However, here the `+` has a different meaning: it is a repetition operator, meaning that the dot must match 1 or more times (that could also be expressed as `.{1,}`, using the syntax we saw above).
As you can see, `+` has a different meaning depending on context. When used on its own, it is a repetition operator. However, when it follows another repetition operator (`*`, `?`, `+` or `{...}`), it becomes a possessive quantifier.
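As a quick check of the plain (non-possessive) quantifiers described above, here is a minimal Python sketch. Python's re module is an assumption on my part (the question doesn't say which engine the app uses), `is_valid` is a made-up helper, and `re.fullmatch` stands in for validating a whole field value. `re.DOTALL` is Python's spelling of the /s modifier.

```python
import re

def is_valid(pattern, value, flags=0):
    # fullmatch requires the pattern to cover the entire value,
    # which is how a field validator would typically apply it.
    return re.fullmatch(pattern, value, flags) is not None

# .{5,} : at least five characters, no upper limit
print(is_valid(r".{5,}", "abcd"))        # False (only 4 characters)
print(is_valid(r".{5,}", "abcde"))       # True
print(is_valid(r".{5,8}", "abcdefghi"))  # False (9 characters, maximum is 8)

# .+ behaves like .{1,} : at least one character
print(is_valid(r".+", ""))               # False
print(is_valid(r".+", "a") == is_valid(r".{1,}", "a"))  # True

# By default the dot does not match a newline; DOTALL (the /s modifier) changes that
print(is_valid(r".{5,}", "ab\ncd"))             # False
print(is_valid(r".{5,}", "ab\ncd", re.DOTALL))  # True
```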
A `+` after another quantifier (here `{5,}`) means a possessive match, i.e. once a match is found, do not backtrack*.
For instance, the pattern `.{5,}x` will match `abcdex`:
- `.{5,}` matches `abcdex`.
- `x` matches nothing.
- So backtrack `.{5,}` and let it match `abcde`.
- Now `x` matches that last `x`.
But `.{5,}+x` will not match `abcdex`:
- `.{5,}+` matches `abcdex`.
- `x` matches nothing.
- Cannot backtrack the `.{5,}+`. We have to stop here.
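The two traces above can be reproduced with a short Python sketch. Two assumptions to flag: Python's re module only accepts the possessive `.{5,}+` syntax from version 3.11 on, so the hypothetical helper `compile_possessive_x` falls back to the common lookahead-plus-backreference emulation on older versions, and `fullmatch` is used so the whole string has to be consumed.

```python
import re

def compile_possessive_x():
    """Compile a pattern equivalent to .{5,}+x (helper name is illustrative)."""
    try:
        # Native possessive quantifier, accepted by Python 3.11 and later.
        return re.compile(r".{5,}+x")
    except re.error:
        # Older versions reject the syntax. A lookahead is atomic, so capturing
        # the greedy run inside it and consuming it via \1 forbids backtracking.
        return re.compile(r"(?=(.{5,}))\1x")

greedy = re.compile(r".{5,}x")
possessive = compile_possessive_x()

# Greedy: .{5,} first grabs "abcdex", then gives back the final "x"
# so that the literal x in the pattern can match it.
print(greedy.fullmatch("abcdex") is not None)      # True

# Possessive: .{5,}+ grabs "abcdex" and refuses to give anything back,
# so the literal x has nothing left to match.
print(possessive.fullmatch("abcdex") is not None)  # False
```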
*: Even though the engine cannot backtrack into the possessive part, what it matched can still be given up as a whole. For instance, `a?.{5,}+x` against `abcdex` will first try {`a?` → `a`, `.{5,}+` → `bcdex`, `x` → no match}, then discard the whole `.{5,}+` match and the `a`, and restart with {`a?` → (empty), `.{5,}+` → `abcdex`, `x` → no match}. Therefore, we can also say that the `+` makes the quantifier "atomic".
On the other hand, `+` alone just means `{1,}`, i.e. match one or more times.
Any character, 5 or more times.
- "." means any character except a line break.
- {m,n} defines a bounded interval: "m" is the minimum and "n" is the maximum. If n is omitted, as it is here, there is no upper limit.
- "+" after another quantifier means possessive.
`.{5,}+` means:
- Match any single character that is not a line break character
- Between 5 and unlimited times; as many times as possible, without giving back (possessive)
`.+` is the same thing, but it matches between 1 and unlimited times, giving back as needed (greedy).
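One practical consequence, sketched here in Python under stated assumptions: possessive syntax needs Python 3.11 or later, `re.fullmatch` stands in for whole-field validation, and the sample values and the `accepted` helper are made up for illustration. When nothing follows the quantifier in the pattern, nothing can ask for characters back, so the possessive and greedy forms should accept the same inputs.

```python
import re
import sys

samples = ["", "abcd", "abcde", "hello world", "ab\ncd"]

def accepted(pattern):
    # Set of sample values that the pattern matches in full.
    return {s for s in samples if re.fullmatch(pattern, s)}

print(sorted(accepted(r".{5,}")))  # ['abcde', 'hello world']
print(sorted(accepted(r".+")))     # ['abcd', 'abcde', 'hello world']

if sys.version_info >= (3, 11):
    # Possessive form; skipped on older Pythons, which reject this syntax.
    print(accepted(r".{5,}+") == accepted(r".{5,}"))  # True
```

The possessive form only changes the outcome when something after the quantifier needs characters that the quantifier has already consumed.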
As I've mentioned many times before, I'm a huge fan of RegexBuddy. Its "Create" mode is excellent for deconstructing regular expressions.