I'm using PHP 5's preg functions, if it makes a difference.
Consider the regular language matched by the following regular expression.
([^{}] | {[0-9a-zA-Z_]+})*
The language consists of strings of any number of characters, with special embedded tags marked off by left and right curly brackets, which contain a string of one or more alphanumeric or underscore characters. For example, the following is a valid string in the language:
asdfasdf 1243#$*#{A_123}asdf?{432U}
However, while validating a string with this regex, I would like to get a list of these curly-bracket-delimited tags and their positions in the string. Considering the previous example string, I'd like to have an array that tells me:
A_123: 20; 432U: 32
Is this possible with regular expressions? Or should I just write a function "by hand" without regexp that goes through every character of the string and parses out the data I need?
Forgive me if this is an elementary question; I'm just learning!
To capture the offsets, you can set the PREG_OFFSET_CAPTURE flag.
http://php.net/manual/en/function.preg-match.php
preg_match ($regex, $subject, $matches, PREG_OFFSET_CAPTURE);
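As a rough illustration of what the flag changes: with PREG_OFFSET_CAPTURE, each entry in $matches becomes a two-element array holding the matched text and its byte offset in the subject. The pattern and variable names below are only assumptions for the sake of the sketch, and preg_match reports the first match only.

```php
<?php
$subject = 'asdfasdf 1243#$*#{A_123}asdf?{432U}';
// Illustrative pattern: a literal "{", one or more word characters
// (captured as group 1), then a literal "}".
$regex = '~\{(\w+)\}~';

if (preg_match($regex, $subject, $matches, PREG_OFFSET_CAPTURE)) {
    list($whole, $wholeOffset) = $matches[0]; // the full match
    list($name, $nameOffset)   = $matches[1]; // the first capturing group
    echo "$whole starts at byte $wholeOffset; $name starts at byte $nameOffset\n";
}
// prints: {A_123} starts at byte 17; A_123 starts at byte 18
```

The offsets are zero-based and counted in bytes, which matters if the subject contains multibyte characters.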
You can run the following script yourself and see the results:
$regex = '~\{(\w+)\}~'; // one tag per match; group 1 is the tag name
$str = 'asdfasdf 1243#$*#{A_123}asdf?{432U}';
preg_match_all($regex, $str, $m, PREG_OFFSET_CAPTURE);
$tags = $m[1]; // each entry is array(tag name, byte offset)
echo '<pre>';
print_r($tags); // prints tags and their offsets
echo '</pre>';
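If you want something closer to the "tag: position" list from the question, the result of preg_match_all can be reshaped in a short loop. This is only a sketch: it assumes a pattern that matches one tag at a time, and the $positions name and its 'tag'/'offset' keys are made up for illustration. A list of pairs is used rather than a name => offset map so that a tag appearing twice is not overwritten.

```php
<?php
$str = 'asdfasdf 1243#$*#{A_123}asdf?{432U}';
// Assumed pattern: one "{name}" tag per match, with the name in group 1.
preg_match_all('~\{(\w+)\}~', $str, $m, PREG_OFFSET_CAPTURE);

$positions = array();
foreach ($m[1] as $pair) {
    // Each $pair is array(matched text, byte offset of that text).
    $positions[] = array('tag' => $pair[0], 'offset' => $pair[1]);
}

foreach ($positions as $p) {
    echo $p['tag'], ': ', $p['offset'], "\n";
}
// prints:
// A_123: 18
// 432U: 30
```

These are zero-based offsets of the tag name itself; the opening brace sits one byte earlier.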
On the pattern:
- \w is an escape sequence that, by default, is equivalent to the character class [a-zA-Z0-9_]
- The round brackets (...) are used for grouping, and they also capture the matched text so that it shows up in the result array and can be used as a backreference.
- The + is a quantifier that means "one or more" of the preceding pattern.
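The question also asks about validating the string while collecting the tags. One possible way to do both, sketched under a few assumptions, is to check the whole string against an anchored version of the question's expression first and extract the tags only if that succeeds. The function name extract_tags is hypothetical, and the anchors \A and \z are my addition so that the pattern has to cover the entire string.

```php
<?php
// Hypothetical helper: returns an array of array(tag, byte offset) pairs,
// or false if $subject is not in the language described in the question.
function extract_tags($subject)
{
    // Anchored version of the question's pattern, without the spaces.
    $valid = '~\A(?:[^{}]|\{[0-9a-zA-Z_]+\})*\z~';
    if (preg_match($valid, $subject) !== 1) {
        return false;
    }
    // The string is well formed, so every "{...}" in it is a tag.
    preg_match_all('~\{([0-9a-zA-Z_]+)\}~', $subject, $m, PREG_OFFSET_CAPTURE);
    return $m[1];
}

var_dump(extract_tags('asdfasdf 1243#$*#{A_123}asdf?{432U}') !== false); // bool(true)
var_dump(extract_tags('broken {tag'));  // bool(false): unclosed brace
var_dump(extract_tags('empty {} tag')); // bool(false): empty tag
```

A valid string with no tags yields an empty array, so callers should compare the result with === false rather than relying on truthiness.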
A good resource on regex: http://www.regular-expressions.info