Absolute beginner regex question

Developer https://www.devze.com 2023-01-11 18:39 Source: Internet
I'm using PHP 5's preg functions, if it makes a difference.

Consider the regular language matched by the following regular expression.

([^{}] | {[0-9a-zA-Z_]+})*

The language consists of strings of any number of characters, with special embedded tags marked off by left and right curly brackets, which contain a string of one or more alphanumeric or underscore characters. For example, the following is a valid string in the language:

asdfasdf 1243#$*#{A_123}asdf?{432U}

However, while validating a string with this regex, I would like to get a list of these curly-bracket-delimited tags and their positions in the string. Considering the previous example string, I'd like to have an array that tells me:

A_123: 20; 432U: 32

Is this possible with regular expressions? Or should I just write a function "by hand" without regexp that goes through every character of the string and parses out the data I need?

Forgive me if this is an elementary question; I'm just learning!


To capture the offsets, you can set the PREG_OFFSET_CAPTURE flag. http://php.net/manual/en/function.preg-match.php

preg_match ($regex, $subject, $matches, PREG_OFFSET_CAPTURE);
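To make the effect of the flag concrete, here is a minimal sketch of what $matches looks like. The input string and the variable names are made up for illustration; only preg_match and PREG_OFFSET_CAPTURE come from the manual page linked above.

```php
<?php
// Hypothetical input, used only to show the shape of $matches.
$subject = 'ab{X_1}cd';

// With PREG_OFFSET_CAPTURE, every entry in $matches becomes a
// two-element array: the matched text and its byte offset in $subject.
$found = preg_match('~\{(\w+)\}~', $subject, $matches, PREG_OFFSET_CAPTURE);

if ($found === 1) {
    list($whole, $wholeOffset) = $matches[0]; // entire match, braces included
    list($name, $nameOffset)   = $matches[1]; // first capturing group
    echo "$whole at $wholeOffset, $name at $nameOffset\n";
}
```

Offsets are zero-based and counted in bytes, so they can differ from character positions if the string contains multibyte characters.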

You can run the following script yourself and see the results:

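// Match one {tag} at a time; group 1 captures the name without the braces.
$regex = '~\{(\w+)\}~';
$str = 'asdfasdf 1243#$*#{A_123}asdf?{432U}';

preg_match_all($regex, $str, $m, PREG_OFFSET_CAPTURE);
$tags = $m[1]; // each entry is array(tag name, byte offset)

echo '<pre>';
print_r($tags); // prints tags and their offsets
echo '</pre>';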
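The question asks for a list of the form "tag: position". One way to get there, as a sketch rather than the only approach, is to fold the captured pairs into an associative array. The $positions name is an assumption for this example, and the pattern captures the tag name without its braces.

```php
<?php
$str = 'asdfasdf 1243#$*#{A_123}asdf?{432U}';

preg_match_all('~\{(\w+)\}~', $str, $m, PREG_OFFSET_CAPTURE);

// $m[1] holds one array(name, offset) pair per tag, in order of appearance.
$positions = array();
foreach ($m[1] as $pair) {
    list($name, $offset) = $pair;
    $positions[$name] = $offset; // a repeated name keeps only its last offset
}

print_r($positions); // A_123 => 18, 432U => 30
```

The offsets are zero-based and point at the first character inside the braces. If the same tag can appear more than once, keeping the list of pairs from $m[1] is probably safer than a map keyed by name.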

On the pattern:

  • \w is an escape sequence that, for ASCII text, is equivalent to the following character class: [a-zA-Z0-9_]
  • The round brackets (...) are used for grouping; they also capture the text they match, which is how the tags end up in $m.
  • The + is a quantifier that means "one or more" of the previous pattern.
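The extraction pattern on its own does not check that the whole string belongs to the language from the question. A possible way to do both, assuming a hypothetical helper named extract_tags, is to validate with an anchored version of the question's expression first and extract only if that succeeds.

```php
<?php
// Hypothetical helper: returns an array of array(name, offset) pairs,
// or false when $str is not in the language described in the question.
function extract_tags($str)
{
    // Validation: the whole string must consist of non-brace characters
    // and well-formed {tag} groups. \A and \z anchor to the string ends.
    if (preg_match('~\A(?:[^{}]|\{\w+\})*\z~', $str) !== 1) {
        return false;
    }

    // Extraction: collect every tag name together with its byte offset.
    preg_match_all('~\{(\w+)\}~', $str, $m, PREG_OFFSET_CAPTURE);
    return $m[1];
}

var_dump(extract_tags('a{b}c') !== false); // bool(true)
var_dump(extract_tags('a{b c}'));          // bool(false): space inside a tag
var_dump(extract_tags('a{b'));             // bool(false): unclosed brace
```

A valid string with no tags returns an empty array, so callers should compare the result against false with === rather than rely on truthiness.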

A good resource on regex: http://www.regular-expressions.info
