How to write the correct regular expression

Source: https://www.devze.com, 2022-12-21 09:50 (reposted from the web)
I want to get ${1} = Title, ${2} = Open, ${3} = Bla-bla-bla.

from

{{Title|Open
Bla-bla-bla 
}}


What about something like this:

$str = <<<STR
{{Title|Open
Bla-bla-bla 
}}
STR;

$matches = array();
// The "s" modifier at the end lets "." match line breaks, so (.*) can span several lines.
if (preg_match("/^\{\{([^\|]+)\|([^\n]+)(.*)\}\}$/s", $str, $matches)) {
    var_dump($matches);
}

This gives you:

array
  0 => string '{{Title|Open
Bla-bla-bla 
}}' (length=28)
  1 => string 'Title' (length=5)
  2 => string 'Open' (length=4)
  3 => string '
Bla-bla-bla 
' (length=14)

Which means that, after calling trim() on $matches[1], $matches[2], and $matches[3] to strip the surrounding whitespace and line breaks, you'll get what you asked for :-)
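
As a minimal sketch of that trimming step (the helper name extract_fields is hypothetical and not part of the original answer), the three captured groups could be cleaned up like this:

```php
<?php
// Hypothetical helper: runs the pattern from the answer above and trims each captured group.
function extract_fields(string $str): ?array
{
    if (!preg_match('/^\{\{([^\|]+)\|([^\n]+)(.*)\}\}$/s', $str, $matches)) {
        return null; // input does not look like {{Title|Open ... }}
    }
    // Drop the full match at index 0, then trim whitespace and line breaks from each group.
    return array_map('trim', array_slice($matches, 1));
}

$str = "{{Title|Open\nBla-bla-bla \n}}";
$fields = extract_fields($str);
print_r($fields);
```

For the sample input, $fields should hold 'Title', 'Open' and 'Bla-bla-bla' at indexes 0 to 2; the helper returns null when the input does not match.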


Explaining the regex:

  • matching from the beginning of the string: ^
  • two { characters, escaped as \{\{ because { is a regex metacharacter (it opens a quantifier such as {2,3})
  • anything that's not a |, at least one time: [^\|]+
    • between () so it's captured -- returned as the first part of the result
    • inside a character class the | does not strictly need escaping, but the backslash is harmless
  • a | character -- this one has to be escaped, since outside a character class | means alternation
  • anything that's not a line break, at least one time: [^\n]+
    • between () so it's captured too -- second part of the result
  • .* which matches virtually "anything", any number of times
    • between () so it's captured too -- third part of the result
  • and, finally, two } (escaped, too)
  • and the end of the string: $

Note that the regex uses the s (dotall) modifier, which makes . match line breaks as well; see Pattern Modifiers in the PHP manual about that.
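
To see why the s modifier matters here, a small sketch (the variable names are only for illustration) that runs the same pattern with and without it:

```php
<?php
$str = "{{Title|Open\nBla-bla-bla \n}}";
$pattern = '\{\{([^\|]+)\|([^\n]+)(.*)\}\}';

// With "s", "." also matches line breaks, so (.*) can span the remaining lines.
$withDotall = preg_match('/^' . $pattern . '$/s', $str);

// Without "s", "." stops at the first line break and the pattern cannot reach "}}".
$withoutDotall = preg_match('/^' . $pattern . '$/', $str);

var_dump($withDotall, $withoutDotall);
```

preg_match returns 1 on a match and 0 otherwise, so the first call should report a match and the second should not.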


$string = "{{Title|Open
Bla-bla-bla 
}}";

preg_match('/^\{\{([^|]+)\|(.*?)[\r\n]+(.*?)\s*\}\}/', $string, $matches);
print_r($matches);
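
A quick sketch of what this second pattern captures for the sample input (the variable names $title, $state and $body are assumptions for illustration). Because the lazy groups leave the line break and trailing whitespace to [\r\n]+ and \s*, no trim() call should be needed; on the other hand, without the s modifier the third field has to fit on a single line.

```php
<?php
$string = "{{Title|Open\nBla-bla-bla \n}}";

preg_match('/^\{\{([^|]+)\|(.*?)[\r\n]+(.*?)\s*\}\}/', $string, $matches);

// Skip index 0 (the full match) and unpack the three captured groups.
list(, $title, $state, $body) = $matches;
var_dump($title, $state, $body);
```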


http://www.gskinner.com/RegExr/

A useful place to experiment with and learn regexes.


In Perl:

/\{\{         # literal opening braces
 (.*?)        # any characters except newline (lazy, i.e. as few as possible)
 \|           # literal pipe
 (.*?)        # same as 2 lines above
 \n           # new line
 ([\s\S]*?)   # any character, including new line (lazy)
 \}\}/x;      # literal closing braces

A more precise solution depends on the exact rules you want for extracting your fields.
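
For readers following the PHP answers, the same commented pattern can probably be carried over as-is, since PCRE also supports the x (extended) modifier. This is only a sketch under that assumption; note that the comments inside the pattern must not contain the / delimiter.

```php
<?php
$pattern = '/\{\{         # literal opening braces
             (.*?)        # first field: any characters except newline (lazy)
             \|           # literal pipe
             (.*?)        # second field, up to the end of the line
             \n           # newline
             ([\s\S]*?)   # third field: any characters, including newlines (lazy)
             \}\}         # literal closing braces
            /x';

$str = "{{Title|Open\nBla-bla-bla \n}}";
preg_match($pattern, $str, $m);

// The third group still carries the trailing space and line break, so trim it.
$third = trim($m[3]);
var_dump($m[1], $m[2], $third);
```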
