I'm using the following pattern to match HTML tags:
~<([[:alpha:]]+) ([[:alpha:]]+=".*?")*>.*?</\1>~si
It works for some tags, but it doesn't find every tag in the string. For example:
$text = <<<text
<p class="matches">some text, this will match</p>
<p>this won't match</p>
<p>this won't match either</p>
<p class="matches">this will match</p>
<p class="matches">this will match too</p>
<div>This won't match either but I want it to..</div>
text;
$pattern = '~<([[:alpha:]]+) ([[:alpha:]]+=".*?")*>.*?</\1>~si';
preg_match_all($pattern,$text,$matches);
var_dump($matches);
The code posted will fill $matches as I want it to, but $matches[0][*] will only contain the 3 paragraphs that have the class="matches" attribute (I tested this pattern on tags without attributes and it does match those properly too). Regexp is not my forte... What am I doing wrong?
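Add \s? between your element and attribute match. Your pattern requires a literal space after the tag name, so a tag with no attributes, such as <p> or <div>, can never match: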
~<([[:alpha:]]+)\s?([[:alpha:]]+=".*?")*>.*?</\1>~si
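To see the difference, here is a minimal sketch that runs both patterns over a shortened version of the sample text from the question. The variable names ($original, $fixed, $originalCount, $fixedCount) are made up for this illustration.

```php
<?php
// Shortened sample input from the question (nowdoc, so nothing is interpolated).
$text = <<<'TEXT'
<p class="matches">some text, this will match</p>
<p>this won't match</p>
<div>This won't match either but I want it to..</div>
TEXT;

// Original pattern: a literal space is required after the tag name.
$original = '~<([[:alpha:]]+) ([[:alpha:]]+=".*?")*>.*?</\1>~si';
// Suggested pattern: \s? makes that whitespace optional.
$fixed = '~<([[:alpha:]]+)\s?([[:alpha:]]+=".*?")*>.*?</\1>~si';

// preg_match_all returns the number of full pattern matches.
$originalCount = preg_match_all($original, $text, $m1);
$fixedCount = preg_match_all($fixed, $text, $m2);

echo "original: $originalCount, fixed: $fixedCount\n";
```

On this sample the original pattern should find only the paragraph that has an attribute, while the adjusted pattern should find all three elements.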
Also, you shouldn't be using regex for HTML; a DOM parser copes with nesting and attribute variations far more reliably.
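As a rough sketch of the parser route, assuming PHP's bundled DOM extension is enabled, something like the following collects the same elements without any regex. The wrapper element and its "root" id are made up for this example so the fragment has a single parent to query against.

```php
<?php
// Sample markup modeled on the question's input.
$html = <<<'HTML'
<p class="matches">some text</p>
<p>no attributes</p>
<div>a div</div>
HTML;

// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Wrap the fragment in a container so it has one known parent;
// the "root" id is a hypothetical name used only in this sketch.
$doc->loadHTML('<div id="root">' . $html . '</div>');

$xpath = new DOMXPath($doc);
$tags = [];
// Select every element that is a direct child of the wrapper.
foreach ($xpath->query('//div[@id="root"]/*') as $node) {
    $tags[] = $node->nodeName;
}

print_r($tags);
```

Each $node is a DOMElement, so its attributes and text are available through getAttribute() and textContent rather than through capture groups.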