I am trying to come up with a regex to grab all text between two HTML tags. This is what I have so far:
<TAG[^>]*>(.*?)</TAG>
In theory, this should work perfectly. But running it through PHP's preg_replace with the modifiers /ims results in the WHOLE string getting matched.
If I remove the /s modifier, it works perfectly, but the tags have newlines between them. Is there a better way of approaching this?
Of course there's a better way. Don't parse HTML with regex.
DOMDocument should be able to accommodate you better:
$dom = new DOMDocument();
$dom->loadHTMLFile('filename.html');
$tags = $dom->getElementsByTagName('tag');
echo $tags->item(0)->textContent; // Text content of the first `tag` element
You may have to tweak the above code (hasn't been tested).
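For reference, here is a slightly fuller sketch of the DOMDocument approach that loads markup from a string instead of a file and collects the text of every matching element. The sample markup and the variable names ($html, $contents) are made up for illustration, and `tag` stands in for whatever element you actually need. Since `<tag>` is not a real HTML element, libxml will report it as invalid, so the sketch collects those warnings internally rather than printing them.

```php
<?php
// Hypothetical sample markup; <tag> is a placeholder element name.
$html = "<div><tag class=\"a\">\nfirst\n</tag><p>skip</p><tag>second</tag></div>";

// Collect libxml parse warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Gather the trimmed text content of every <tag> element, newlines included.
$contents = [];
foreach ($dom->getElementsByTagName('tag') as $node) {
    $contents[] = trim($node->textContent);
}

print_r($contents);
```

Because the parser works on the element tree, newlines inside the element and attributes on the opening tag need no special handling.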
I don't recommend using regex to match against full HTML, but you can use the "dotall" flag (the s modifier, which makes . match newlines too): /REGEXP/s
Example:
$str = "<tag>
fvox
</tag>";
preg_match_all('/<TAG[^>]*>(.*?)<\/TAG>/is', $str, $r); // the / in the closing tag must be escaped when / is the delimiter
print_r($r); //dump
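A minimal sketch of the same idea with `#` as the pattern delimiter, so the slash in the closing tag does not need escaping. The input string and the names $matches and $inner are only illustrative. It assumes the tags are not nested, which a lazy `.*?` cannot handle correctly.

```php
<?php
// Hypothetical input: two <tag> elements, the first spanning several lines.
$str = "<tag>\nfvox\n</tag><tag id=\"x\">second</tag>";

// i: case-insensitive, s: dot also matches newlines.
// .*? is lazy, so each opening tag pairs with the nearest closing tag.
preg_match_all('#<tag[^>]*>(.*?)</tag>#is', $str, $matches);

// $matches[1] holds the first capture group of every match.
$inner = array_map('trim', $matches[1]);
print_r($inner);
```

Without the s modifier the first element would not match at all, because . stops at the newline.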