I'm using the PHP pattern modifier "U" to invert the default greedy behavior in preg_match(). However, it doesn't work the way I want. My code:
$str = '<p>
<div><a aaa
<a href="a.mov"></a>
</div>
</p>';
$needle = "a.mov";
$pattern = "/\<a.*".preg_quote($needle, "/").".*\<\/a\>/sU";
preg_match($pattern, $str, $matches);
print_r($matches);
I'm trying to match:
<a href="a.mov"></a>
But this code returns:
<a aaa
<a href="a.mov"></a>
Can someone shed some light on what I did wrong?
Well, in a more general sense, you went wrong by trying to parse HTML with regexps. As for the snippet you provided, the problem is that the ungreedy modifier tells *, + and {n,} to stop as soon as they are satisfied instead of going all the way.
So it essentially affects where the match ends, not where it begins. "Ungreedy" does not mean "give me the shortest match possible".
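To see that the modifier changes where a match ends but not where it starts, here is a small sketch. The sample string and variable names are made up for illustration and are not from the question.

```php
<?php
// Illustrative input: two "<a" markers before the first X.
$text = '<a one <a two X end X';

// Greedy: .* runs as far as it can, so the match ends at the last X.
preg_match('/<a.*X/', $text, $greedy);

// Ungreedy (U): .* stops at the first X it can reach,
// but the match still starts at the first "<a" in the string.
preg_match('/<a.*X/U', $text, $ungreedy);

print_r($greedy);   // [0] => <a one <a two X end X
print_r($ungreedy); // [0] => <a one <a two X
```

In both cases the engine commits to the leftmost position where a match is possible; the U modifier only shortens the tail.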
You can more or less fix this particular example by using the mU modifiers instead of sU, so that . doesn't match newlines. (It is dropping s that makes the difference; m only changes how ^ and $ behave.)
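As a rough check of that suggestion, the sketch below runs the question's string through the pattern with and without the s modifier. The variable names $quoted, $withS and $withoutS are introduced here for illustration only.

```php
<?php
$str = '<p>
<div><a aaa
<a href="a.mov"></a>
</div>
</p>';
$needle = 'a.mov';
$quoted = preg_quote($needle, '/');

// With s, the dot matches newlines, so the match can start at "<a aaa"
// and run across the line break to reach the needle.
preg_match('/<a.*' . $quoted . '.*<\/a>/sU', $str, $withS);

// Without s, the dot stops at the end of the line, so "<a aaa" cannot
// reach the needle and the engine moves on to the next "<a".
preg_match('/<a.*' . $quoted . '.*<\/a>/U', $str, $withoutS);

print_r($withS);
print_r($withoutS);
```

This only works because the unwanted "<a" happens to sit on a different line from the needle; it is a fix for this input, not a general one.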
My array is turning up empty as well. You have to be careful about line breaks when you use regex with HTML. There may be an issue with single-line mode (the s modifier).
See: http://www.regular-expressions.info/dot.html
I've successfully parsed HTML with regex but I wouldn't do it going forward. Look into
http://simplehtmldom.sourceforge.net/
You will never look back.
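For a feel of the parser-based approach, here is a minimal sketch. It uses PHP's built-in DOMDocument rather than the library linked above, so treat it as a stand-in technique. The markup is made up and well-formed, unlike the question's input, and the variable names are illustrative.

```php
<?php
// Illustrative, well-formed markup (not the question's input).
$html = '<div><a href="b.mov">b</a><a href="a.mov">a</a></div>';
$needle = 'a.mov';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$found = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    // Compare the href attribute directly instead of pattern matching.
    if ($link->getAttribute('href') === $needle) {
        $found[] = $doc->saveHTML($link);
    }
}

print_r($found);
```

Because the comparison is made on the parsed attribute value, line breaks and stray text between tags no longer affect which element is selected.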