Regular expression syntax problem

开发者 (Developer) https://www.devze.com 2023-03-12 20:41 Source: the web
$pattern='`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>([^<]*)</a>`isU';

I want to change the ([^<]*) part so that it matches everything up to </a> instead of stopping at the first <, because an <img> tag could be inside the <a> tag.

Can anyone help? I'm lousy at regex.


You can use an HTML parser in PHP to do this; I wouldn't use a regex at all.

You can try: http://simplehtmldom.sourceforge.net/

PHP also has a DOM parser built in (the DOMDocument class).
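A minimal sketch of the built-in DOM approach, assuming PHP's bundled DOM extension is enabled. The function name extractLinks and the sample HTML are hypothetical. Each anchor's child nodes are serialized one by one, so an <img> inside the link is kept intact:

```php
<?php
// Sketch: collect href and inner HTML of every <a href="..."> using DOMDocument.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Silence libxml warnings about imperfect real-world HTML, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // skip anchors without an href, as the regex does
        }
        // Serialize each child node so nested tags such as <img> are preserved.
        $inner = '';
        foreach ($a->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        $links[] = ['href' => $a->getAttribute('href'), 'inner' => $inner];
    }
    return $links;
}

// Hypothetical sample input: the second link wraps an image.
$html = '<p><a href="/home">Home</a> <a class="logo" href="/"><img src="logo.png" alt="Logo"> Site</a></p>';
foreach (extractLinks($html) as $link) {
    echo $link['href'], ' => ', $link['inner'], PHP_EOL;
}
```

Because libxml builds a real tree first, attribute order, quoting style and nested tags do not need to be anticipated in a pattern.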


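Changing ([^<]*) to an ungreedy match-all, (.*?), might do the trick. Note that the pattern uses the U (ungreedy) modifier, which inverts greediness: under U, (.*?) is greedy and the lazy form is written (.*). So either drop U and use (.*?), or keep U and use (.*).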
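A quick check of the ungreedy suggestion. This sketch drops the U modifier so that (.*?) really is lazy, and also writes the href part as .*? for the same reason; the sample HTML is made up. The last lines append U to show how the modifier flips the quantifiers back to greedy:

```php
<?php
// Sketch: lazy (.*?) for the link body, with the U modifier removed.
$pattern = '`<a\s+[^>]*(href=([\'"]).*?\2)[^>]*>(.*?)</a>`is';

// Hypothetical sample input: the second link contains an <img> tag.
$html = '<a href="/a">First</a> and <a href="/b"><img src="b.png"> Second</a>';

preg_match_all($pattern, $html, $matches);
print_r($matches[3]); // group 3 holds each link's inner HTML

// Same pattern with U appended ("isU"): the quantifiers become greedy and
// the whole string is consumed as a single "link".
$greedyCount = preg_match_all($pattern . 'U', $html, $greedy);
echo $greedyCount, PHP_EOL;
```

The s modifier is kept so that . can also match newlines inside a multi-line link.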


([^<]*) could be changed to ((?:[^<]|<(?!/a>))*), which uses a negative lookahead to match either a non-< character or a < that is not followed by /a>.
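A minimal sketch of the lookahead version: it is the question's pattern (U modifier included) with only ([^<]*) swapped for the new group. The sample HTML is hypothetical:

```php
<?php
// Each repetition of group 3 takes one non-< character, or a < that is
// not followed by "/a>", so an <img> inside the link no longer stops it.
$pattern = '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>((?:[^<]|<(?!/a>))*)</a>`isU';

// Hypothetical sample input.
$html = '<a href="/a">First</a> and <a href="/b"><img src="b.png"> Second</a>';

preg_match_all($pattern, $html, $matches);
foreach ($matches[3] as $inner) {
    echo $inner, PHP_EOL;
}
```

Since the group can never consume </a>, the result is the same whether U makes it lazy or not: each match ends at the first </a> after the opening tag.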

However, as has been stated many times already, this is not a good way to parse HTML. First, it's horribly inefficient; second, what happens if you have nested tags, such as <a><a></a></a>? While this may not happen with hyperlinks, nesting is common among many other HTML elements.
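To make the nesting problem concrete, here is the same lookahead idea adapted to <div> elements (a hypothetical adaptation, not part of the question). On nested input it pairs the outer opening tag with the inner closing tag:

```php
<?php
// Hypothetical adaptation of the lookahead trick to <div> elements.
$pattern = '`<div>((?:[^<]|<(?!/div>))*)</div>`is';

// Nested input: a regex like this cannot count opening and closing tags.
$html = '<div>outer <div>inner</div> tail</div>';

if (preg_match($pattern, $html, $m)) {
    // Captures "outer <div>inner": the match ends at the inner </div>,
    // leaving " tail" and the outer div's real closing tag behind.
    echo $m[1], PHP_EOL;
}
```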
