$pattern='`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>([^<]*)</a>`isU';
I want to change ([^<]*) so that it matches everything up to </a>, not only up to the next <, because an <img> tag could be inside the <a> tag.
Can anyone help? I'm lousy at regex.
You can use a PHP parser to do this. I wouldn't use Regex at all.
You can try: http://simplehtmldom.sourceforge.net/
PHP also has a DOM parser built in: the DOMDocument class.
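As a rough illustration of the parser route, here is a minimal sketch using PHP's built-in DOMDocument class. The sample markup in $html and the $links array are made up for the example and are not from the question.

```php
<?php
// Hypothetical sample input; not taken from the original question.
$html = '<p><a class="x" href="/one"><img src="a.png" alt="pic"> One</a> '
      . '<a href="/two">Two</a></p>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting them,
// since real-world markup is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // Rebuild the link's inner HTML by serializing each child node,
    // so nested tags such as <img> are kept.
    $inner = '';
    foreach ($a->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $links[] = ['href' => $a->getAttribute('href'), 'inner' => $inner];
}

print_r($links);
```

Because the parser tracks element boundaries itself, an <img> inside the link needs no special handling.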
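Changing ([^<]*) to an ungreedy match-all might do the trick. Since your pattern already uses the U modifier, which inverts greediness, that means writing (.*) rather than (.*?).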
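One thing to watch when trying this: the pattern in the question carries the U modifier, which flips the default greediness of quantifiers. The sketch below compares the variants on a simplified pattern and made-up input (both are assumptions for illustration, not the asker's full pattern).

```php
<?php
// Hypothetical sample input with two links, the first containing an <img>.
$html = '<a href="/one"><img src="a.png"> One</a> <a href="/two">Two</a>';

// With the U modifier, a plain .* is already ungreedy.
preg_match('`<a\s[^>]*>(.*)</a>`isU', $html, $withU);

// Without U, .*? is the ungreedy form.
preg_match('`<a\s[^>]*>(.*?)</a>`is', $html, $withoutU);

// With U, adding ? flips the quantifier back to greedy,
// so the capture runs on to the last </a> in the input.
preg_match('`<a\s[^>]*>(.*?)</a>`isU', $html, $flipped);

var_dump($withU[1], $withoutU[1], $flipped[1]);
```

The first two calls capture only the contents of the first link, while the third swallows both links.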
([^<]*) could be changed to ((?:[^<]|<(?!/a>))*), which uses a negative lookahead to match non-< characters or < characters which are not followed by /a>.
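To try the lookahead version, here is a small sketch that drops the new group into the pattern from the question and runs it with preg_match_all. The sample markup in $html is invented for the example.

```php
<?php
// Hypothetical sample input for illustration.
$html = '<a class="x" href="/one"><img src="a.png" alt="pic"> One</a> '
      . '<a href=\'/two\'>Two</a>';

// The question's pattern with ([^<]*) swapped for the lookahead group.
$pattern = '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>((?:[^<]|<(?!/a>))*)</a>`isU';

// PREG_SET_ORDER groups the results per link: index 1 is the href
// attribute, index 2 the quote character, index 3 the link contents.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[3], "\n";
}
```

The third group now keeps the nested <img> tag instead of stopping at its opening <.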
HOWEVER, as stated many times over already, this is not a good way to parse HTML. Firstly, it's horribly inefficient, and secondly, what happens if you have nested tags, such as <a><a></a></a>? While this may not happen with hyperlinks, it's common among many other HTML elements.