Regexp matching mismatched html

Developer https://www.devze.com 2023-02-27 07:05 Source: Internet

How do I parse a certain link style out of HTML without the match spreading across multiple links?

Related topics: php, regex


The exact link I am trying to match is:

href="http://www.hotmail.com' rel='external nofollow"

Note the mismatched quotes above: the href value opens with " but closes with ', and the rel value opens with ' but closes with ".

What I have tried:

// $html holds the page source being checked
if (preg_match('|href="http(.*?)\' rel=\'(.*?)"|i', $html)) {
  echo "Found bad html\n";
}

However, that regexp also matches perfectly good HTML, because a single match can span several links. I need it to match only within a single link.
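To make the problem concrete, here is a minimal sketch that runs the pattern from the question against a sample of three correctly quoted links. The `$html` value and the example.* URLs are hypothetical, made up for illustration.

```php
<?php
// Hypothetical sample: three correctly quoted links, none of them "bad".
$html = '<a href="http://a.example/">A</a> '
      . "<a href='http://b.example/' rel='nofollow'>B</a> "
      . '<a href="http://c.example/">C</a>';

// The pattern from the question: .*? is lazy, but it still keeps expanding
// across tag boundaries until the rest of the pattern can match.
$pattern = '|href="http(.*?)\' rel=\'(.*?)"|i';

if (preg_match($pattern, $html, $m)) {
    echo "Found bad html\n";
    // $m[0] starts in the first link, runs through the second one,
    // and ends at the opening quote of the third link's href.
    echo $m[0], "\n";
}
```

The lazy quantifier only makes each group as short as possible for a given starting point; it does not stop the engine from reaching the next ' rel=' or " in a later tag.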

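You might be able to adapt your regex by replacing the generic .*? with a negated character class such as [^<"'>]+. Since that class cannot match a quote or an angle bracket, the match can no longer run past the end of the current attribute value, which usually keeps it from eating up too much.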

if (preg_match('| href="(http[^<"\'>]+)\' rel=\'([^<"\'>]+)"|i', $html)) {
    echo "Found bad html\n";
}
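A quick way to check the negated-class version is to wrap it in a function and try it on one well-formed sample and on the malformed link from the question. The function name `hasBadLink` and the sample strings are hypothetical; the pattern is the one from this answer.

```php
<?php
// Hypothetical helper: true when $html contains a link whose href opens
// with " but closes with ' (the malformed style from the question).
function hasBadLink(string $html): bool
{
    // [^<"'>]+ cannot cross a quote or an angle bracket, so each group
    // stays inside a single attribute value of a single tag.
    $pattern = '| href="(http[^<"\'>]+)\' rel=\'([^<"\'>]+)"|i';
    return preg_match($pattern, $html) === 1;
}

$good = '<a href="http://a.example/">A</a> '
      . "<a href='http://b.example/' rel='nofollow'>B</a> "
      . '<a href="http://c.example/">C</a>';
$bad  = '<a href="http://www.hotmail.com\' rel=\'external nofollow">Hotmail</a>';

var_dump(hasBadLink($good)); // bool(false)
var_dump(hasBadLink($bad));  // bool(true)
```

Note that the pattern requires a space before href, which holds for typical `<a href=...>` markup.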

Better yet: don't hard-code the " and ', but use a character class to match them too:

if (preg_match('| href=["\']http([^<"\'>]+)["\']'
              . ' rel=["\']([^<"\'>]*)["\']|i', $html)) {
    echo "Found matching link\n";
}

(Oh, now it looks really ugly.)
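One caveat with the character-class version: because each quote is matched independently by ["'], it also accepts correctly quoted links such as href="..." rel="...", so on its own it no longer singles out the mismatched ones. If the goal is to flag only mismatches, a different technique (not part of the answer above) is to capture the opening quote and use a backreference inside a negative lookahead so the closing quote must differ. A minimal sketch, where `findMismatchedHref` and the sample strings are hypothetical:

```php
<?php
// Hypothetical helper: returns the URL of the first href whose value opens
// with one quote character and closes with the other, or null if none.
function findMismatchedHref(string $html): ?string
{
    // (["\']) captures the opening quote as group 1.
    // (?!\1) is a negative lookahead: the closing quote must NOT be the same
    // character as the opening one, so correctly quoted links are rejected.
    $pattern = '| href=(["\'])(http[^<"\'>]*)(?!\1)["\']|i';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[2];
    }
    return null;
}

$good = '<a href="http://a.example/" rel="nofollow">A</a> '
      . "<a href='http://b.example/' rel='nofollow'>B</a>";
$bad  = '<a href="http://www.hotmail.com\' rel=\'external nofollow">Hotmail</a>';

var_dump(findMismatchedHref($good)); // NULL
var_dump(findMismatchedHref($bad));  // string(22) "http://www.hotmail.com"
```

This checks only the href attribute; the same capture-plus-lookahead idea could be applied to rel if that attribute matters too.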
