Regexp to remove an entire paragraph based on its content?

Developer (https://www.devze.com), 2023-02-11 21:43, source: Internet
Hey guys, I'm a regexp noob. Is it possible with preg_replace to remove an entire paragraph tag?

<p><div class="vidwrapper"> lots of content with other divs etc. </div></p>

The paragraph should only be removed if the div that follows its opening tag has the class vidwrapper.

Is that even possible? Any idea what this regexp would look like? Thank you for your help.


If it's a fixed occurrence, then the following might work:

$html = preg_replace('#<p>[^<]*<div[^>]+class="vidwrapper"[^>]*>.*?</p>#is', "", $html);
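
A quick way to try this out, as a sketch only: the sample markup and the variable names are made up for illustration, and it assumes the wrapper contains no inner </p>.

```php
<?php
// Sample input (hypothetical): one paragraph wrapping a vidwrapper div, two ordinary ones.
$html = '<p>Intro</p><p><div class="vidwrapper"> lots of content <div>inner</div> </div></p><p>Outro</p>';

// i = case-insensitive, s = let "." match newlines as well.
$clean = preg_replace('#<p>[^<]*<div[^>]+class="vidwrapper"[^>]*>.*?</p>#is', '', $html);

echo $clean; // <p>Intro</p><p>Outro</p>
```

The lazy .*? stops at the first </p> after the div, which is why this only works when nothing inside the wrapper closes a paragraph.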

For matching nested HTML you would normally need a recursive regex, which is why something like phpQuery or QueryPath is often simpler:

$html = pq($html)->find("p div.vidwrapper")->parent()->remove()->html();


It's a bad idea to do this using a regex, unless you know that there will be no paragraph (or anything that might superficially be interpreted as a paragraph) inside of the vidwrapper.

If you don't, writing a regex for something like this will be very hard:

<p><div class="vidwrapper"> Hello there. <p>Wee.</p> Yoink. </div></p>
<p><div class="vidwrapper"> Hello there. <!-- <p>Wee.</p> --> Yoink. </div></p>
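
To make the problem concrete, here is a small check of what a simple lazy pattern (the one suggested in the earlier answer) does with these two inputs. The variable names are illustrative only.

```php
<?php
// Simple lazy pattern: match from <p><div class="vidwrapper"...> up to the first </p>.
$pattern = '#<p>[^<]*<div[^>]+class="vidwrapper"[^>]*>.*?</p>#is';

$tricky = [
    '<p><div class="vidwrapper"> Hello there. <p>Wee.</p> Yoink. </div></p>',
    '<p><div class="vidwrapper"> Hello there. <!-- <p>Wee.</p> --> Yoink. </div></p>',
];

$leftovers = [];
foreach ($tricky as $html) {
    // .*? stops at the first </p>, which here belongs to the inner paragraph.
    $leftovers[] = preg_replace($pattern, '', $html);
}

print_r($leftovers);
// $leftovers[0] is ' Yoink. </div></p>'
// $leftovers[1] is ' --> Yoink. </div></p>'
```

In both cases the match ends too early and broken markup is left behind.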

An easier (and more robust) way would probably be to parse the HTML with an HTML parser, and do a search on the DOM tree instead.
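
A minimal sketch of that approach using PHP's bundled DOM extension (DOMDocument and DOMXPath). The helper name removeVidwrapperParagraphs() and the sample markup are made up for illustration. One caveat: HTML parsers generally treat a <div> start tag as closing an open <p>, so the div may end up next to an empty paragraph rather than inside it. The sketch tries to cope with both shapes.

```php
<?php
// Hypothetical helper, not taken from any of the answers: removes every
// div.vidwrapper together with the paragraph the parser associates with it.
function removeVidwrapperParagraphs(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep warnings about sloppy markup quiet
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Matches "vidwrapper" as a whole class token, even next to other classes.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " vidwrapper ")]';

    foreach (iterator_to_array($xpath->query($query)) as $div) {
        $parent = $div->parentNode;
        if ($parent === null) {
            continue;
        }
        // If the parser closed the <p> early, an empty <p> sits right before the div.
        $prev = $div->previousSibling;
        if ($prev instanceof DOMElement
            && $prev->nodeName === 'p'
            && trim($prev->textContent) === ''
            && $prev->getElementsByTagName('*')->length === 0) {
            $parent->removeChild($prev);
        }
        // Remove the enclosing <p> if the parser kept one, otherwise the div itself.
        $target = $parent->nodeName === 'p' ? $parent : $div;
        if ($target->parentNode !== null) {
            $target->parentNode->removeChild($target);
        }
    }

    // Serialize only the children of <body>, dropping the wrapper added above.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

$result = removeVidwrapperParagraphs(
    '<p>Intro</p><p><div class="vidwrapper"> Hello <p>Wee.</p> Yoink. </div></p><p>Outro</p>'
);
echo $result;
```

The exact serialization can differ between libxml versions (a stray empty <p> may remain), so no expected output is shown here. What should hold is that the wrapper and everything inside it are gone while the other paragraphs survive.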

See also:

  • Robust and Mature HTML Parser for PHP
  • RegEx match open tags except XHTML self-contained tags


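If you think script blocks inside the paragraph will cause problems, you can use this pattern as well. It skips over <script>...</script> blocks before looking for the closing </p>.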

#
 \s*
 <p\s*> \s* <div \s+ class \s* = \s* (["']) vidwrapper \1 \s* >
 (?:
      <script (?:\s+ (?:".*?"|'.*?'|[^>]*?)+)? \s*>
      .*?
      </script\s*>
   |  .
 )*?
 </p\s*>
#xs
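
A usage sketch, with the parentheses balanced so that the script alternative and the single-character alternative sit in one non-capturing group. The nowdoc avoids having to escape the quotes inside the pattern. The sample markup and variable names are assumptions for illustration.

```php
<?php
// Extended (x) mode ignores the whitespace and newlines inside the pattern;
// s lets "." match newlines.
$pattern = <<<'REGEX'
#
 \s*
 <p\s*> \s* <div \s+ class \s* = \s* (["']) vidwrapper \1 \s* >
 (?:
      <script (?:\s+ (?:".*?"|'.*?'|[^>]*?)+)? \s*>
      .*?
      </script\s*>
   |  .
 )*?
 </p\s*>
#xs
REGEX;

// Hypothetical input: the script contains a literal </p> that must not end the match.
$html = '<p>intro</p> <p><div class="vidwrapper"><script>document.write("</p>");</script> video </div></p><p>outro</p>';

$clean = preg_replace($pattern, '', $html);
echo $clean; // <p>intro</p><p>outro</p>
```

Note that this pattern expects class to be the only attribute on the div, and it still ends at the first </p> outside a script block, so nested paragraphs remain a problem.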