PHP Regex ignoring nested tags

Developer https://www.devze.com 2023-03-07 00:02 Source: Internet


Related topics: php regex

Hi, I'm working on a bug in a CMS and I was hoping someone could give me some help with this messy regex! I need to remove everything inside the {{page? }} tags (where 'page' is a dynamic word), including any nested {{tags}} within them, except for {{links? }} tags.

In the code below, the regex should remove everything inside the {{homepage? }} tag:

<div id="main">   
    <div id="left">
    {{menu1}}<br />

{{homepage?
    <img src="images/{{timenow}}.gif" width="177" height="217" alt="{{imgname}}" id="biglogo" />
}}

{{links?
    <b>LINKS</b>
}}
</div>
{{menu2}}
</div>

Here's what I have so far. It gets stuck as soon as it reaches the }} after timenow:

$result=preg_replace("#\{\{(?!links)\S*?\?.*?}}#s","",$result);
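To see where the original attempt goes wrong: the lazy `.*?}}` ends the match at the first `}}` after `{{homepage?`, and that `}}` is the one closing `{{timenow}}`. A minimal sketch that runs the question's pattern over a trimmed copy of the sample markup (the `$html` variable is hypothetical, introduced only for this demo):

```php
<?php
// Trimmed copy of the sample markup from the question (illustrative only).
$html = <<<'HTML'
{{menu1}}<br />
{{homepage?
    <img src="images/{{timenow}}.gif" alt="{{imgname}}" />
}}
{{links?
    <b>LINKS</b>
}}
HTML;

// The original attempt: the lazy .*? stops at the FIRST "}}" it meets,
// which belongs to {{timenow}}, not to the enclosing {{homepage? ... }}.
$result = preg_replace("#\{\{(?!links)\S*?\?.*?}}#s", "", $html);

// Leaves the tail of the tag behind: .gif" alt="{{imgname}}" /> and a stray }}
echo $result, "\n";
```

The leftover tail is exactly the symptom described above: everything after `timenow}}` up to the real closing `}}` survives.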

Clarification:

There are no nested {{page? }} tags; all sub-tags use the simple {{thisformat}} form. In other words, something like {{foo? {{links? bar }} baz }} never occurs.

You can do something like this:

#\{\{ (?!links\b) \w+ \? (?: \{\{\w+}} | [^{}]+ | \{(?!\{) | }(?!}) )* }}#sx
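Applied with `preg_replace()`, that one-liner could be used as below. This is a sketch assuming the markup sits in a hypothetical `$html` variable; the `x` flag lets the pattern keep its spaces, and `s` has no effect here because the pattern contains no `.` (the `[^{}]+` class already matches newlines).

```php
<?php
// Sample markup from the question (trimmed; illustrative only).
$html = <<<'HTML'
{{menu1}}<br />
{{homepage?
    <img src="images/{{timenow}}.gif" alt="{{imgname}}" />
}}
{{links?
    <b>LINKS</b>
}}
{{menu2}}
HTML;

// Alternatives allowed inside the page-tag body:
//   \{\{\w+}}   a simple nested tag such as {{timenow}}
//   [^{}]+      a run of text containing no braces
//   \{(?!\{)    a lone "{" that does not start a nested tag
//   }(?!})      a lone "}" that does not end the page tag
$pattern = '#\{\{ (?!links\b) \w+ \? (?: \{\{\w+}} | [^{}]+ | \{(?!\{) | }(?!}) )* }}#sx';

$result = preg_replace($pattern, '', $html);
echo $result, "\n";
```

Since none of the alternatives can consume a `}}` pair, the repetition stops at the first `}}` that is not part of a nested `{{word}}` tag, which is the end of `{{homepage? ... }}`.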

If I understand it correctly, there's no need for recursive matching here; the {{page? }} tags may contain simple tags like {{this}}, and that's it. In that case, you just have to watch out for the beginning of a nested tag, so you can match the end of that tag when it shows up, then go on looking for either the end of the enclosing {{page? }} tag or the beginning of another nested tag.

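// Delimited with "~": with "#" as the delimiter, PHP would end the pattern at the first # comment.
$regex='~
  \{\{ (?!links\?) \w++\?     # page-tag start
  (?:
    (?: (?!\{\{|\}\}) . )++   # normal content
  |
    \{\{                      #
    (?: (?!\}\}) . )*+        # embedded tag
    \}\}                      #
  )*+
  \}\}                        # page-tag end
~sx';

$result = preg_replace($regex, '', $result);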

The "normal content" part matches one or more characters, as long as no {{ or }} sequence begins at the current position. Once we've started to match an embedded tag, we use the same lookahead technique to consume its content up to its closing }}.
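A runnable sketch that applies this pattern to the question's sample and prints what it matches. It uses `~` as the delimiter, since PHP ends a `#`-delimited pattern at the first `#` comment; the `$html` and `$m` names are illustrative only.

```php
<?php
// The sample markup from the question (illustrative variable name).
$html = <<<'HTML'
<div id="main">
    <div id="left">
    {{menu1}}<br />

{{homepage?
    <img src="images/{{timenow}}.gif" width="177" height="217" alt="{{imgname}}" id="biglogo" />
}}

{{links?
    <b>LINKS</b>
}}
</div>
{{menu2}}
</div>
HTML;

// Same pattern, delimited with "~" so the "#" comments are read by PCRE
// (x flag) instead of ending the pattern early.
$regex = '~
  \{\{ (?!links\?) \w++\?     # page-tag start
  (?:
    (?: (?!\{\{|\}\}) . )++   # normal content
  |
    \{\{                      # embedded tag start
    (?: (?!\}\}) . )*+        # embedded tag content
    \}\}                      # embedded tag end
  )*+
  \}\}                        # page-tag end
~sx';

// Show exactly what the pattern matches.
preg_match_all($regex, $html, $m);
foreach ($m[0] as $match) {
    echo "MATCH:\n", $match, "\n";
}

// Remove the {{page? ...}} block, keeping {{links? ...}}.
$result = preg_replace($regex, '', $html);
echo "RESULT:\n", $result, "\n";
```

The single match runs from `{{homepage?` to the `}}` on its own line, so the nested `{{timenow}}` and `{{imgname}}` tags are swallowed while `{{links? ... }}`, `{{menu1}}` and `{{menu2}}` are left alone.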


This is not possible with regex. Read about the many failed attempts to parse nested HTML/XML with regex.