Regular expression to remove tags around elements wrapped in [...]'s

Developer https://www.devze.com 2022-12-18 09:33 Source: web

I'm a total regexp noob. I'm working with WordPress and I'm desperately trying to deal with WordPress's wpautop, which I hate and love (more hate!). Anyway, I'm trying to remove <p> tags around certain commands.

Here's what I get:

<p>
[hide]
<img.../>
[/hide]
</p>

<p>
[imagelist]
<img .../>
<img .../>
[/imagelist]
</p>

Here's what I'd like:

[hide]
<img.../>
[/hide]

[imagelist]
<img .../>
<img .../>
[/imagelist]

I've tried:

preg_replace('/<p[^>]*>(\[[^>]*\])<\/p[^>]*>/', '$1', $content); // No luck!

EDIT: When I run the regexp, the content is still just a variable containing text; it has not been parsed as HTML yet. I know it is possible because I already did it to get rid of <p> tags around an image tag. So I just need a regexp that handles text which will be parsed as HTML at some point in the future. Here's a similar question

Thanks! Matt Mueller

You can't use regular expressions to parse HTML, because HTML is, by definition, a non-regular language. Period, end of discussion.

The language of matching HTML tags is context-free, not regular. This means regular expressions are probably not the right tool here: context-free languages require parsers rather than regular expressions. So you can either remove ALL <p> and </p> tags with a regular expression, or use an HTML parser to remove matching tags from only certain parts of your document.
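The first option (a blanket regex that drops every paragraph tag) could look roughly like the sketch below. It is only an illustration under assumptions: strip_all_p_tags is a hypothetical helper name, the sample markup is a placeholder, and the \b after "p" is an addition so that tags such as <pre> or <param> are left alone. Note the trade-off the answer implies: this also strips <p> tags around ordinary text, not just around the bracketed commands.

```php
<?php
// Hypothetical helper: removes every <p ...> and </p> tag from raw post text.
// Assumes $content is the unrendered text, as described in the question's EDIT.
function strip_all_p_tags(string $content): string
{
    // </?p   optional slash, then "p"
    // \b     word boundary, so <pre> and <param> do not match
    // [^>]*  any attributes up to the closing ">"
    // i      case-insensitive, so <P> is removed too
    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace('%</?p\b[^>]*>%i', '', $content) ?? $content;
}

// Placeholder input modelled on the question.
$content = "<p>\n[hide]\n<img src=\"a.png\"/>\n[/hide]\n</p>";
echo strip_all_p_tags($content), "\n";
```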

Try this regex:

'%<p[^>]*>\s*(\[([^\[\]]+)\].*?\[/\2\])\s*</p>%s'
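The answer gives only the pattern, so here is a sketch of plugging it into preg_replace() the same way the question's own attempt does, with '$1' as the replacement. The sample input mirrors the question; the src values and the extra ordinary paragraph are placeholders added to show that paragraphs without a bracketed command are left untouched.

```php
<?php
// The pattern from the answer, unchanged.
$pattern = '%<p[^>]*>\s*(\[([^\[\]]+)\].*?\[/\2\])\s*</p>%s';

// Placeholder input shaped like the question's "Here's what I get".
$content = "<p>\n[hide]\n<img src=\"a.png\"/>\n[/hide]\n</p>\n\n"
         . "<p>\n[imagelist]\n<img src=\"b.png\"/>\n<img src=\"c.png\"/>\n[/imagelist]\n</p>\n\n"
         . "<p>An ordinary paragraph.</p>";

// '$1' keeps group 1 (the [tag]...[/tag] block) and drops the surrounding
// <p>, </p> and the whitespace matched by the two \s* parts.
$cleaned = preg_replace($pattern, '$1', $content);
echo $cleaned, "\n";
```

The last paragraph survives because, right after "<p>", the pattern requires optional whitespace followed by "[", and plain text does not start with a bracket.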

Explanation:

\[([^\[\]]+)\] matches the opening bbcode tag and captures the tag name in group #2.

\[/\2\] matches the corresponding closing tag.

.*? matches anything, reluctantly. Thanks to the s flag at the end, it also matches newlines. The effect of the reluctant .*? is that it stops matching the first time it finds a closing bbcode tag with the right name. If tags are nested (within tags of the same name) or improperly balanced, it won't work correctly. I wouldn't expect that to be a problem, but I have no experience with WordPress, so YMMV.
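To see the points in the explanation concretely, the sketch below runs preg_match() with the same pattern and inspects the capture groups. The test strings are placeholders, and the second pattern (the same regex without the s flag) exists only for comparison.

```php
<?php
$withS    = '%<p[^>]*>\s*(\[([^\[\]]+)\].*?\[/\2\])\s*</p>%s';
$withoutS = '%<p[^>]*>\s*(\[([^\[\]]+)\].*?\[/\2\])\s*</p>%';   // comparison only

$input = "<p>\n[hide]\n<img src=\"a.png\"/>\n[/hide]\n</p>";

// Group 2 is the tag name that \2 refers to; group 1 is the whole block.
preg_match($withS, $input, $m);
echo $m[2], "\n";   // hide
echo $m[1], "\n";   // the [hide] ... [/hide] block without the <p> wrapper

// A closing tag with a different name cannot satisfy \[/\2\], so nothing matches.
var_dump(preg_match($withS, "<p>[hide]x[/show]</p>"));   // int(0)

// Without the s flag, "." does not match "\n", so .*? cannot span the block.
var_dump(preg_match($withoutS, $input));                 // int(0)
```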