I'm a total regexp noob. I'm working with WordPress and I'm desperately trying to deal with WordPress's wpautop, which I hate and love (more hate!). Anyway, I'm trying to remove <p> tags around certain commands.
Here's what I get:
<p>
[hide]
<img.../>
[/hide]
</p>
or
<p>
[imagelist]
<img .../>
<img .../>
[/imagelist]
</p>
Here's what I'd like:
[hide]
<img.../>
[/hide]
or
[imagelist]
<img .../>
<img .../>
[/imagelist]
I've tried:
preg_replace('/<p[^>]*>(\[[^>]*\])<\/p[^>]*>/', '$1', $content); // No luck!
EDIT: When I run the regexp, the content is still just a variable containing text. It has not been parsed as HTML yet. I know it is possible because I already did it to get rid of the p tags around an image tag. So I just need a regexp to handle text that will be parsed as HTML at some point in the future. Here's a similar question
Thanks! Matt Mueller
You can't use regular expressions to parse HTML, because HTML is, by definition, a non-regular language. Period, end of discussion.
The language of matching HTML tags is context-free, not regular. This means regular expressions are probably not the right tool to use here. Context-free languages require parsers rather than regular expressions. So, you can either remove ALL <p> and </p> tags with a regular expression, or you can use an HTML parser to remove matching tags from certain parts of your document.
Try this regex:
'%<p[^>]*>\s*(\[([^\[\]]+)\].*?\[/\2\])\s*</p>%s'
Explanation:
\[([^\[\]]+)\] matches the opening bbcode tag and captures the tag name in group #2.
\[/\2\] matches a corresponding closing tag.
.*? matches anything, reluctantly. Thanks to the s flag at the end, it also matches newlines. The effect of the reluctant .*? is that it stops matching the first time it finds a closing bbcode tag with the right name. If tags are nested (within tags with the same name) or improperly balanced, it won't work correctly. I wouldn't expect that to be a problem, but I have no experience with WordPress, so YMMV.
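For what it's worth, here is a minimal sketch of how that pattern might be plugged into preg_replace. The function name unwrap_shortcode_paragraphs and the sample $content are made up for illustration; only the pattern itself comes from above, and I haven't tried this inside an actual WordPress filter.

```php
<?php
// Hypothetical helper (the name is an assumption, not a WordPress API).
// Strips a <p>...</p> wrapper when it contains exactly one [tag]...[/tag] block.
function unwrap_shortcode_paragraphs(string $content): string
{
    // Same pattern as above: group 1 is the whole [tag]...[/tag] block,
    // group 2 is the tag name; the trailing "s" flag lets . match newlines.
    $pattern = '%<p[^>]*>\s*(\[([^\[\]]+)\].*?\[/\2\])\s*</p>%s';

    // preg_replace returns null on error; fall back to the untouched input.
    return preg_replace($pattern, '$1', $content) ?? $content;
}

// Sample input, made up to mirror the [hide] case from the question.
$content = "<p>\n[hide]\n<img src=\"a.png\" />\n[/hide]\n</p>";
echo unwrap_shortcode_paragraphs($content);
```

One thing to watch: group #2 captures everything between the brackets, so a tag written with attributes, e.g. [hide foo="bar"], would not pair up with [/hide] under this pattern and its <p> wrapper would be left alone.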