How to remove the style attribute from an HTML string with regular expressions?
For example, if we have the following inline HTML string:
<html><body style="background-color:yellow"><h2 style="background-color:red">This is a heading</h2><p style="background-color:green">This is a paragraph.</p></body></html>
After applying the regular expression, the result should look like:
<html><body ><h2 >This is a heading</h2><p >This is a paragraph.</p></body></html>
You can't parse HTML with regular expressions because HTML is not a regular language.
Of course, you can cut corners at your own peril, for example by searching for style\s*=\s*"[^"]*"
and replacing it with nothing, but that will remove every occurrence of style="anything"
from your text, including ones that are not inside a tag.
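The search-and-replace just described can be sketched as follows. This is a minimal illustration in Python (the question does not name a language); the names SAMPLE and strip_style_naive are made up for this example. It reproduces the output asked for in the question and also shows the peril: the same pattern fires on ordinary text that merely looks like a style attribute.

```python
import re

# The HTML string from the question.
SAMPLE = (
    '<html><body style="background-color:yellow">'
    '<h2 style="background-color:red">This is a heading</h2>'
    '<p style="background-color:green">This is a paragraph.</p>'
    '</body></html>'
)

# Pattern quoted in the answer: style, optional spaces, =, a double-quoted value.
STYLE_PATTERN = re.compile(r'style\s*=\s*"[^"]*"')


def strip_style_naive(html: str) -> str:
    """Remove every style="..." match, wherever it occurs in the string."""
    return STYLE_PATTERN.sub('', html)


print(strip_style_naive(SAMPLE))
# <html><body ><h2 >This is a heading</h2><p >This is a paragraph.</p></body></html>

# The peril: text content that is not an attribute is removed as well.
print(strip_style_naive('<p>Write style="color:red" to colour text.</p>'))
# <p>Write  to colour text.</p>
```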
You simply need to replace the style attributes with nothing. Here's an example of how to do so with PHP:
$text = preg_replace('/\s+style="[^"]*"/', '', $text);
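For readers not using PHP, a rough Python equivalent of that call might look like the sketch below; the function name remove_style_attributes is hypothetical. Because the pattern starts with \s+, the whitespace before the attribute is removed too, so the result contains <body> rather than the <body > shown in the question. The pattern only handles double-quoted values written as style="..." with no spaces around the equals sign.

```python
import re


def remove_style_attributes(text: str) -> str:
    """Python counterpart of preg_replace('/\\s+style="[^"]*"/', '', $text)."""
    # \s+ consumes the whitespace that separates the attribute from what precedes it.
    return re.sub(r'\s+style="[^"]*"', '', text)


html = (
    '<html><body style="background-color:yellow">'
    '<h2 style="background-color:red">This is a heading</h2>'
    '<p style="background-color:green">This is a paragraph.</p>'
    '</body></html>'
)

print(remove_style_attributes(html))
# <html><body><h2>This is a heading</h2><p>This is a paragraph.</p></body></html>
```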
As is often pointed out, regexes are in most cases not suitable for HTML, so you should state the language in which you plan to implement this.
However, a regex like this will replace the heading:
<h2\s+style="background-color:red">
// replace with
<h2>
The regex for the paragraph tag is analogous (replace 'h2' with 'p' and 'red' with 'green').
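The two tag-specific patterns can also be folded into one by capturing the tag name. The following Python sketch is one assumed way to do that, not something the answer itself proposes; the name strip_style_from_tags is hypothetical, and the colour value is generalized to any double-quoted value. Tags not listed in the alternation, such as body here, are left untouched.

```python
import re

# Match an opening h2 or p tag whose only attribute is a double-quoted style.
TAG_WITH_STYLE = re.compile(r'<(h2|p)\s+style="[^"]*">')


def strip_style_from_tags(html: str) -> str:
    """Replace <h2 style="..."> and <p style="..."> with the bare tag."""
    # \1 re-inserts the captured tag name (h2 or p).
    return TAG_WITH_STYLE.sub(r'<\1>', html)


doc = (
    '<html><body style="background-color:yellow">'
    '<h2 style="background-color:red">This is a heading</h2>'
    '<p style="background-color:green">This is a paragraph.</p>'
    '</body></html>'
)

print(strip_style_from_tags(doc))
# <html><body style="background-color:yellow"><h2>This is a heading</h2><p>This is a paragraph.</p></body></html>
```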