How would I make a regular expression to match the character < when it is not followed by a, em, or strong?
So <hello and <string would match, but <strong wouldn't.
Try this:
<(?!a|em|strong)
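Here is that pattern run against the examples from the question, as a quick sanity check. This is a sketch that assumes JavaScript's RegExp, which supports lookahead. The variable name notTag is only for illustration:

```javascript
// Negative lookahead: match "<" only if it is not followed by a, em or strong
const notTag = /<(?!a|em|strong)/;

// The three inputs from the question
for (const s of ['<hello', '<string', '<strong']) {
  console.log(s, notTag.test(s));
}
// Output:
// <hello true
// <string true
// <strong false
```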
Use a negative lookahead. For this problem, its simplest form is:
<(?!a|em|strong)
The one issue is that it will also skip <applet> (or <embed>), because the lookahead only checks whether the text after < starts with a, em or strong. One way to deal with that is to use \b, a zero-width assertion (meaning it consumes none of the input) that matches at a word-to-non-word or non-word-to-word transition. Word characters are [0-9a-zA-Z_]. So:
<(?!(a|em|strong)\b)
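The difference is easiest to see side by side. This minimal JavaScript sketch tests both patterns on tags that merely start with a or em. The names simple and bounded are illustrative:

```javascript
// Without \b, any tag that merely starts with a, em or strong is skipped too
const simple = /<(?!a|em|strong)/;
// With \b, the name must end right after a, em or strong to be excluded
const bounded = /<(?!(a|em|strong)\b)/;

for (const s of ['<applet>', '<embed>', '<a href="#">', '<strong>']) {
  console.log(s, simple.test(s), bounded.test(s));
}
// Output:
// <applet> false true
// <embed> false true
// <a href="#"> false false
// <strong> false false
```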
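Andrew's answer is clearly superior, but before seeing it I also tried [^(?:a|em|strong)]. Be aware that this is a negated character class, not a group. It matches any single character other than (, ?, :, a, |, e, m, s, t, r, o, n, g and ), so <[^(?:a|em|strong)] rejects <string along with <strong.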
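This small JavaScript sketch compares the character class with the \b lookahead. It assumes the class was placed directly after <:

```javascript
// [^(?:a|em|strong)] is a negated character class: it matches ONE character
// that is not any of ( ? : a | e m s t r o n g )
const charClass = /<[^(?:a|em|strong)]/;
// The lookahead version checks whole tag names instead of single characters
const lookahead = /<(?!(?:a|em|strong)\b)/;

for (const s of ['<hello', '<string', '<strong', '<b']) {
  console.log(s, charClass.test(s), lookahead.test(s));
}
// Output:
// <hello true true
// <string false true
// <strong false false
// <b true true
```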
If your regex engine supports it, use a negative lookahead assertion. It looks ahead in the string and succeeds only if its pattern would not match there, without consuming any input. Thus, you want /<(?!(?:a|em|strong)\b)/: match a <, then succeed only if it is not followed by a, em, or strong ending at a word boundary (\b).
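Because the lookahead consumes no input, each match is only the < itself, which makes the pattern useful for locating brackets that do not open an a, em or strong tag. This sketch assumes a JavaScript engine with String.prototype.matchAll (ES2020), and the sample string is made up:

```javascript
// Global flag is required for matchAll
const re = /<(?!(?:a|em|strong)\b)/g;
const html = '<em>x <embed> <a> <abbr> 1 <2';

// Collect [position, matched text] for every hit; the text is always "<"
const hits = [...html.matchAll(re)].map(m => [m.index, m[0]]);
console.log(JSON.stringify(hits));
// Output: [[6,"<"],[18,"<"],[27,"<"]]
```

Closing tags such as </em> would match as well, because the character after < is /.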
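function strip_tags(str, keep){
  // Normalize keep to an array of tag names (assumed alphanumeric; they are not regex-escaped)
  keep = !keep ? [] : (Array.isArray(keep) ? keep : [keep]);
  // A tag survives when its name is in keep; \b stops "em" from also keeping <embed>
  var kept = keep.length ? '|(?:' + keep.join('|') + ')\\b' : '';
  return str.replace(new RegExp('<\\/?(?![^A-Za-z0-9_\\-]' + kept + ').*?>', 'g'), '');
}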
usage:
strip_tags('<html><a href="a">a</a> <strong>strong text</strong> and <em>italic text</em></html>', ['strong', 'em']);
//output: a <strong>strong text</strong> and <em>italic text</em>
I would also recommend stripping the attributes from the tags you keep:
function strip_params(str){
  // Keep only the tag name of each opening tag: <strong class="x"> becomes <strong>
  return str.replace(/<([A-Za-z0-9_\-]+)[^>]*>/g, '<$1>');
}
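If you would rather do both steps in one pass, a replace callback can decide per tag. This is a different technique from the lookahead above, and cleanTags is a hypothetical name. The sketch assumes tag names are a letter followed by letters or digits, and it compares names case-insensitively:

```javascript
// Hypothetical helper: keep the listed tags without attributes, drop every other tag
function cleanTags(str, keep) {
  const allowed = new Set(keep.map(t => t.toLowerCase()));
  // Group 1: optional "/" of a closing tag; group 2: the tag name
  return str.replace(/<(\/?)([A-Za-z][A-Za-z0-9]*)[^>]*>/g, (tag, slash, name) =>
    allowed.has(name.toLowerCase()) ? '<' + slash + name + '>' : '');
}

console.log(cleanTags('<p class="x">Hi <strong style="color:red">there</strong></p>', ['strong']));
// Output: Hi <strong>there</strong>
```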