HTML code strip regexp problem

开发者 https://www.devze.com 2023-01-14 06:56 Source: web
In JavaScript, one popular use of regular expressions is stripping HTML tags out of text. The code for that is:

String.prototype.stripHTML = function () {
    var reTag = /<(?:.|\s)*?>/g;
    return this.replace(reTag, "");
};

If you try "<b>This would be bold</b>".stripHTML(), the output is "This would be bold". Shouldn't the output be ""?

Doesn't this regex say to match everything that starts with < and ends with >? Why didn't the match start at the < of <b> and end at the > of </b>?


You are using a non-greedy (lazy) quantifier.

(?:.|\s)*?
         ^

The trailing ? makes the * match as few characters as possible, instead of the default behavior, which is to match as many as possible.

<b>This would be bold</b>
^-^                  ^--^     Non-greedy: <(?:.|\s)*?>
^-----------------------^     Greedy    : <(?:.|\s)*>
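
The diagram above can be checked directly with String.prototype.match. This is only a minimal sketch; the variable names sample, lazyMatches, and greedyMatches are made up for illustration and are not part of the original code.

```javascript
var sample = "<b>This would be bold</b>";

// Lazy quantifier (*?): each match stops at the first ">" that completes the pattern,
// so the opening and closing tags are found as two separate matches.
var lazyMatches = sample.match(/<(?:.|\s)*?>/g);

// Greedy quantifier (*): the match runs from the first "<" to the last ">" in the string.
var greedyMatches = sample.match(/<(?:.|\s)*>/g);

console.log(lazyMatches);   // [ '<b>', '</b>' ]
console.log(greedyMatches); // [ '<b>This would be bold</b>' ]
```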


Yes, but *? performs a non-greedy match (the shortest match possible):

var reTag = /<(?:.|\s)*?>/g; 

To perform a greedy match (the longest match possible), remove the ? that follows the *:

var reTag = /<(?:.|\s)*>/g; 
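
To see what each variant does to the example string, here is a small sketch. The helper stripWith and the variable names are hypothetical, introduced only for this comparison.

```javascript
// Hypothetical helper for illustration: removes everything the given pattern matches.
function stripWith(pattern, text) {
    return text.replace(pattern, "");
}

var html = "<b>This would be bold</b>";

// Lazy version: only the two tags are removed, the text between them survives.
var lazyResult = stripWith(/<(?:.|\s)*?>/g, html);

// Greedy version: a single match spans the whole string, so nothing is left.
var greedyResult = stripWith(/<(?:.|\s)*>/g, html);

console.log(JSON.stringify(lazyResult));   // "This would be bold"
console.log(JSON.stringify(greedyResult)); // ""
```

Note that the greedy version removes everything between the first < and the last > in the input, including the text, which is why the lazy version is the one normally used for stripping tags.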


It's not a greedy regex, meaning that each match ends at the first > it comes across. As a result, <b> and </b> are separate matches.
