
JavaScript regular expression literal persists between function calls

Developer https://www.devze.com 2022-12-27 13:18 Source: Internet

I have this piece of code:

function func1(text) {

    var pattern = /([\s\S]*?)(\<\?(?:attrib |if |else-if |else|end-if|search |for |end-for)[\s\S]*?\?\>)/g;

    var result;
    while (result = pattern.exec(text)) {
        if (some condition) {
            throw new Error('failed');
        }
        ...
    }
}

This works, unless the throw statement is executed. In that case, the next time I call the function, the exec() call starts where it left off, even though I am supplying it with a new value of 'text'.
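The mechanism behind this is the lastIndex property. A RegExp with the g flag records where its last match ended, and exec() begins the next search at that position no matter which string it is handed. Only a failed search (or an explicit assignment) resets it to 0. The minimal sketch below shows this on a single regex object. The pattern and strings are illustrative and not taken from the question. It also shows that index is a property of the match array, not of the regex.

```javascript
// One global regex object; exec() reads and updates its lastIndex.
const re = /(one|two|three|four)/g;

const m1 = re.exec("one two three four");
console.log(m1[0], re.lastIndex); // one 3

// A different string, but the search still begins at index 3,
// so "one" is skipped. The match position is on the result (m2.index).
const m2 = re.exec("one two");
console.log(m2[0], m2.index); // two 4

// lastIndex (7) is past the end of "one", so the search fails,
// exec() returns null, and lastIndex is reset to 0.
const m3 = re.exec("one");
console.log(m3, re.lastIndex); // null 0
```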

I can fix it by writing

var pattern = new RegExp('.....');

instead, but I don't understand why the first version is failing. How is the regular expression persisting between function calls? (This is happening in the latest versions of Firefox and Chrome.)
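Besides building the regex with new RegExp, two defensive patterns avoid the leak in any engine. One is to reset lastIndex before the loop. The other is to restore it in a finally block so that a throw cannot leave it partway through the string. Either one alone is enough here. The sketch below is a hypothetical version that uses both. The short pattern, the function name collect, and the limit parameter (a stand-in for the question's "some condition") are assumptions for illustration.

```javascript
// Hypothetical stand-in for the question's pattern, deliberately shared.
const pattern = /(one|two|three|four|five)/g;

function collect(text, limit) {
  pattern.lastIndex = 0; // always scan *this* text from the start
  const out = [];
  try {
    let result;
    while ((result = pattern.exec(text)) !== null) {
      out.push(result[0]);
      if (out.length >= limit) {
        // Stand-in for "some condition": abandon the scan with an error.
        throw new Error("failed");
      }
    }
  } finally {
    pattern.lastIndex = 0; // leave no state behind, even after a throw
  }
  return out;
}

// Usage: an aborted call no longer affects the next one.
let first;
try { collect("one two three four five", 3); } catch (e) { first = e.message; }
const second = collect("one two three four five", 99);
console.log(first, second);
```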

Edit: complete test case:

<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
<title>Test Page</title>
<style type='text/css'>
body {
    font-family: sans-serif;
}
#log p {
    margin:     0;
    padding:    0;
}
</style>
<script type='text/javascript'>
function func1(text, count) {

    var pattern = /(one|two|three|four|five|six|seven|eight)/g;

    log("func1");
    var result;
    while (result = pattern.exec(text)) {
        log("result[0] = " + result[0] + ", pattern.lastIndex = " + pattern.lastIndex);
        if (--count <= 0) {
            throw "Error";
        }
    }
}

function go() {
    try { func1("one two three four five six seven eight", 3); } catch (e) { }
    try { func1("one two three four five six seven eight", 2); } catch (e) { }
    try { func1("one two three four five six seven eight", 99); } catch (e) { }
    try { func1("one two three four five six seven eight", 2); } catch (e) { }
}

function log(msg) {
    var log = document.getElementById('log');
    var p = document.createElement('p');
    p.innerHTML = msg;
    log.appendChild(p);
}

</script>
</head>
<body><div>
<input type='button' id='btnGo' value='Go' onclick='go();'>
<hr>
<div id='log'></div>
</div></body>
</html>

In Firefox and Chrome, the second call resumes at 'four' instead of starting over at 'one'. In IE7 and Opera it starts over at 'one'.
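To experiment outside a browser, the test case can be reduced to plain JavaScript by collecting the log lines in an array instead of writing them to the DOM. The lines array and the lastIndex in the log text are additions for this sketch. In an engine that follows ES5 or later, which includes every current engine, each call starts again at 'one'. The output therefore no longer shows the resumption at 'four' reported in the question.

```javascript
const lines = []; // replaces the DOM-based log() helper

function func1(text, count) {
  const pattern = /(one|two|three|four|five|six|seven|eight)/g;
  lines.push("func1");
  let result;
  while ((result = pattern.exec(text)) !== null) {
    lines.push(result[0] + " (lastIndex " + pattern.lastIndex + ")");
    if (--count <= 0) {
      throw new Error("count exhausted");
    }
  }
}

// Same four calls as go(), with errors ignored.
const text = "one two three four five six seven eight";
for (const count of [3, 2, 99, 2]) {
  try { func1(text, count); } catch (e) { /* ignored, as in go() */ }
}
console.log(lines.join("\n"));
```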


RegExp objects created from a regex literal are cached, but new RegExp always creates a new object. The cached objects also keep their state (notably lastIndex), but the rules governing that aspect are apparently not very clear. Steve Levithan discusses this in a blog post (near the bottom).


I'll go out on a limb here: I think the behavior you're seeing is a bug in Firefox's and Chrome's JavaScript engines (heresy!). It's surprising that it would happen in two such different engines, though. It looks like an optimization error. Specifically, section 7.8.5 of the ECMAScript 5 spec says:

A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the literal is evaluated.

The only wiggle room I see is in the phrase "...each time the literal is evaluated" (my emphasis). But I don't see why the resulting object should be magically retained any more than any other object literal, such as:

function func1() {
    var x = {};
    return x;
}

There, subsequent calls to func1 will give you distinct objects. Hence my saying it looks like a bug to me.
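The same comparison can be made for a regex literal. The minimal check below assumes an engine with ES5 or later semantics. Each call evaluates the literal again, so the two regex objects are distinct, and the second one starts with lastIndex at 0 even after the first has been advanced. The function names are hypothetical.

```javascript
function makeObject() {
  return {};
}

function makeRegex() {
  return /x/g;
}

const o1 = makeObject();
const o2 = makeObject();

const r1 = makeRegex();
r1.exec("xx");          // advances r1.lastIndex to 1
const r2 = makeRegex(); // a fresh object from a fresh evaluation

console.log(o1 === o2); // false
console.log(r1 === r2); // false in ES5+ engines; the older engines in the question reused one object
console.log(r1.lastIndex, r2.lastIndex); // 1 0
```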

Update: Alan Moore points to an article by Steve Levithan in which Levithan makes the claim that the ECMAScript 3rd edition specification may have allowed this kind of caching. Fortunately, it is not allowed as of ECMAScript 5th edition (the spec I was working from) and is, therefore, going to be a bug Real Soon Now. Thanks Alan!


I don't know the answer, but I will hazard a guess:

The literal expression that is the pattern has global scope and is evaluated (into a RegExp object) only once. If you use new RegExp, its argument is still global, but it is just a string, not a RegExp.
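This guess can be tested by deliberately moving a literal out of the function. In the sketch below, the function names are hypothetical. The outer regex is evaluated once and keeps its lastIndex between calls, which reproduces the symptom in any engine. The literal declared inside the function gets a fresh object on every call under ES5 rules. The outer version roughly mirrors what the older engines did when they reused one object per literal.

```javascript
const shared = /\d/g; // evaluated once, when this line runs

function firstDigitShared(text) {
  const m = shared.exec(text); // starts at shared.lastIndex
  return m && m[0];
}

function firstDigitLocal(text) {
  const local = /\d/g; // evaluated on every call (ES5+)
  const m = local.exec(text);
  return m && m[0];
}

const sharedResults = [firstDigitShared("1 2"), firstDigitShared("1 2")];
const localResults = [firstDigitLocal("1 2"), firstDigitLocal("1 2")];
console.log(sharedResults); // second call resumes after the first digit and finds 2
console.log(localResults);  // both calls find 1
```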
