
regex - JavaScript regular expression literal persists between function calls - Stack Overflow


I have this piece of code:

function func1(text) {

    var pattern = /([\s\S]*?)(\<\?(?:attrib |if |else-if |else|end-if|search |for |end-for)[\s\S]*?\?\>)/g;

    var result;
    while (result = pattern.exec(text)) {
        if (some condition) {
            throw new Error('failed');
        }
        ...
    }
}

This works, unless the throw statement is executed. In that case, the next time I call the function, the exec() call starts where it left off, even though I am supplying it with a new value of 'text'.

I can fix it by writing

var pattern = new RegExp('.....');

instead, but I don't understand why the first version is failing. How is the regular expression persisting between function calls? (This is happening in the latest versions of Firefox and Chrome.)

Edit: Complete test case:

<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
<title>Test Page</title>
<style type='text/css'>
body {
    font-family: sans-serif;
}
#log p {
    margin:     0;
    padding:    0;
}
</style>
<script type='text/javascript'>
function func1(text, count) {

    var pattern = /(one|two|three|four|five|six|seven|eight)/g;

    log("func1");
    var result;
    while (result = pattern.exec(text)) {
        log("result[0] = " + result[0] + ", pattern.lastIndex = " + pattern.lastIndex);
        if (--count <= 0) {
            throw "Error";
        }
    }
}

function go() {
    try { func1("one two three four five six seven eight", 3); } catch (e) { }
    try { func1("one two three four five six seven eight", 2); } catch (e) { }
    try { func1("one two three four five six seven eight", 99); } catch (e) { }
    try { func1("one two three four five six seven eight", 2); } catch (e) { }
}

function log(msg) {
    var log = document.getElementById('log');
    var p = document.createElement('p');
    p.innerHTML = msg;
    log.appendChild(p);
}

</script>
</head>
<body><div>
<input type='button' id='btnGo' value='Go' onclick='go();'>
<hr>
<div id='log'></div>
</div></body>
</html>

From the second call onward, the regular expression continues with 'four' in Firefox and Chrome, but not in IE7 or Opera.

Asked Apr 15, 2010 at 12:43 by Charles Anderson; edited Apr 15, 2010 at 12:56 by T.J. Crowder.
  • I've taken the liberty of posting a complete, simplified test case, hope you don't mind. I've seen this behavior as well and wondered why it would be. It looks and smells like a bug, but then, sometimes things are very subtle, and it's surprising that both FF and Chrome would have it given their completely different underlying JavaScript engines. – T.J. Crowder, Apr 15, 2010 at 12:57
  • Just to be clear, it works as long as the error/exception isn't thrown, but if 'some condition' becomes true and the exception is thrown, then the function will fail on the next invocation because the pattern continues from where the exception was thrown? That sure sounds like a bug that's out of your hands. – PatrikAkerstrand, Apr 15, 2010 at 13:07

3 Answers

RegExp objects that are created by means of a regex literal are cached, but new RegExp always creates a new object. The cached objects also save their state, but the rules governing that aspect are apparently not very clear. Steve Levithan talks about that in this blog post (near the bottom).
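The saved state in question is the lastIndex property that exec() maintains on a global (/g) regex. The following is only a sketch of that mechanism, not a reproduction of the browser behavior: it assumes a single RegExp object deliberately shared across calls, standing in for a cached literal, and the names sharedPattern and firstMatches are hypothetical.

```javascript
// One RegExp object shared by every call; this stands in for a cached literal.
// (sharedPattern and firstMatches are hypothetical names for illustration.)
var sharedPattern = /(one|two|three|four)/g;

// Collect up to `limit` matches, leaving the loop early once the limit is hit.
function firstMatches(text, limit) {
    var found = [];
    var result;
    while ((result = sharedPattern.exec(text)) !== null) {
        found.push(result[0]);
        if (found.length >= limit) {
            break; // early exit: sharedPattern.lastIndex is not reset
        }
    }
    return found;
}

var firstCall = firstMatches("one two three four", 2);   // ["one", "two"]
var indexAfterFirstCall = sharedPattern.lastIndex;        // 7
var secondCall = firstMatches("one two three four", 2);  // ["three", "four"]
```

The second call resumes at offset 7 even though it receives a fresh string, which mirrors the symptom in the question. A loop that runs to completion does not show this, because exec() resets lastIndex to 0 when it finally returns null.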

I'll go out on a limb here: I think the behavior you're seeing is a bug in FF's and Chrome's JavaScript engines (heresy!). Surprising that it should happen in two such different engines, though. Looks like an optimization error. Specifically, section 7.8.5 of the spec says:

A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the literal is evaluated.

The only wiggle room I see is in the phrase "..each time the literal is evaluated" (my emphasis). But I don't see why the resulting object should be magically retained any more than any other object literal, such as:

function func1() {
    var x = {};
    return x;
}

There, subsequent calls to func1 will give you distinct objects. Hence my saying it looks like a bug to me.

Update: Alan Moore points to an article by Steve Levithan in which Levithan claims that the ECMAScript 3rd edition specification may have allowed this kind of caching. Fortunately, it is not allowed as of ECMAScript 5th edition (the spec I was working from) and is, therefore, going to be a bug Real Soon Now. Thanks Alan!
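A quick way to check which behavior an engine has is to compare the objects produced by two evaluations of the same literal. This is a minimal sketch, assuming an engine that follows the ECMAScript 5 wording quoted above; makePattern is a hypothetical helper name.

```javascript
// makePattern is a hypothetical helper: it evaluates the regex literal on each call.
function makePattern() {
    return /(one|two|three)/g;
}

var a = makePattern();
var b = makePattern();

a.exec("one two three");                 // advances a.lastIndex to 3

var distinct = (a !== b);                // true when each evaluation creates a new object
var independent = (b.lastIndex === 0);   // b is unaffected by the exec() call on a
```

In an engine that cached the literal, as described for the ES3-era implementations, both variables would refer to the same object and both checks would be false.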

I don't know the answer, but I will hazard a guess:

The literal expression which is the pattern has global scope, and is evaluated (into a RegExp object) only once, whereas if you use new RegExp, its argument is still global, but is just a string, not a RegExp.
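If the pattern object really is evaluated only once, one defensive option besides new RegExp is to reset lastIndex at the top of the function. The sketch below assumes a deliberately shared pattern so the effect is visible in any engine; tokenPattern and scanFromStart are hypothetical names, and the thrown error stands in for the question's 'some condition'.

```javascript
// A shared pattern, so the sketch behaves the same whether or not literals are cached.
var tokenPattern = /(one|two|three|four)/g;

// Scan from the beginning of `text`, throwing once `limit` matches are seen.
function scanFromStart(text, limit) {
    tokenPattern.lastIndex = 0; // discard any position left over from an earlier call
    var found = [];
    var result;
    while ((result = tokenPattern.exec(text)) !== null) {
        found.push(result[0]);
        if (found.length >= limit) {
            throw new Error("stopped after " + found.join(", "));
        }
    }
    return found;
}

var messages = [];
for (var i = 0; i < 2; i++) {
    try {
        scanFromStart("one two three four", 2);
    } catch (e) {
        messages.push(e.message);
    }
}
// Both entries in messages read "stopped after one, two".
```

Without the reset, the second call would start at the position where the first call threw and report different matches.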
