I'm trying to parse an AJAX response and get the id of the body tag. I can't do this via jQuery, though, because jQuery can't emulate a DOM response for a body tag.
I've split this into many lines to try and isolate the error. This works in modern browsers, but it's failing in IE8.
var bodyIDregex = new RegExp(/<body[^>]*id=["'](.*?)["']>/gi),
matched = html.match(bodyIDregex),
bodyID = bodyIDregex.exec(matched[0]);
bodyID = bodyID[1];
I have confirmed that the value of the variable html is as expected.
Any help?
Thanks!
Asked May 23, 2013 at 18:13 by M Miller
- In addition to what Asad said, my guess is this has something to do with the "g" flag and your sequence of first calling .match(regex) and then calling regex.exec() with the same regex. The "g" flag can leave state in the regular expression because .exec can be called multiple times to yield successive matches. I'd suggest that you use a different regular expression for the .match() call (probably without the "g" flag). I could verify this was the case if you included sample text you're matching against, but without that, we are just guessing. – jfriend00 Commented May 23, 2013 at 18:38
- @paulsm4: Wrong. jQuery is indeed faulty in that way, it cannot parse full HTML documents. And there are many more things that jQuery was not even designed to do… – Bergi Commented May 23, 2013 at 21:09
- It'd be interesting to see if jQuery could parse the body tag if it were set to interpret the results as an XML document rather than an HTML document... but to be honest, I'm not sure if that would work either. Of course, the limitation there would be that you would have to have an HTML document that is valid XML (XHTML). And I write in HTML5, sometimes without closing slash, etc. required for an XML doc. – M Miller Commented May 23, 2013 at 23:24
2 Answers
You should either pass a string to the constructor for regexen, or use the regex literal syntax, but not both.
var bodyIDregex = /<body[^>]*id=["'](.*?)["']>/gi
or
var bodyIDregex = new RegExp("<body[^>]*id=[\"'](.*?)[\"']>","gi")
Update:
As you have correctly identified in your answer, the problem stems from the fact that a regex with the "g" flag resumes its search from the position where the previous match ended, which it stores in lastIndex. One way to correct this is to reset lastIndex, but in this case this is not required, since you only need to match against the string once:
var bodyIDregex = /<body[^>]*id=["'](.*?)["']>/gi,
bodyID = bodyIDregex.exec(html);
// For html containing <body id="test">asdf</body>, bodyID is now
// the array ['<body id="test">', 'test'] (full match, then the captured group)
alert(bodyID[1]);
// alerts the captured group, "test"
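Since only one match is needed, another option is to drop the "g" flag altogether, so the regex holds no state between calls. Here is a minimal sketch of that idea; the helper name getBodyId is hypothetical, and the pattern is slightly tightened from the original (it requires whitespace before id and tolerates attributes after it), which is an assumption about the markup rather than something taken from the question:

```javascript
// Hypothetical helper (not from the question): returns the id of the first
// <body> tag in an HTML string, or null if there is none.
// The regex has no "g" flag, so lastIndex is never consulted and repeated
// calls behave identically.
function getBodyId(html) {
  // \sid= requires whitespace before "id", so attributes such as data-id
  // are not picked up by mistake.
  var match = /<body[^>]*\sid=["']([^"']*)["'][^>]*>/i.exec(html);
  return match ? match[1] : null;
}

var sample = '<html><body id="test">asdf</body></html>';
var bodyId = getBodyId(sample); // "test"
```

The null check matters in practice: exec() returns null when nothing matches, and indexing into that result would throw.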
Apparently, when you call string.match(RegExp object) with a regex that has the "g" flag, it updates a property of the RegExp object called lastIndex. I am not completely familiar with how the RegExp object works, but this causes an issue when trying to call the exec() method later with the same object.
The solution, apparently, is to reset lastIndex to zero.
var html = '<html><body id="test">asdf</body></html>';
var bodyIDregex = /<body[^>]*id=["'](.*?)["']>/gi,
matched = html.match(bodyIDregex);
// Reset lastIndex
bodyIDregex.lastIndex = 0;
var bodyID = bodyIDregex.exec(matched[0]);
alert(bodyID.length);
bodyID = bodyID[1];
document.write(bodyID); // writes test
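To see the lastIndex state in isolation, the following sketch calls exec() repeatedly on the same global regex. It does not reproduce the IE8-specific behaviour of match(); it only illustrates the standard statefulness of exec() with the "g" flag. The function name demoLastIndex and the one-character sample string are made up for the illustration:

```javascript
// Illustration only: shows how a regex with the "g" flag remembers where
// its previous exec() call stopped.
function demoLastIndex() {
  var re = /a/g;

  var first = re.exec("a");           // matches at index 0
  var indexAfterFirst = re.lastIndex; // now 1, just past the match

  var second = re.exec("a");          // search starts at index 1, finds nothing
  var indexAfterSecond = re.lastIndex; // a failed exec() resets it to 0

  re.exec("a");                       // matches again, lastIndex is 1 once more
  re.lastIndex = 0;                   // manual reset, as in the answer above
  var afterReset = re.exec("a");      // matches, because the search restarts at 0

  return {
    firstMatch: first[0],
    indexAfterFirst: indexAfterFirst,
    second: second,
    indexAfterSecond: indexAfterSecond,
    afterReset: afterReset[0]
  };
}
```

If the stale index is left in place, the next exec() on a short string can come back null even though the pattern clearly matches, which is the kind of failure described in the question.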