javascript - different split Regex result in IE - Stack Overflow

I get some HTML as an Ajax response, and I need to get just the body contents. So I made this regex:

/(<body>|<\/body>)/ig

It works well in all browsers, but for some reason IE gives me a different array when I use split:

data.split(/(<body>|<\/body>)/ig)

In all other browsers the content of the body is in split(/(<body>|<\/body>)/ig)[2], but in IE it is in split(/(<body>|<\/body>)/ig)[1] (tested in IE7 and IE8).

Why is this? And how could I modify it to get the same array in all browsers?

Edit: just to clarify, I already have a solution, as mentioned by tobyodavies. I want to understand why it behaves differently.

This is the HTML from the response (the string in data):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"  xml:lang="de"  lang="de" dir="ltr">
<head>
blablabla...
</head>
<body>
<div class="iframe">
   <div id="block-menu-menu-primary-links-user" class="block-menu">
 <h3>Primary Links - User</h3>  <div class="content"><ul class="menu"><li class="leaf first"><a target="content" href="#someurl" title="">Login</a></li>
<li class="leaf last"><a target="content" href="#someurl" title="">Register</a></li>
</ul></div>
</div>
</div>
</body>
</html>

PS: I know that parsing HTML with regex is bad, but it's not my code; I just need to fix it.

Asked Apr 4, 2011 at 9:10 by meo; edited Apr 4, 2011 at 9:32.
  • Don't use regexes to parse HTML... the <center> cannot hold, it is too late! stackoverflow.com/questions/1732348/… – tobyodavies Commented Apr 4, 2011 at 9:13
  • I know that it's bad. It's not my code. I just need to fix it, but thank you :P I wonder why the result is different. – meo Commented Apr 4, 2011 at 9:14
  • In your situation an XML parser will be more appropriate than a regex. – Stephan Commented Apr 4, 2011 at 9:16
  • Is it because IE is using a 0 based array and the rest 1? – BugFinder Commented Apr 4, 2011 at 9:16
  • The following page lists differences in the 'split' implementation between browsers: blog.stevenlevithan.com/archives/cross-browser-split - not sure if any of the items listed there apply here. – Matthew Wilson Commented Apr 4, 2011 at 9:43

4 Answers

The reason it behaves differently is the capturing group you created with parentheses. Other browsers add the text matched by these captures to the resulting array; IE 8 and lower do not. To get a more consistent result, you'd have to make the group non-capturing:

/(?:<body>|<\/body>)/ig

This is the reason other browsers have the content in [2] rather than [1]: there, [1] will, in theory, contain the string "<body>". The other browsers have it right on this one, and Internet Explorer 9 fixed the problem by implementing the method as outlined by the ECMAScript 5th Edition specification.

There are more inconsistencies than this, though. ECMAScript 5 compliance in all browsers will resolve these differences, but you might want to take a look at Steven Levithan's blog, where he outlines the differing implementations and even provides a custom split() method as a solution to the problem.
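
As a minimal sketch of the difference described in this answer, the snippet below splits a made-up sample string with both the capturing and the non-capturing pattern. The `data` value and the `extractBody` helper are assumptions for illustration, not part of the question's code. The indexes in the comments are what an ECMAScript 5 compliant engine produces; IE 8 and lower would be expected to return the three-element form in both cases.

```javascript
// Hypothetical sample input standing in for the Ajax response held in `data`.
var data = "<html><head>x</head><body><p>Hi</p></body></html>";

// Capturing group: a compliant engine splices each matched delimiter into
// the result, giving five elements with the body content at index 2.
var withCapture = data.split(/(<body>|<\/body>)/ig);

// Non-capturing group: the delimiters are discarded, giving three
// elements with the body content at index 1.
var withoutCapture = data.split(/(?:<body>|<\/body>)/ig);

// Hypothetical helper built on the non-capturing pattern.
// Returns null when the input does not contain both tags.
function extractBody(html) {
  var parts = html.split(/(?:<body>|<\/body>)/ig);
  return parts.length >= 3 ? parts[1] : null;
}

console.log(withCapture.length, withCapture[2]);       // 5 "<p>Hi</p>"
console.log(withoutCapture.length, withoutCapture[1]); // 3 "<p>Hi</p>"
console.log(extractBody(data));                        // "<p>Hi</p>"
```

This assumes, as in the question's sample, that some markup precedes <body> and that the opening tag carries no attributes.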

Have you considered just using xhr.responseXML.body.innerHTML? The DOM is a lot better at parsing HTML than regexes.

The following page lists differences in the 'split' implementation between browsers: http://blog.stevenlevithan.com/archives/cross-browser-split

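You can do something like this:

// ua was undefined in the original snippet; assumed here to be the
// lowercased user-agent string.
var ua = navigator.userAgent.toLowerCase();
var body_content;
// Note: IE 9 and later also report "msie" but follow the specification.
var isIE = ( (ua.indexOf("msie") != -1) && (ua.indexOf("opera") == -1) && (ua.indexOf("webtv") == -1) );
var results = data.split(/(<body>|<\/body>)/ig);

if (isIE) {
  // IE 8 and lower omit the captured delimiters, so the body is at index 1.
  body_content = results[1];
} else {
  // Other browsers keep "<body>" at index 1, so the body is at index 2.
  body_content = results[2];
}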
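
A variation on this answer, sketched here as an assumption rather than something proposed in the thread, is to test the split() behaviour directly instead of inspecting the user-agent string. The `splitKeepsCaptures` flag and the `bodyFromSplit` helper are hypothetical names.

```javascript
// Feature test: a compliant engine returns ["a", "x", "b"] here, while an
// engine that omits captured delimiters returns ["a", "b"].
var splitKeepsCaptures = "axb".split(/(x)/).length === 3;

// Hypothetical helper: picks the index that holds the body content,
// based on how this engine's split() treats capturing groups.
function bodyFromSplit(data) {
  var results = data.split(/(<body>|<\/body>)/ig);
  return splitKeepsCaptures ? results[2] : results[1];
}

// Made-up sample input for illustration.
console.log(bodyFromSplit("<html><body>hi</body></html>"));
```

Testing the behaviour rather than the browser name keeps the index choice correct for versions of IE that follow the specification.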