$.trim() uses the following RegExp to trim a string:
/^(\s|\u00A0)+|(\s|\u00A0)+$/g
As it turns out, this can get pretty ugly. Example:
var mystr = ' some test -- more text new test xxx';
mystr = mystr.replace(/^(\s|\u00A0)+|(\s|\u00A0)+$/g, "");
This code hangs Firefox and Chrome; it takes practically forever. "mystr" contains whitespace, but mostly non-breaking space characters (decimal 160, hex A0). The problem only occurs if the whitespace/A0 run is not at the start of the string but somewhere within it. I have no clue why this happens.
This expression:
/^[\n\r\t \xA0]+|[\n\r\t \xA0]+$/g
works fine in all the scenarios I tested. Would this be a better pattern for trimming?
Source: http://code.jquery./jquery-1.4.2.js
UPDATE
It looks like you can't copy and paste this example string; at some point the A0 characters get replaced. The Firebug console will also replace the characters on pasting, so you have to create your own string in a separate HTML file/editor to test this.
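One way to build such a test string without relying on copy and paste is to write the non-breaking spaces as escape sequences. The following is only a minimal sketch: the variable names and the run length of 30 are illustrative assumptions, not the original test data.

```javascript
// Build a test string with explicit U+00A0 escapes so that no editor,
// clipboard, or console can silently turn them into ordinary spaces.
var NBSP = "\u00A0";

// Hypothetical sample: text, a run of 30 non-breaking spaces, then more text.
var sample = "some test" + NBSP.repeat(30) + "-- more text";

// Check that the characters really are U+00A0 (decimal 160).
var nbspCount = sample.split(NBSP).length - 1;
console.log(nbspCount, sample.charCodeAt(9));

// A character class has no nested alternation. The sample has no leading or
// trailing whitespace, so the replace should leave it unchanged.
var trimmedSample = sample.replace(/^[\s\u00A0]+|[\s\u00A0]+$/g, "");
console.log(trimmedSample === sample);
```

The alternation-based pattern from the question is deliberately not run here, since applying it to a string like this is what reportedly triggers the hang.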
- 1 It seems like SO converted your A0s to 20s (at least when I cut and paste your code into Emacs). – Peter Jaric Commented Jun 28, 2010 at 13:19
- 1 @Nick: that regex is used as of 1.4.2 – Crescent Fresh Commented Jun 28, 2010 at 13:21
- 2 @All: See Peter Jaric's comment and my update – jAndy Commented Jun 28, 2010 at 13:23
- 3 There's a similar bug listed from a month ago in the jQuery bugtracker. Catastrophic backtracking, which I regularly do, but seldom in code. – Andrew Commented Jun 28, 2010 at 13:27
- 1 @jAndy, maybe you should add a link to the source to your question, to avoid misunderstandings: code.jquery./jquery-1.4.2.js – Peter Jaric Commented Jun 28, 2010 at 13:28
3 Answers
This is a known bug, as said in the comments, and Crescent is right that it's this way in 1.4.2, but it's already fixed for the next release.
You can test the speed of String.prototype.trim on your string here: http://jsfiddle/dLLVN/
I get around 79ms in Chrome and 117ms in Firefox for a million runs, so this will fix the hanging issue :)
As for the fix, take a look at the current source that'll be in 1.4.3; native trimming is now used.
There were 2 commits in March for this:
- http://github./jquery/jquery/mit/141ad3c3e21e7734e67e37b5fb39782fe11b3c18
- http://github./jquery/jquery/mit/ba8938d444b9a49bdfb27213826ba108145c2e50
1.4.2 $.trim() function:
trim: function( text ) {
    return (text || "").replace( rtrim, "" );
},
1.4.3 $.trim() function:
// earlier:
trim = String.prototype.trim
// new trim here
trim: trim ?
    function( text ) {
        return text == null ?
            "" :
            trim.call( text );
    } :
    // Otherwise use our own trimming functionality
    function( text ) {
        return text == null ?
            "" :
            text.toString().replace( trimLeft, "" ).replace( trimRight, "" );
    }
The trimLeft and trimRight vary, depending on whether you're in IE or not, like this:
trimLeft = /^\s+/,
trimRight = /\s+$/,
// Verify that \s matches non-breaking spaces
// (IE fails on this test)
if ( !/\s/.test( "\xA0" ) ) {
    trimLeft = /^[\s\xA0]+/;
    trimRight = /[\s\xA0]+$/;
}
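The fragments above come from inside jQuery's source and do not run on their own. As a rough standalone sketch of the same idea (prefer the native trim, otherwise fall back to two separate regexes chosen by feature detection), something like the following could be used. The names safeTrim, nativeTrim, trimLeftRe and trimRightRe are made up for this illustration and are not part of jQuery.

```javascript
// Standalone sketch of the 1.4.3-style approach. All names here are
// hypothetical; this is not jQuery's actual code.
var nativeTrim = String.prototype.trim;
var trimLeftRe = /^\s+/;
var trimRightRe = /\s+$/;

// Engines whose \s does not match U+00A0 get an explicit character class
// instead of an alternation such as (\s|\xA0).
if ( !/\s/.test( "\xA0" ) ) {
    trimLeftRe = /^[\s\xA0]+/;
    trimRightRe = /[\s\xA0]+$/;
}

function safeTrim( text ) {
    // null and undefined become the empty string
    if ( text == null ) {
        return "";
    }
    return nativeTrim ?
        nativeTrim.call( text ) :
        String( text ).replace( trimLeftRe, "" ).replace( trimRightRe, "" );
}

console.log( JSON.stringify( safeTrim( "\xA0 some\xA0\xA0text \xA0" ) ) );
```

Using a character class means each character can be matched in only one way, so the ambiguity that caused the hang in 1.4.2 does not arise in the fallback path.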
Normally an expression like ^\s+|\s+$ should be enough for trimming, since \s is supposed to match all space characters, even \xA0 non-breaking spaces [1]. This expression should run without causing any problems.
Now, probably some browser that jQuery wants to support doesn't match \xA0 with \s, and to work around this problem jQuery added the alternative (\s|\xA0) to trim away non-breaking spaces in that browser too.
With this change, the second part of the regex looks like (\s|\xA0)+$, which leads to problems in browsers where \xA0 is also matched by \s. In a string containing a long sequence of \xA0 characters, each character can be matched by either \s or \xA0, leading to lots of alternative matches and exponentially many combinations of how the different matches can be combined. If this sequence of \xA0 characters is not at the end of the string, the trailing $ condition can never be fulfilled, no matter which characters are matched by \s and which by \xA0, but the browser doesn't know this and tries all combinations, potentially searching for a very long time.
The simplified expression you suggest will not be sufficient, since \s is supposed to match all Unicode space characters, not just the well-known ASCII ones.
[1] According to MDC, \s is equivalent to [\t\n\v\f\r \u00a0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000]
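The combinatorial blow-up described in this answer can be made concrete with a toy model. The sketch below is not a real regex engine; it only counts how often a naive backtracking matcher, started at the beginning of a run of non-breaking spaces that is followed by other text, would test the trailing $ anchor. The function name countAnchorChecks and its parameters are invented for this illustration, and real engines may apply optimizations that this model ignores.

```javascript
// Toy model (an assumption-laden sketch, not an actual regex engine):
// count the failed `$` checks for a pattern shaped like (A|B)+$ when every
// character of a run can be matched by each of `alternatives` branches.
// Assumes runLength >= 1 and that the run is NOT at the end of the string.
function countAnchorChecks( runLength, alternatives ) {
    var checks = 0;

    // `consumed` = how many characters of the run the group has matched so far.
    function afterConsuming( consumed ) {
        if ( consumed < runLength ) {
            // Greedy "+": first try to match one more character,
            // once through each alternative that accepts it.
            for ( var alt = 0; alt < alternatives; alt++ ) {
                if ( afterConsuming( consumed + 1 ) ) {
                    return true;
                }
            }
        }
        // Then try the anchor; it fails because other text follows the run.
        checks++;
        return false;
    }

    // "+" needs at least one character, so start by consuming the first one.
    for ( var alt = 0; alt < alternatives; alt++ ) {
        afterConsuming( 1 );
    }
    return checks;
}

console.log( countAnchorChecks( 10, 2 ) ); // 2046    (two branches per character)
console.log( countAnchorChecks( 20, 2 ) ); // 2097150
console.log( countAnchorChecks( 20, 1 ) ); // 20      (one branch, e.g. a character class)
```

With two overlapping branches the count is 2^(n+1) - 2 for a run of n characters, while a single character class such as [\s\xA0] needs only n checks from that starting position in this model.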
As it turned out, this behavior was posted on jQuery's bug tracker one month ago:
http://dev.jquery./ticket/6605
Thanks to Andrew for pointing me to that.