最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

javascript - regex vs while loops - Stack Overflow

programmeradmin0浏览0评论

While reading this SO post - Is there a version of JavaScript's String.indexOf() that allows for regular expressions?) I ponder which of the following two functions that look for the last (largest) whitespace group in txt run faster (or do they have negligible run-time difference)

(function(str)
{   
    var result = /\s+(?!.*\s+)/.exec(str);
    return ((result)? result.index : -1);
})(txt);

or

(function(str)
{
   var regex = /\s+/g;
   var result;
   var index = -1;
   while(result = regex.exec(str))
   {
       index = result.index;
   }
   return index;
})(txt);

Briefly, the first uses a regex expression to look for a whitespace group that is not followed by any other whitespace groups, and the second uses a while loop.

Any help with this matter is much appreciated.

While reading this SO post - Is there a version of JavaScript's String.indexOf() that allows for regular expressions?) I ponder which of the following two functions that look for the last (largest) whitespace group in txt run faster (or do they have negligible run-time difference)

(function(str)
{   
    var result = /\s+(?!.*\s+)/.exec(str);
    return ((result)? result.index : -1);
})(txt);

or

(function(str)
{
   var regex = /\s+/g;
   var result;
   var index = -1;
   while(result = regex.exec(str))
   {
       index = result.index;
   }
   return index;
})(txt);

Briefly, the first uses a regex expression to look for a whitespace group that is not followed by any other whitespace groups, and the second uses a while loop.

Any help with this matter is much appreciated.

Share Improve this question edited May 23, 2017 at 12:19 CommunityBot 11 silver badge asked Jun 5, 2011 at 14:37 knightknight 3602 silver badges7 bronze badges 4
  • 1 You could always try the two approaches out and see! – Pointy Commented Jun 5, 2011 at 14:41
  • 1 Your second function is wrong. It needs to be index = result.index instead of index += result.index. – Gumbo Commented Jun 5, 2011 at 14:56
  • My approach is always to do the simplest or at least the most sensible one of the bunch and let the experts optimize the piler. Overall you could find out which method is fastest now and go with it but you would have to check all browsers and overall the languages runtime changes and is optimized all the time so what is the fastest now might be the slowest latter, so let the experts handle the optimization unless you specifically are having trouble. – Jonathon Commented Jun 5, 2011 at 16:04
  • Thanks! I modified the second function so that it is now correct. – knight Commented Jun 6, 2011 at 2:04
Add a ment  | 

2 Answers 2

Reset to default 3
(function(str)
{   
    var result = /\s+(?!.*\s+)/.exec(str);
    return ((result)? result.index : -1);
})(txt);

is broken. It will match " \n" because . does not match all space characters. Specifically it does not match the space characters "\r\n\u2028\u2029" which are matched by \s.

If you want a good way to match the last (largest) whitespace group in txt, use the RegExp below with String.prototype.search:

var indexOfStartOfLastWhitespaceGroup = str.search(/\s+\S*$/);

To get the end index, you can't use the .lastIndex property of the regular expression since it includes the \S* portion. You can use .search again though.

if (indexOfStartOfLastWhitespaceGroup >= 0) {
  var indexOfEndOfLastWhitespaceGroup = str.search(/\S*$/);
  ...
}

I ponder which of the following two functions that look for the last (largest) whitespace group in txt run faster (or do they have negligible run-time difference)

For small strings the result is likely negligible no matter what (correct) method you use. For large strings, iterating over the whole string is going to be expensive, so your best bet is to use a regular expression that is anchored at the end, i.e. has $ as the last token and does not have ^ in it. An interpreter can waste time doing a full string search when there is a right-only-anchored regex, but I believe most do this simple optimization.

This is what I get on squarefree shell under chrome.

var s = '';
for (var i = 10000; --i >= 0;) s += 'abba';
s += 'foo';
var t0 = Date.now(); for (var i = 100; --i >= 0;) /foo$/.test(s); var t1 = Date.now();
var t2 = Date.now(); for (var i = 100; --i >= 0;) /abbafoo/.test(s); var t3 = Date.now();
[t1 - t0, t3 - t2]
// emits [1, 8]

Finally, you should be aware that \s does not always mean the same thing on all interpreters. /\s/.test("\xA0") which tests whether the non-breaking space (think  ) is a space is false on IE 6 but true on most other browsers' interpreters (not sure about IE 7+).

You can use jsPerf to pare the performance of different JavaScript snippets. I’ve created one that uses your two variants and this one by me:

function(str) {
    var parts = str.split(/(?=\s+)/);
    return parts.length === 1 ? -1 : str.length - parts[parts.length-1].length;
}

It basically splits the string at the position of the match using a look-ahead assertion. If no match is found, split returns an array with just one item; otherwise the length of the last part is subtracted from the total length of the string to get the index of the last match.


Update    I’ve tweaked the functions a little bit and now we’ve got some pletely different results pared to the previous benchmark. Now the first function that is now using /\s+(?!\S+\s+)/ instead of /\s+(?!.*\s+)/ seems to be the fastest.

发布评论

评论列表(0)

  1. 暂无评论