
javascript - WebWorker calculates slow regexp matches significantly slower (3x) - firefox only - Stack Overflow


First, I created a regular expression that matches all unique external library paths in a list of all header files in a project. I asked a question about building that regexp a week ago.
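The regular expression itself is in the author's earlier question and is not reproduced here. As a hedged illustration only, assume the input is a list of `#include <path>` lines (a hypothetical format). A lookahead-based pattern that reports each path once could look like the sketch below. The negative lookahead rejects a candidate whenever the same path appears again later in the string, so only the last occurrence of each path survives. Because that lookahead rescans the rest of the input for every candidate, the cost grows roughly quadratically with input size, which fits the "slow on purpose" pattern described in the comments.

```javascript
// Hypothetical input format: one "#include <path>" directive per line.
// The negative lookahead (?![\s\S]*#include <\1>) rejects a path if the same
// path occurs again later, so each unique path is reported exactly once
// (at its last occurrence). The lookahead rescans the remainder of the input
// for every candidate match, which is what makes this pattern slow.
const uniqueIncludeRegex = /#include <([^>]+)>(?![\s\S]*#include <\1>)/g;

function uniquePathsWithLookahead(text) {
  const paths = [];
  let m;
  uniqueIncludeRegex.lastIndex = 0; // reset state left over from earlier calls
  while ((m = uniqueIncludeRegex.exec(text)) !== null) {
    paths.push(m[1]);
  }
  return paths;
}

const sample = [
  "#include <boost/asio.hpp>",
  "#include <zlib.h>",
  "#include <boost/asio.hpp>",
].join("\n");

console.log(uniquePathsWithLookahead(sample)); // [ 'zlib.h', 'boost/asio.hpp' ]
```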

I started meddling around to see how it would behave when asynchronous and when turned into a web worker. For convenience and reliability I created this universal file that runs in all three modes:

/** Will call result() callback with every match it finds. Asynchronous unless called 
 *  with interval = -1.
 *  Javadoc style meant for Arnold Rimmer and other Java programmers:
 *  
 * @param regex regular expression to match in string; must have the g flag,
 *        otherwise exec() restarts at index 0 and the loop never ends
 * @param string guess what
 * @param result callback function that accepts one parameter, string match
 * @param done callback on finish, has no parameters
 * @param interval delay (not actual interval) between finding matches. If -1, 
 *        function will be blocking
 * @property working false if loop isn't running, otherwise contains timeout ID
 *           for use with clearTimeout
 * @property done copy of done parameter
 * @throws heavy boulders
**/
function processRegex(regex, string, result, done, interval) {
  var m;
  //Please tell me interpreter optimizes this
  interval = typeof interval!='number'?1:interval;
  //And this
  processRegex.done = done;
  while ((m = regex.exec(string))) {
    // Drop the full match (index 0) and join the capture groups into one path
    var path = m.slice(1).join("");
    //It's good to keep in mind that result() slows down the process
    result(path);
    if (interval>=0) {
      // Resume later; the g flag keeps the position in regex.lastIndex
      processRegex.working = setTimeout(processRegex, 
                              interval, regex, string, 
                              result, done, interval);
      // Comment these out for maximum speed
      processRegex.progress = regex.lastIndex/string.length;
      console.log("Progress: "+Math.round(processRegex.progress*100)+"%");
      return;
    }
  }

  processRegex.working = false;
  processRegex.done = null;
  if (typeof done=="function")
    done();
}
processRegex.working = false;
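processRegex depends on the regex having the `g` flag. On a global regex, `RegExp.prototype.exec` starts searching at `regex.lastIndex` and advances it after each match, which is what lets each `setTimeout` call resume where the previous one stopped. Without the flag, `exec` always restarts at index 0, so the `while` loop never terminates once there is a match. The sketch below demonstrates this resumption on hypothetical `#include <path>` input; `takeMatches` is an illustrative helper, not part of the original code.

```javascript
// Demonstrates how a global regex remembers its position between calls,
// which is what allows processRegex to be split across setTimeout calls.
const text = "#include <a.h>\n#include <b.h>\n#include <c.h>\n";

// Hypothetical helper: collect at most `limit` matches, starting wherever
// regex.lastIndex currently points.
function takeMatches(regex, input, limit) {
  const found = [];
  let m;
  while (found.length < limit && (m = regex.exec(input)) !== null) {
    found.push(m[1]);
  }
  return found;
}

const globalRe = /#include <([^>]+)>/g;
const firstChunk = takeMatches(globalRe, text, 1);   // [ 'a.h' ]
const resumeAt = globalRe.lastIndex;                 // 14: just past the first match
const secondChunk = takeMatches(globalRe, text, 10); // [ 'b.h', 'c.h' ]

// A non-global regex ignores lastIndex and finds the first match every time,
// so a `while ((m = re.exec(s)))` loop over it would never end.
const nonGlobalRe = /#include <([^>]+)>/;
const again = [nonGlobalRe.exec(text)[1], nonGlobalRe.exec(text)[1]]; // [ 'a.h', 'a.h' ]

console.log(firstChunk, resumeAt, secondChunk, again);
```

When a global `exec` finally returns `null`, it also resets `lastIndex` to 0, so the same regex object can be reused for a fresh scan.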

I created a test file; rather than pasting it here, I uploaded it to very reliable web hosting: Demo - Test data.

What I find very surprising is how much RegExp execution differs between a web worker and the main browser thread. The results I got:

  • Mozilla Firefox
    • [WORKER]: Time elapsed:16.860s
    • [WORKER-SYNC]: Time elapsed:16.739s
    • [TIMEOUT]: Time elapsed:5.186s
    • [LOOP]: Time elapsed:5.028s

You can also see that, with my particular regular expression, the difference between the synchronous and the asynchronous loop is insignificant. I then tried using a match list instead of a lookahead expression, and the results changed dramatically. Here are the changes to the old function:

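function processRegexUnique(regex, string, result, done, interval, matchList) {
  var m;
  // Paths already reported; handed along between timeout chunks
  matchList = matchList || [];
  interval = typeof interval!='number'?1:interval;
  processRegexUnique.done = done;
  // regex: the pattern without the deduplicating lookahead, still with the g flag
  while ((m = regex.exec(string))) {
    var path = m.slice(1).join("");
    // Deduplicate here instead of in the regex
    if (matchList.indexOf(path)==-1) {
      result(path);
      matchList.push(path);
    }
    if (interval>=0) {
      // Reschedule processRegexUnique itself so matchList carries over
      processRegexUnique.working = setTimeout(processRegexUnique, interval, 
                                     regex, string, result, 
                                     done, interval, matchList);
      // Comment these out for maximum speed
      processRegexUnique.progress = regex.lastIndex/string.length;
      console.log("Progress: "+Math.round(processRegexUnique.progress*100)+"%");
      return;
    }
  }
  processRegexUnique.working = false;
  processRegexUnique.done = null;
  if (typeof done=="function")
    done();
}
processRegexUnique.working = false;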
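The match-list approach checks membership with `Array.prototype.indexOf`, which scans the whole list on every match. A `Set` gives the same first-occurrence-wins deduplication with constant-time average lookups. The sketch below is a blocking (interval = -1 style) alternative, assuming hypothetical `#include <path>` input lines; it is not the author's code.

```javascript
// Blocking deduplication without lookaheads: match every include with a
// plain global regex and let a Set remember which paths were already seen.
function uniquePathsWithSet(text) {
  const includeRegex = /#include <([^>]+)>/g; // fresh regex, so lastIndex starts at 0
  const seen = new Set();
  const paths = [];
  let m;
  while ((m = includeRegex.exec(text)) !== null) {
    const path = m.slice(1).join(""); // join capture groups, as processRegex does
    if (!seen.has(path)) {
      seen.add(path);
      paths.push(path); // report each path once, at its first occurrence
    }
  }
  return paths;
}

const sample = [
  "#include <boost/asio.hpp>",
  "#include <zlib.h>",
  "#include <boost/asio.hpp>",
].join("\n");

console.log(uniquePathsWithSet(sample)); // [ 'boost/asio.hpp', 'zlib.h' ]
```

Unlike a negative-lookahead pattern, which keeps the last occurrence of each path, this version reports paths in order of first appearance. Each match is visited once, so the scan stays linear in the input length.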

And the results:

  • Mozilla Firefox
    • [WORKER]: Time elapsed:0.062s
    • [WORKER-SYNC]: Time elapsed:0.023s
    • [TIMEOUT]: Time elapsed:12.250s (note to self: it's getting weirder every minute)
    • [LOOP]: Time elapsed:0.006s
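One plausible contributor to the slow [TIMEOUT] figure, not confirmed anywhere in this thread, is that the modified function still schedules one `setTimeout` per match, duplicates included. The HTML standard clamps nested timers to at least 4 ms once they are nested more than five levels deep, so thousands of matches at roughly 4 ms each add up to seconds. A common alternative is to process matches for a small time budget before yielding. The sketch below assumes a 10 ms budget and a Promise-based interface; `processRegexSliced` and `budgetMs` are hypothetical names, not part of the original code.

```javascript
// Time-sliced matching: keep calling regex.exec() until a time budget is
// spent, then yield with a single setTimeout and resume from regex.lastIndex.
// This schedules one timer per slice rather than one timer per match.
// budgetMs is an assumed tuning value, not taken from the original code.
function processRegexSliced(regex, string, onMatch, budgetMs = 10) {
  if (!regex.global) {
    // Without the g flag exec() ignores lastIndex and the loop would never end.
    throw new TypeError("processRegexSliced needs a global (g) regex");
  }
  regex.lastIndex = 0;
  return new Promise((resolve, reject) => {
    function runSlice() {
      try {
        const sliceStart = Date.now();
        let m;
        while ((m = regex.exec(string)) !== null) {
          onMatch(m.slice(1).join(""));
          if (m[0].length === 0) regex.lastIndex++; // step past empty matches
          if (Date.now() - sliceStart >= budgetMs) {
            setTimeout(runSlice, 0); // yield to the event loop, then continue
            return;
          }
        }
        resolve(); // exec() returned null: the whole string has been scanned
      } catch (err) {
        reject(err); // surface errors thrown by onMatch
      }
    }
    runSlice();
  });
}

// Usage with hypothetical input: 3000 include lines, 2 unique paths.
const text = "#include <a.h>\n#include <b.h>\n#include <a.h>\n".repeat(1000);
const seen = new Set();
processRegexSliced(/#include <([^>]+)>/g, text, (path) => seen.add(path))
  .then(() => console.log([...seen])); // [ 'a.h', 'b.h' ]
```

The budget trades responsiveness for throughput. A larger budget means fewer timers and less clamping overhead, but longer stretches during which the page cannot react to input.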

Can anyone explain such a difference in speed?

Asked Oct 2, 2015 at 15:00 by Tomáš Zato; edited May 23, 2017 at 12:16 by Community Bot.
  • If you’ve filed a Firefox bug for this, can you please add the bug URL to your question? And if you’ve not yet filed a Firefox bug for it, I hope you will consider taking the time to do that. – sideshowbarker Commented Oct 10, 2015 at 1:12
  • @sideshowbarker I googled where to report Firefox bugs and failed. So I filed a complaint, "Can't find where to report bugs", on Firefox Input ("Firefox made me sad.") and gave up. If you know where to report bugs (an actual bug-report procedure, not some sink for user feedback), please tell me. This wouldn't be the first time I found a problem I could reliably reproduce and identify as Firefox-only. – Tomáš Zato Commented Oct 10, 2015 at 16:11
  • Yeah, agreed, they don’t make it as clear as it could be. Anyway, for this particular bug, please use bugzilla.mozilla/… That will raise it against the appropriate DOM: Workers Bugzilla component in the appropriate Bugzilla Core product. – sideshowbarker Commented Oct 10, 2015 at 21:18
  • To try to help other people avoid the same frustrations you ran into when trying to figure out where to report Firefox browser-engine bugs, I created stackoverflow./questions/33059442/… If you think it’s useful to have that info on record here in StackOverflow, please consider upvoting it (otherwise it may be at risk of getting deleted if other kneejerk close-all-the-things downvoters jump on the bandwagon). – sideshowbarker Commented Oct 10, 2015 at 22:30
  • The pattern is slow on purpose. A much more efficient way to do it is to skip the lookaheads and use a reference array instead. But this question really isn't about writing optimal code. – Tomáš Zato Commented Nov 17, 2015 at 15:29

1 Answer

After a series of tests, I confirmed that this is a Mozilla Firefox issue (it affects all Windows desktop versions I tried). In Google Chrome, Opera, and even Firefox mobile, the regexp matching takes about the same time whether or not it runs in a worker.

If you need this issue fixed, be sure to vote for the bug report on Bugzilla. I will try to add more information if anything changes.
