javascript - Cancel Regex match if timeout - Stack Overflow

Is it possible to cancel a regex.match operation if it takes more than 10 seconds to complete?

I'm using a huge regex to match a specific text; sometimes it works, and sometimes it fails...

regex: MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d+)[^P]*)(?:[\s\S]*?))PÁG\s:\s+\d+\/(\d+)\b(?:\D*(?:(?!\1\/\1)\d\D*)*)\1\/\1(?:[^Z]*(?:Z(?!6:\s\d+)[^Z]*)(?:[\s\S]*?))Z6:\s+\d+

Working example: https://regex101.com/r/kU6rS5/1

So I want to cancel the operation if it takes more than 10 seconds. Is that possible? I haven't found anything related on Stack Overflow.

Thanks.

Asked Aug 9, 2016 at 20:05 by user6683527 (edited Aug 9, 2016 at 20:11)
  • Ummmmmm... what the heck are you trying to match here? – Sebastian Lenartowicz, Aug 9, 2016 at 20:12
  • On regex101 it says: "The script has halted execution as it exceeded a maximum execution time of 2s. This would likely occur when your expression results in what is known as catastrophic backtracking. I have halted the execution for you and will resume it after you have modified your expression or match string." regular-expressions.info/catastrophic.html - You cannot halt a regex.match if it takes too much time; I think you need to reevaluate your regular expression. – Rob M., Aug 9, 2016 at 20:12
  • Hm, that's why it works in my application. But still... it takes 3 minutes to complete. I want to cancel it to avoid blocking my server... – user6683527, Aug 9, 2016 at 20:14
  • The point is that the expression above is poorly written: 1) the (?:[\s\S]*?) parts must be removed because they were never meant to be there; instead, use a * quantifier on the non-capturing groups to correctly unroll the lazy dot matching patterns (you have 2 here); 2) the second unrolled pattern is meaningless, and you may use [\s\S]*? instead; 3) the quantifiers on the final subpatterns inside the negative lookaheads should be removed for quicker matching. – Wiktor Stribiżew, Aug 9, 2016 at 22:22
  • Here is the corrected regex. It will still be slow, since the input is huge and contains a lot of P and Z characters, and the pattern itself is long: MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d)[^P]*)*)PÁG\s:\s+\d+\/(\d+)\b[\s\S]*?\1\/\1[^Z]*(?:Z(?!6:\s\d)[^Z]*)*Z6:\s+\d+ – Wiktor Stribiżew, Aug 9, 2016 at 22:24
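To see what the corrected pattern from the last comment captures, here is a small sketch that runs it against a short, made-up document. The sample text is an assumption, since the real input from the question is not reproduced here. Group 1 holds the total page count, and the match continues up to the matching `N/N` page marker and the `Z6:` line that follows it.

```javascript
// Corrected pattern from the comment above, written as a JavaScript regex literal.
const corrected = /MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d)[^P]*)*)PÁG\s:\s+\d+\/(\d+)\b[\s\S]*?\1\/\1[^Z]*(?:Z(?!6:\s\d)[^Z]*)*Z6:\s+\d+/;

// Hypothetical sample input: a header, two page markers and a Z6 line.
const sample = [
  'MINISTÉRIO DA FAZENDA',
  'PÁG : 1/2',
  'conteudo da primeira pagina',
  'PÁG : 2/2',
  'fim do documento Z6: 12345',
].join('\n');

const match = corrected.exec(sample);
console.log(match && match[1]);          // total page count captured by group 1
console.log(match && match[0].slice(-9)); // tail of the overall match
```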

3 Answers

You could spawn a child process that does the regex matching and kill it off if it hasn't completed in 10 seconds. Might be a bit overkill, but it should work.

child_process.fork is probably what you should use if you go down this road.

If you'll forgive my non-pure functions, this code demonstrates the gist of how you could communicate back and forth between the forked child process and your main process:

index.js

    const { fork } = require('child_process');
    const path = require('path');

    const processPath = path.join(__dirname, 'regex-process.js');
    const regexProcess = fork(processPath);
    const timeoutInMs = 10000;
    let received = null;

    // Kill the child if it has not answered within timeoutInMs.
    const timeout = setTimeout(() => {
      if (!received) {
        console.error('regexProcess is still running!');
        regexProcess.kill(); // or however you want to shut it down.
      }
    }, timeoutInMs);

    regexProcess.on('message', function(data) {
      console.log('received message from child:', data);
      clearTimeout(timeout);
      received = data;
      regexProcess.kill(); // or however you want to end it. just as an example.
      // you have access to the regex data here.
      // send to a callback, or resolve a promise with the value,
      // so the original calling code can access it as well.
    });

    regexProcess.send('message to match against');

regex-process.js

    function respond(data) {
      process.send(data);
    }

    function handleMessage(data) {
      console.log('handling message:', data);
      // run your regex calculations in here
      // then respond with the data when it's done.

      // the following is just to emulate
      // a synchronous computational delay
      for (let i = 0; i < 500000000; i++) {
        // spin!
      }
      respond('return regex process data in here');
    }

    process.on('message', handleMessage);
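The comments in index.js suggest resolving a promise with the result. The following is a hedged, single-file sketch of that idea. It swaps the forked child process for a `worker_threads` Worker, which is a different technique from the answer's `fork`. The regex runs on a separate thread, and `worker.terminate()` stops it if the timer fires first. `matchWithTimeout` is a hypothetical helper name. The regex is passed to the worker as `source`/`flags` strings and rebuilt there.

```javascript
const { Worker } = require('node:worker_threads');

// Code executed inside the worker thread (eval mode). It rebuilds the
// RegExp from its source and flags, runs it once, and posts the result back.
const WORKER_SOURCE = `
const { parentPort, workerData } = require('node:worker_threads');
const re = new RegExp(workerData.source, workerData.flags);
const m = re.exec(workerData.text);
parentPort.postMessage(m ? Array.from(m) : null);
`;

// Hypothetical helper: resolves with the match array (or null), or rejects
// if the regex has not finished within timeoutMs.
function matchWithTimeout(regex, text, timeoutMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, {
      eval: true,
      workerData: { source: regex.source, flags: regex.flags, text },
    });

    const timer = setTimeout(() => {
      worker.terminate(); // stops the thread, interrupting the running regex
      reject(new Error(`Regex timed out after ${timeoutMs} ms`));
    }, timeoutMs);

    worker.once('message', (result) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(result);
    });

    worker.once('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}
```

Each call starts a new thread, which adds some startup cost. For many small matches, a pool of workers or the vm approach in the next answer may be cheaper.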

This might just end up masking the real problem, though. You may want to consider reworking your regex like other posters have suggested.
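The following sketch shows why reworking the pattern matters. It compares the nested-quantifier pattern `/^(A+)*B/` (the same example used in the vm answer) with `/^A*B/`. Both accept exactly the same strings, but the second one runs in linear time on these inputs. The input sizes are arbitrary, and the printed timings will vary by machine.

```javascript
// Nested quantifiers: (A+)* can split a run of A's in exponentially many ways.
const nested = /^(A+)*B/;
// Equivalent pattern without nesting: zero or more A's followed by B.
const flat = /^A*B/;

// Runs regex.test(input) once and reports the result and elapsed milliseconds.
function timeTest(regex, input) {
  const start = process.hrtime.bigint();
  const matched = regex.test(input);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, ms };
}

for (const n of [16, 18, 20, 22]) {
  // No B anywhere, so every way of splitting the A's is tried before failing.
  const input = 'A'.repeat(n) + 'C';
  const slow = timeTest(nested, input);
  const fast = timeTest(flat, input);
  console.log(`n=${n}: nested ${slow.ms.toFixed(2)} ms, flat ${fast.ms.toFixed(3)} ms`);
}
```

The same idea underlies the corrected regex in the comments: replacing open-ended, overlapping sub-patterns with "unrolled" ones leaves the engine fewer ways to retry the same text.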

Another solution I found here: https://www.josephkirwin.com/2016/03/12/nodejs_redos_mitigation/

It is based on the Node.js vm module, with no process fork. That's pretty neat.

    const util = require('util');
    const vm = require('vm');

    const sandbox = {
        regex: /^(A+)*B/,
        string: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC",
        result: null
    };

    const context = vm.createContext(sandbox);
    console.log('Sandbox initialized: ' + vm.isContext(sandbox));
    const script = new vm.Script('result = regex.test(string);');
    try {
        // One could argue that if a RegExp hasn't finished within a given time,
        // it is likely to take exponential time.
        script.runInContext(context, { timeout: 1000 }); // milliseconds
    } catch (e) {
        console.log('ReDoS occurred', e); // Take some remedial action here...
    }

    console.log(util.inspect(sandbox)); // Check the results
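The same idea can be wrapped in a reusable function. This is a minimal sketch that assumes you only need the result of a single `regex.exec` call. `execWithTimeout` is a hypothetical name. It relies on the `timeout` option of `vm.runInContext`, which throws an error once the script has run longer than the given number of milliseconds.

```javascript
const vm = require('node:vm');

// Hypothetical helper: runs regex.exec(text) inside a fresh vm context so a
// runaway match is interrupted after timeoutMs. Returns the match array (or
// null); throws an Error if the time limit is exceeded.
function execWithTimeout(regex, text, timeoutMs) {
  const sandbox = { regex, text, result: null };
  vm.createContext(sandbox);
  vm.runInContext('result = regex.exec(text);', sandbox, { timeout: timeoutMs });
  return sandbox.result;
}

// Usage: decide what to do when the regex takes too long.
try {
  const match = execWithTimeout(/^(A+)*B/, 'A'.repeat(40) + 'C', 500);
  console.log('finished:', match);
} catch (err) {
  console.log('gave up:', err.message);
}
```

Unlike the worker or child-process approaches, this stays synchronous. The calling thread is still blocked until the match finishes or the timeout fires.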

I made a Node.js package specifically for this called super-regex:

    import {isMatch} from 'super-regex';

    console.log(isMatch(/\d+/, getUserInput(), {timeout: 10000}));

Instead of executing in a worker or child process, it uses the Node.js vm module to execute in a new context. This means the execution is faster and can remain synchronous.
