Is it possible to cancel a regex.match operation if it takes more than 10 seconds to complete?
I'm using a huge regex to match a specific text. Sometimes it works, and sometimes it fails...
regex: MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d+)[^P]*)(?:[\s\S]*?))PÁG\s:\s+\d+\/(\d+)\b(?:\D*(?:(?!\1\/\1)\d\D*)*)\1\/\1(?:[^Z]*(?:Z(?!6:\s\d+)[^Z]*)(?:[\s\S]*?))Z6:\s+\d+
Working example: https://regex101.com/r/kU6rS5/1
So I want to cancel the operation if it takes more than 10 seconds. Is that possible? I haven't found anything related on Stack Overflow.
Thanks.
Asked Aug 9, 2016 at 20:05 by user6683527, edited Aug 9, 2016 at 20:11.
3 Answers
You could spawn a child process that does the regex matching and kill it if it hasn't completed within 10 seconds. It might be a bit overkill, but it should work.
fork is probably what you should use, if you go down this road.
If you'll forgive my non-pure functions, this code would demonstrate the gist of how you could communicate back and forth between the forked child process and your main process:
index.js
const { fork } = require('child_process');

const processPath = __dirname + '/regex-process.js';
const regexProcess = fork(processPath);
let received = null;

regexProcess.on('message', function(data) {
  console.log('received message from child:', data);
  clearTimeout(timeout);
  received = data;
  regexProcess.kill(); // or however you want to end it. just as an example.
  // you have access to the regex data here.
  // send to a callback, or resolve a promise with the value,
  // so the original calling code can access it as well.
});

const timeoutInMs = 10000;
let timeout = setTimeout(() => {
  if (!received) {
    console.error('regexProcess is still running!');
    regexProcess.kill(); // or however you want to shut it down.
  }
}, timeoutInMs);

regexProcess.send('message to match against');
regex-process.js
function respond(data) {
  process.send(data);
}

function handleMessage(data) {
  console.log('handling message:', data);
  // run your regex calculations in here
  // then respond with the data when it's done.
  // the following is just to emulate
  // a synchronous computational delay
  for (let i = 0; i < 500000000; i++) {
    // spin!
  }
  respond('return regex process data in here');
}

process.on('message', handleMessage);
This might just end up masking the real problem, though. You may want to consider reworking your regex like other posters have suggested.
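For a single-file variant of the same idea, here is a minimal sketch that uses spawnSync with its timeout option instead of fork. The helper name matchWithTimeout and the JSON job format are made up for illustration and are not part of the answer above. Note that spawnSync blocks the parent's event loop while it waits, so this only fits where a synchronous call is acceptable.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: runs the regex in a throwaway Node.js child process
// and gives up once `timeoutMs` has elapsed.
function matchWithTimeout(pattern, flags, text, timeoutMs) {
  // Source of the child: read a JSON job from stdin, run the regex,
  // and write the match (or null) to stdout as JSON.
  const childSource = `
    let input = '';
    process.stdin.setEncoding('utf8');
    process.stdin.on('data', (chunk) => { input += chunk; });
    process.stdin.on('end', () => {
      const job = JSON.parse(input);
      const m = new RegExp(job.pattern, job.flags).exec(job.text);
      process.stdout.write(JSON.stringify(m ? Array.from(m) : null));
    });
  `;

  const result = spawnSync(process.execPath, ['-e', childSource], {
    input: JSON.stringify({ pattern, flags, text }),
    encoding: 'utf8',
    timeout: timeoutMs, // the child is killed when this elapses
  });

  if (result.error && result.error.code === 'ETIMEDOUT') {
    return { timedOut: true, match: null };
  }
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error('regex child failed: ' + result.stderr);
  }
  return { timedOut: false, match: JSON.parse(result.stdout) };
}

// A pattern with catastrophic backtracking, cut off after one second.
console.log(matchWithTimeout('^(A+)*B', '', 'A'.repeat(50) + 'C', 1000));
```

Passing the pattern as a string plus flags keeps the job serializable. Unmatched capture groups come back as null rather than undefined because of the JSON round trip.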
Another solution I found here: https://www.josephkirwin.com/2016/03/12/nodejs_redos_mitigation/
It is based on the Node.js vm module, so no process fork is needed. That's pretty neat.
const util = require('util');
const vm = require('vm');

const sandbox = {
  regex: /^(A+)*B/,
  string: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC",
  result: null
};

const context = vm.createContext(sandbox);
console.log('Sandbox initialized: ' + vm.isContext(sandbox));
const script = new vm.Script('result = regex.test(string);');

try {
  // One could argue that if a RegExp hasn't finished within a given time,
  // it is likely to take exponential time.
  script.runInContext(context, { timeout: 1000 }); // milliseconds
} catch (e) {
  console.log('ReDoS occurred', e); // Take some remedial action here...
}

console.log(util.inspect(sandbox)); // Check the results; result stays null on timeout
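The same technique can be wrapped in a small reusable function. This is only a sketch: the name testWithTimeout and the shape of the return value are my own, not from the linked article. It relies on the error code ERR_SCRIPT_EXECUTION_TIMEOUT that Node.js sets when the vm timeout fires.

```javascript
const vm = require('vm');

// Hypothetical wrapper: runs regex.test(string) inside a vm context and
// reports whether the vm timeout interrupted it.
function testWithTimeout(regex, string, timeoutMs) {
  const sandbox = { regex, string, result: null };
  vm.createContext(sandbox);
  try {
    vm.runInContext('result = regex.test(string);', sandbox, {
      timeout: timeoutMs, // milliseconds
    });
    return { timedOut: false, result: sandbox.result };
  } catch (err) {
    if (err && err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT') {
      return { timedOut: true, result: null };
    }
    throw err; // anything else is a genuine error
  }
}

console.log(testWithTimeout(/\d+/, 'page 3 of 12', 1000));
console.log(testWithTimeout(/^(A+)*B/, 'A'.repeat(50) + 'C', 1000));
```

Everything stays synchronous, so the caller still blocks for up to timeoutMs; the difference from a plain regex.test call is only that the wait is bounded.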
I made a Node.js package specifically for this called super-regex:
import {isMatch} from 'super-regex';
console.log(isMatch(/\d+/, getUserInput(), {timeout: 10000}));
Instead of executing in a worker or child process, it uses the Node.js vm module to execute in a new context. This means the execution is faster and can remain synchronous.
Comments:
1) (?:[\s\S]*?) must be removed because they were not even meant to be there, use a * quantifier on the non-capturing groups to correctly unroll the lazy dot matching patterns (you have 2 here), 2) the second unrolled pattern is meaningless, you may use [\s\S]*?, 3) the final subpattern (in the negative lookaheads) quantifiers should be removed for quicker matching. – Wiktor Stribiżew, commented Aug 9, 2016 at 22:22
[...] P and Z and the pattern is long by itself: MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d)[^P]*)*)PÁG\s:\s+\d+\/(\d+)\b[\s\S]*?\1\/\1[^Z]*(?:Z(?!6:\s\d)[^Z]*)*Z6:\s+\d+ – Wiktor Stribiżew, commented Aug 9, 2016 at 22:24
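To see the reworked pattern from the comment in action, here is a small sketch. The sample text is invented for illustration (the real input from the question is not shown here), so treat it as a smoke test of the pattern's shape rather than evidence about performance on the actual documents.

```javascript
// Reworked pattern from the comment: the (?:[\s\S]*?) groups are gone and
// the non-capturing groups are repeated with * (unrolled loops).
const unrolled = /MINISTÉRIO(?:[^P]*(?:P(?!ÁG\s:\s\d+\/\d)[^P]*)*)PÁG\s:\s+\d+\/(\d+)\b[\s\S]*?\1\/\1[^Z]*(?:Z(?!6:\s\d)[^Z]*)*Z6:\s+\d+/;

// Invented sample (an assumption, not data from the question):
// a two-page document ending with a "Z6:" field.
const sample = [
  'MINISTÉRIO DA SAÚDE',
  'PÁG : 1/2',
  'first page body',
  'PÁG : 2/2',
  'last page body',
  'Z6: 123',
].join('\n');

const match = unrolled.exec(sample);
console.log(match !== null);    // whether the whole block was found
console.log(match && match[1]); // total page count captured by (\d+)
```

Group 1 captures the total page count from the first page marker, and \1\/\1 then requires the last page marker (here 2/2) before the Z6: field.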