I'm writing an app that lets a user specify a regular expression. Of course, users make mistakes, so I need a way to handle regular expressions that are unparseable, and give the user some actionable advice on how to fix the problem.
The problem I'm having is that the exceptions thrown by new RegExp("something awful") are not helpful for regex n00bs, and have different messages per browser. For example:
Given:
try {
    new RegExp("(pie");
} catch (e) {
    console.log(e.message);
}
- Firefox throws "unterminated parenthetical".
- Safari throws "missing )"
- Chrome throws "Unterminated group"
And it wouldn't surprise me if those message strings are user-language-localized, or that they've drifted over time, making this a crazy knot to untie with exception.message.
My goal is to catch the exception, figure out what it's really about, and put up a much more beginner-friendly message. (And eventually highlighting the unmatched paren, in this example.)
Is there some other exception identifier I should be using? Is there a better way to tell these apart? Failing all of that, has anyone just collected what all these strings are across the several most popular browsers?
asked Nov 29, 2012 at 21:25 by Jeremy Wadhams
- I would look to see what some of the popular regex online testing pages did. – mplungjan Commented Nov 29, 2012 at 21:27
- Does the regex (abcd}) have one too few braces or one too many? – user1726343 Commented Nov 29, 2012 at 21:28
- And here is a trick to get hold of most possible messages. Write a script that contains a few valid but really complicated regular expressions, using and abusing all regex features available in JavaScript, nested and everything, of course. Then randomly remove, add, or change a few characters in those and try to compile them. Save all the error messages you get (along with the regex that caused each one). Due to the randomness you should be able to try out a lot of failing cases, and thanks to automating it you don't have to worry about duplicates. – Martin Ender Commented Nov 29, 2012 at 21:31
3 Answers
Idea: Figure it all out at runtime. E.g.
var tellMeWhatIDidWrong = (function() {
    // Minimal invalid patterns, each paired with a friendly explanation.
    var tests = {
        '(': 'You did not close your group... duh!',
        ')': 'You seem to have an unmatched parenthesis.',
        '*': 'That token is illegal in that position'
    };
    // Maps this browser's native message text to the friendly explanation.
    var errors = {};
    for (var i in tests) {
        try { RegExp(i); } catch(e) {
            // Keep only the text after the last colon of the stringified error.
            errors[String(e).split(':').pop()] = tests[i];
        }
    }
    return function(regexStr) {
        try { RegExp(regexStr); } catch(e) {
            e = String(e).split(':').pop();
            if (e in errors) {
                return errors[e];
            }
            return 'Unknown error';
        }
        return 'Nothing -- it is fine!';
    };
}());
tellMeWhatIDidWrong('(abc?'); // -> "You did not close your group... duh!"
Of course, this will only work well if a browser's in-built error reporting is specific enough. Many of them suck. E.g. Opera gives absolutely no hint as to the issue, so the above won't work well, and neither will any other solution relying on Opera's native error messages.
I would suggest sending regexps off to an app running node.js and getting the nice V8 error messages :)
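A rough sketch of what that could look like, in case it helps. The names checkPattern and startServer, the "POST the raw pattern as the request body" convention, and the JSON response shape are all made up for illustration; only the built-in http module and the RegExp constructor are real. The server only compiles the pattern and never runs it against any input.

```javascript
// Hypothetical helper: try to compile the pattern and report the
// engine's own message if compilation fails.
function checkPattern(source, flags) {
  try {
    new RegExp(source, flags || '');
    return { ok: true, message: null };
  } catch (e) {
    // An invalid pattern is reported as a SyntaxError; rethrow anything else.
    if (!(e instanceof SyntaxError)) { throw e; }
    return { ok: false, message: e.message };
  }
}

// Hypothetical endpoint: the client POSTs the raw pattern as the body
// and gets back the JSON produced by checkPattern.
function startServer(port) {
  var http = require('http');
  return http.createServer(function (req, res) {
    var body = '';
    req.on('data', function (chunk) { body += chunk; });
    req.on('end', function () {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(checkPattern(body)));
    });
  }).listen(port);
}
```

The browser would then map the returned message to friendlier text, with only one engine's wording to deal with.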
Use PEG.js or JISON to create a regular expression parser. You'll be able to get specific and consistent errors.
This file has a YACC grammar for a regular expression: http://swtch.com/usr/local/plan9/src/cmd/grep/grep.y; it might not be too hard to use it with JISON.
A BNF grammar for Perl regex: http://www.cs.sfu.ca/~cameron/Teaching/384/99-3/regexp-plg.html
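To give a feel for the "consistent errors" part without pulling in a parser generator, here is a minimal hand-written scanner. To be clear, this is not PEG.js or JISON output and not a full regex grammar; it is a simplified stand-in that only checks groups, character classes, and a trailing backslash. The function name findRegexProblem and the error codes are invented for this sketch.

```javascript
// Hypothetical checker: scans the pattern once and returns the first
// structural problem it finds as { code, index }, or null if none.
function findRegexProblem(source) {
  var openGroups = [];            // indexes of '(' still waiting for ')'
  var i = 0;
  while (i < source.length) {
    var ch = source[i];
    if (ch === '\\') {
      if (i === source.length - 1) {
        return { code: 'TRAILING_BACKSLASH', index: i };
      }
      i += 2;                     // skip the escaped character
      continue;
    }
    if (ch === '[') {
      // Inside a class only ']' and '\' matter for finding the end.
      var j = i + 1;
      while (j < source.length && source[j] !== ']') {
        j += source[j] === '\\' ? 2 : 1;
      }
      if (j >= source.length) {
        return { code: 'UNTERMINATED_CLASS', index: i };
      }
      i = j + 1;
      continue;
    }
    if (ch === '(') {
      openGroups.push(i);
    } else if (ch === ')') {
      if (openGroups.length === 0) {
        return { code: 'UNMATCHED_CLOSE_PAREN', index: i };
      }
      openGroups.pop();
    }
    i += 1;
  }
  if (openGroups.length > 0) {
    // Report the most recently opened group that was never closed.
    return { code: 'UNTERMINATED_GROUP', index: openGroups[openGroups.length - 1] };
  }
  return null;
}
```

Because the result carries a stable code and a character index, the app can look up its own beginner-friendly text per code and highlight the offending paren, regardless of which browser is running. A generated parser would cover far more cases (quantifiers, lookaheads, ranges) in the same spirit.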
Following from my comment, I have hacked together a little script to "harvest" the possible error messages and the patterns that cause them.
JSFiddle (tried on Chrome only, I hope the RegExp exception objects have the same structure for other browsers)
The idea is this: You have a working regular expression that uses as many regex features as possible. Then you randomly mutate it (adding, removing or swapping out characters) and try to compile it. You can do this a few thousand times, and collect all the error messages. Hopefully chance is better at coming up with possible malformed patterns than any one of us is.
You should definitely improve the base pattern, to include all regex features provided by JavaScript and include all meta characters in the replacement table. But otherwise, I seem to consistently get 6 possible error messages:
Unterminated group
Invalid group
Nothing to repeat
Unmatched ')'
Unterminated character class
\ at end of pattern
Try running this script in different browsers, analyze the patterns that caused the errors, and from there you should be able to write your tool.
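For reference, the mutate-and-collect idea can be sketched roughly like this. This is not the fiddle's code: the function name harvestRegexErrors, the character table, and the base pattern are illustrative assumptions, and taking the text after the last colon of e.message assumes a Chrome-style message layout.

```javascript
// Hypothetical harvesting sketch: mutate a valid pattern at random and
// record each distinct error message with one pattern that caused it.
function harvestRegexErrors(basePattern, rounds, random) {
  random = random || Math.random;           // injectable for repeatable runs
  var chars = '()[]{}*+?|\\^$.-a1';         // assumed replacement table
  var seen = Object.create(null);           // message -> example pattern
  function pick(n) { return Math.floor(random() * n); }

  for (var r = 0; r < rounds; r++) {
    var pattern = basePattern;
    var edits = 1 + pick(3);                // 1 to 3 mutations per round
    for (var k = 0; k < edits; k++) {
      var pos = pick(pattern.length + 1);
      var c = chars[pick(chars.length)];
      var op = pick(3);
      if (op === 0) {                       // insert a character
        pattern = pattern.slice(0, pos) + c + pattern.slice(pos);
      } else if (op === 1) {                // delete a character
        pattern = pattern.slice(0, pos) + pattern.slice(pos + 1);
      } else {                              // replace a character
        pattern = pattern.slice(0, pos) + c + pattern.slice(pos + 1);
      }
    }
    try {
      new RegExp(pattern);
    } catch (e) {
      // Assumption: the reason is the text after the last colon.
      var msg = String(e.message).split(':').pop().trim();
      if (!(msg in seen)) { seen[msg] = pattern; }
    }
  }
  return seen;
}

// Example run with an illustrative base pattern.
var found = harvestRegexErrors('^(?:a|b)+[c-e]{2,3}\\d*(f)?$', 2000);
console.log(Object.keys(found));
```

The set of messages printed depends on the engine and its version, which is the whole point of running it in each browser you care about.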
EDIT:
Okay, as I feared, this does not work in other browsers out of the box, because they store the actual message somewhere else inside the exception object. But judging from your question, you already seem to have figured out where to get the message from in every browser, so the changes you need to make should be minor, I hope.