I need to dependably remove all JavaScript comments with a single regular expression.
I have searched StackOverflow and other sites, but none of the solutions take into account alternating quotes, multi-line comments, comments within strings, regular expressions, etc.
Is there any regular expression that can remove the comments from this:
var test = [
"// Code",
'// Code',
"'// Code",
'"// Code',
//" Comment",
//' Comment',
/* Comment */
// Comment /* Comment
/* Comment
Comment // */ "Code",
"Code",
"/* Code */",
"/* Code",
"Code */",
'/* Code */',
'/* Code',
'Code */',
/* Comment
"Comment",
Comment */ "Code",
/Code\/*/,
"Code */"
]
Here's a jsbin or jsfiddle to test it.
Asked Jul 1, 2014 at 19:38 by wizulus
- Do you? Why? What's the context of this requirement? Have you made an attempt? – David Thomas Commented Jul 1, 2014 at 19:40
- Have you made any attempts at creating the said regular expression? If so, post it here. Note, however, that this task may not be easy to achieve with just regular expressions. The best course of action is to use a real JavaScript parser. – Amal Commented Jul 1, 2014 at 19:41
- Best regexp I've managed to find so far: /(?:\/*(?:[\s\S]*?)*\/)|(?:([\s;])+\/\/(?:.*)$)/gm – wizulus Commented Jul 1, 2014 at 19:41
- +1 for the high-quality fiddle – Lucas Trzesniewski Commented Jul 1, 2014 at 19:42
- ... but /Comment/gm works :P – Lucas Trzesniewski Commented Jul 1, 2014 at 19:45
5 Answers
I like challenges :)
Here's my working solution:
/((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm
Replace that with $1.
Fiddle here: http://jsfiddle/LucasTrz/DtGq8/6/
Of course, as has been pointed out countless times, a proper parser would probably be better, but still...
NB: I used a regex literal in the fiddle instead of a regex string; too much escaping can destroy your brain.
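A minimal usage sketch in Node.js, assuming the regex is applied with `String.prototype.replace` and the `$1` replacement described above; `removeComments` and the sample input are hypothetical. When group 1 did not take part in a match (i.e. a comment was matched), `$1` expands to an empty string, so comments vanish while strings and regex literals are written back unchanged.

```javascript
// The "working solution" regex, written as a regex literal (no string escaping needed).
const COMMENT_OR_KEEP =
  /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Hypothetical helper: write group 1 (strings, regex literals) back, drop comments.
function removeComments(source) {
  return source.replace(COMMENT_OR_KEEP, "$1");
}

const input = [
  'var a = "// Code"; // Comment',
  "var b = '/* Code */'; /* Comment */",
].join("\n");

// Both strings survive intact; both trailing comments are removed.
console.log(removeComments(input));
```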
Breakdown
((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/) <-- the part to keep
|\/\/.*?$ <-- line comments
|\/\*[\s\S]*?\*\/ <-- inline comments
The part to keep
(["'])(?:\\[\s\S]|.)*?\2 <-- strings
\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/ <-- regex literals
Strings
["'] match a quote and capture it
(?:\\[\s\S]|.)*? match escaped characters or unescaped characters, don't capture
\2 match the same type of quote as the one that opened the string
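To see the strings alternative on its own, here is a small sketch that pulls it out of the full regex. Because it is now the only capture group, the backreference becomes `\1` instead of `\2`; `STRING_LITERAL` and the sample source are hypothetical. `String.raw` keeps the backslashes in the sample exactly as they would appear in a source file.

```javascript
// Strings alternative from the breakdown; \1 refers to the opening quote.
const STRING_LITERAL = /(["'])(?:\\[\s\S]|.)*?\1/g;

const source = String.raw`a = "say \"hi\" // not a comment"; b = 'it\'s';`;
const strings = source.match(STRING_LITERAL);

// Escaped quotes are consumed by \\[\s\S], so neither string ends early,
// and the "//" inside the first string is never seen as a comment.
console.log(strings);
```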
Regex literals
\/ match a forward slash
(?![*\/]) ... not followed by a * or / (that would start a comment)
(?:\\.|\[(?:\\.|.)\]|.)*? match any sequence of escaped/unescaped text, or a regex character class
\/ ... until the closing slash
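A quick sketch of the regex-literal alternative in isolation (`REGEX_LITERAL` and the sample are hypothetical). It shows a `/` inside `[...]` and an escaped `\/`, and also a limitation worth knowing: `\[(?:\\.|.)\]` only recognizes a character class holding a single (possibly escaped) character, so a longer class such as `[ab/]` is cut short at its inner `/`.

```javascript
// Regex-literal alternative from the breakdown.
const REGEX_LITERAL = /\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\//g;

const source = String.raw`x = /[/]/.test(s); y = /a\/b/g; z = /[ab/]/;`;
const literals = source.match(REGEX_LITERAL);

// literals[0]: the one-character class [/] is skipped over correctly.
// literals[1]: the escaped \/ does not end the literal.
// literals[2]: the multi-character class [ab/] is cut off at its inner "/".
console.log(literals);
```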
The part to remove
|\/\/.*?$ <-- line comments
|\/\*[\s\S]*?\*\/ <-- inline comments
Line comments
\/\/ match two forward slashes
.*?$ then everything until the end of the line
Inline comments
\/\* match /*
[\s\S]*? then as few as possible of anything, see note below
\*\/ match */
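The lazy `*?` is what keeps two separate block comments from being treated as one. A small comparison sketch (the sample string is made up):

```javascript
const LAZY = /\/\*[\s\S]*?\*\//g;  // as in the answer
const GREEDY = /\/\*[\s\S]*\*\//g; // same pattern without the "?"

const source = "a = 1; /* one */ b = 2; /* two\n lines */ c = 3;";

const lazyResult = source.replace(LAZY, "");
const greedyResult = source.replace(GREEDY, "");

console.log(lazyResult);   // a = 1;  b = 2;  c = 3;
console.log(greedyResult); // a = 1;  c = 3;   (the code "b = 2;" is swallowed)
```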
I had to use [\s\S] instead of . because JavaScript didn't support the regex s modifier at the time (singleline: it allows . to match newlines as well). ES2018 has since added it as the s (dotAll) flag.
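A short sketch of the difference: `.` stops at a line break, `[\s\S]` does not, and on engines that support ES2018 the `s` flag gives `.` the same reach.

```javascript
const multiLine = "/* a\nb */";

const withDot = /\/\*.*?\*\//.test(multiLine);        // false: "." cannot cross "\n"
const withClass = /\/\*[\s\S]*?\*\//.test(multiLine); // true
const withDotAll = /\/\*.*?\*\//s.test(multiLine);    // true (ES2018 "s" flag)

console.log(withDot, withClass, withDotAll); // false true true
```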
This regex will work in the following corner cases:
- Regex patterns containing / in character classes: /[/]/
- Escaped newlines in string literals
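The escaped-newline case can be checked with a sketch like this (sample input hypothetical). The backslash before the line break is consumed by `\\[\s\S]`, so the `//` on the continued line stays inside the string, while the real comment on the next line is removed.

```javascript
const COMMENT_OR_KEEP =
  /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// A string continued onto a second line with a trailing backslash.
const source = [
  'var s = "one \\',
  '// still string";',
  "// real comment",
].join("\n");

const result = source.replace(COMMENT_OR_KEEP, "$1");
console.log(result);
```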
Final boss fight
And just for the fun of it... here's the eye-bleeding hardcore version:
/((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm
This adds the following twisted edge case (fiddle, regex101):
Code = /* Comment */ /Code regex/g ; // Comment
Code = Code / Code /* Comment */ /g ; // Comment
Code = /Code regex/g /* Comment */ ; // Comment
This is highly heuristic code; you probably shouldn't use it (even less so than the previous regex) and should just let that edge case blow up.
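A sketch comparing the two regexes on the second edge-case line, assuming the same `$1` replacement (the constant names are hypothetical). The first regex reads `/ Code /` as a regex literal and so leaves `/* Comment */` in place. The heuristic version only starts a regex literal after a non-word, non-space character (or the start of a line), which a division following the word `Code` does not have.

```javascript
const SIMPLE =
  /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;
const HARDCORE =
  /((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

const line = "Code = Code / Code /* Comment */ /g ; // Comment";

// Simple version: "/ Code /" is taken for a regex literal, so the block comment survives.
const simpleResult = line.replace(SIMPLE, "$1");
// Heuristic version: "/" after the word "Code" is not a regex start, so both comments go.
const hardcoreResult = line.replace(HARDCORE, "$1");

console.log(simpleResult);
console.log(hardcoreResult);
```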
First off, I suggest doing this with a proper JavaScript parser instead. Check out this previous Q&A: JavaScript parser in JavaScript
For the input you've provided[1], here is a solution that might work:
Match the pattern:
/("(?:[^\r\n\\"]|\\.)*"|'(?:[^\r\n\\']|\\.)*'|\/[^*\/]([^\\\/]|\\.)*\/[gm]*)|\/\/[^\r\n]*|\/\*[\s\S]*?\*\//g
Here's a break down of the pattern:
/
( # start match group 1
"(?:[^\r\n\\"]|\\.)*" # match a double quoted string
| '(?:[^\r\n\\']|\\.)*' # match a single quoted string
| \/[^*\/]([^\\\/]|\\.)*\/[gm]* # match a regex literal
) # end match group 1
| \/\/[^\r\n]* # match a single-line comment
| \/\*[\s\S]*?\*\/ # match a multi-line comment
/g
and replace it with $1 (match group 1). The trick here is that anything besides a comment is matched in group 1, which gets replaced with itself again, while comments get replaced with an empty string.
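A minimal sketch of that replacement in JavaScript, run on a few lines taken from the question (the `re` name and the shortened input are assumptions). The newlines that ended the removed comment lines are kept, so those lines become blank.

```javascript
const re =
  /("(?:[^\r\n\\"]|\\.)*"|'(?:[^\r\n\\']|\\.)*'|\/[^*\/]([^\\\/]|\\.)*\/[gm]*)|\/\/[^\r\n]*|\/\*[\s\S]*?\*\//g;

const input = [
  "var test = [",
  "\"'// Code\",",
  "//\" Comment\",",
  "/* Comment */",
  "/Code\\/*/,",
  "]",
].join("\n");

// Group 1 (strings and regex literals) is written back; comments become "".
const output = input.replace(re, "$1");
console.log(output);
```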
Here's a regexr demo that shows the following replacement:
var test = [
"// Code",
'// Code',
"'// Code",
'"// Code',
"Code",
"Code",
"/* Code */",
"/* Code",
"Code */",
'/* Code */',
'/* Code',
'Code */',
"Code",
/Code\/*/,
"Code */"
]
[1] Again, a parser is the way to go, since regex literals might be confused with the division operator. If you have an assignment like var x = a / b / g; in your source, the solution above will break!
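A closely related failure is easy to reproduce. In the sketch below (input made up), `/ b; /` is taken for a regex literal, which consumes the first slash of `//`, so the trailing comment is never recognized and the line comes back unchanged.

```javascript
const re =
  /("(?:[^\r\n\\"]|\\.)*"|'(?:[^\r\n\\']|\\.)*'|\/[^*\/]([^\\\/]|\\.)*\/[gm]*)|\/\/[^\r\n]*|\/\*[\s\S]*?\*\//g;

const source = "var x = a / b; // half";
const result = source.replace(re, "$1");

console.log(result === source); // true: the "// half" comment was not removed
```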
I suggest you look at parsing JavaScript using a JavaScript parser written in JavaScript itself, and then leverage the parser API to strip out what you don't want. I have not personally done this, but regular expressions are best limited to regular languages, and I doubt JS falls into that category.
Here are some good places to look.
JavaScript parser in JavaScript
test.replace(/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm, '');
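A sketch of this approach, assuming the pattern `/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm` (block comments or line comments, nothing else), shows the weakness the question points out: it has no notion of strings, so a `//` inside a string literal is treated as the start of a comment. The sample input is hypothetical.

```javascript
const COMMENT = /(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm;

const source = 'var url = "http://example.com"; // link';
const result = source.replace(COMMENT, "");

// The "//" inside the string is taken for a line comment, truncating the code.
console.log(result); // var url = "http:
```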
Is there any regular expression that can remove the comments
No. You cannot build a regex that will match a comment (so that you can simply replace the match with the empty string), because without lookbehind it is impossible to determine whether //" is a comment or the end of a string literal.
You could use a regex as a tokenizer (you "only" need to take care of string literals, regex literals, and the two types of comments), but I'd recommend using a full-blown JavaScript parser; they are freely available.
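For comparison, here is a minimal sketch of the tokenizing idea as a hand-written character scanner rather than a regex. It handles only the four token kinds named above. The rule for telling a regex literal from division (look at the previous non-whitespace character) is a simplifying assumption: it misses cases like `return /x/`, and template literals are not handled at all, which is exactly why a real parser is the safer choice. `stripComments` and `REGEX_CAN_FOLLOW` are hypothetical names.

```javascript
// Characters after which a "/" is assumed to start a regex literal (heuristic).
const REGEX_CAN_FOLLOW = "(,=:[!&|?{};+-*%<>~^";

// Hypothetical helper: copy strings and regex literals, drop comments.
function stripComments(src) {
  let out = "";
  let prev = ""; // last non-whitespace character written to `out`
  let i = 0;

  while (i < src.length) {
    const c = src[i];
    const next = src[i + 1];

    if (c === '"' || c === "'") {
      // String literal: copy up to the matching quote, skipping escaped characters.
      let j = i + 1;
      while (j < src.length && src[j] !== c) {
        j += src[j] === "\\" ? 2 : 1;
      }
      out += src.slice(i, j + 1);
      prev = c;
      i = j + 1;
    } else if (c === "/" && next === "/") {
      // Line comment: skip to the end of the line, keeping the newline itself.
      while (i < src.length && src[i] !== "\n") i++;
    } else if (c === "/" && next === "*") {
      // Block comment: skip past the closing "*/" (or to the end of input).
      const end = src.indexOf("*/", i + 2);
      i = end === -1 ? src.length : end + 2;
    } else if (c === "/" && (prev === "" || REGEX_CAN_FOLLOW.includes(prev))) {
      // Regex literal: copy up to the closing "/", honoring escapes and [...] classes.
      let j = i + 1;
      let inClass = false;
      while (j < src.length && src[j] !== "\n" && (inClass || src[j] !== "/")) {
        if (src[j] === "\\") j++;
        else if (src[j] === "[") inClass = true;
        else if (src[j] === "]") inClass = false;
        j++;
      }
      out += src.slice(i, j + 1);
      prev = "/";
      i = j + 1;
    } else {
      // Anything else (including a division "/") is copied through unchanged.
      out += c;
      if (!/\s/.test(c)) prev = c;
      i++;
    }
  }
  return out;
}

const sample = [
  'var a = "// Code"; // Comment',
  "var b = '/* Code */'; /* Comment */",
  "var r = /[/]/g; // Comment",
  "var x = a / b; // half",
].join("\n");

console.log(stripComments(sample));
```

Unlike the single-regex answers, the division on the last line is recognized as such, so its trailing comment is removed.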