regex - Comprehensive RegExp to remove JavaScript comments - Stack Overflow

I need to reliably remove all JavaScript comments with a single regular expression.

I have searched Stack Overflow and other sites, but none of the answers take into account alternating quotes, multi-line comments, comment markers within strings, regular expressions, etc.

Is there any regular expression that can remove the comments from this:

var test = [
    "// Code",
    '// Code',
    "'// Code",
    '"// Code',
    //" Comment",
    //' Comment',
    /* Comment */
    // Comment /* Comment
    /* Comment
     Comment // */ "Code",
    "Code",
    "/* Code */",
    "/* Code",
    "Code */",
    '/* Code */',
    '/* Code',
    'Code */',
    /* Comment
    "Comment",
    Comment */ "Code",
    /Code\/*/,
    "Code */"
]

Here's a jsbin or jsfiddle to test it.

asked Jul 1, 2014 at 19:38 by wizulus
  • Do you? Why? What's the context of this requirement? Have you made an attempt? – David Thomas Commented Jul 1, 2014 at 19:40
  • Have you made any attempts at creating the said regular expression? If so, post it here. But note, however, that this task may not be easy to achieve with just regular expressions. The best course of action is to use a real JavaScript parser. – Amal Commented Jul 1, 2014 at 19:41
  • Best regexp I've managed to find so far: /(?:\/\*(?:[\s\S]*?)\*\/)|(?:([\s;])+\/\/(?:.*)$)/gm – wizulus Commented Jul 1, 2014 at 19:41
  • +1 for the high-quality fiddle – Lucas Trzesniewski Commented Jul 1, 2014 at 19:42
  • ... but /Comment/gm works :P – Lucas Trzesniewski Commented Jul 1, 2014 at 19:45

5 Answers

I like challenges :)

Here's my working solution:

/((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm

Replace that with $1.

Fiddle here: http://jsfiddle/LucasTrz/DtGq8/6/

Of course, as has been pointed out countless times, a proper parser would probably be better, but still...

NB: I used a regex literal in the fiddle instead of a regex string; too much escaping can destroy your brain.
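For reference, a minimal sketch of applying the regex above with `String.prototype.replace` and `"$1"`. The helper name `removeComments` and the three sample lines are illustrative assumptions; the pattern itself is copied from the answer unchanged.

```javascript
// Regex from the answer above, as a literal (g = all matches, m = $ matches at each line end).
const commentPattern = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Hypothetical helper: strings and regex literals land in group 1 and are
// written back via $1; comments match outside group 1, so $1 is empty for them.
function removeComments(source) {
  return source.replace(commentPattern, "$1");
}

const sample = [
  'var a = "// not a comment"; // comment',
  "var b = '/* not a comment */'; /* comment */",
  "var c = /Code\\/*/; // comment",
].join("\n");

console.log(removeComments(sample));
```

Since `$1` expands to an empty string when group 1 did not take part in a match, only the comments disappear; the code before them, including any trailing space, is left in place.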


Breakdown

((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/) <-- the part to keep
|\/\/.*?$                                                         <-- line comments
|\/\*[\s\S]*?\*\/                                                 <-- inline comments

The part to keep

(["'])(?:\\[\s\S]|.)*?\2                   <-- strings
\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/     <-- regex literals

Strings

    ["']              match a quote and capture it
    (?:\\[\s\S]|.)*?  match escaped characters or unescaped characters, don't capture
    \2                match the same type of quote as the one that opened the string
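The string alternative can be tried in isolation. In this sketch it is the whole pattern, so the quote is group 1 and the closing backreference becomes `\1` instead of `\2`; the sample line is made up for illustration.

```javascript
// "Strings" alternative on its own: (["']) captures the opening quote,
// \\[\s\S] consumes escapes (including \"), and \1 requires the same quote to close.
const stringLiteral = /(["'])(?:\\[\s\S]|.)*?\1/g;

// Source text: x = "it's" + 'say "hi"' + "a \" quote";
const line = `x = "it's" + 'say "hi"' + "a \\" quote";`;

for (const match of line.match(stringLiteral)) {
  console.log(match);
}
```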

Regex literals

    \/                          match a forward slash
    (?![*\/])                   ... not followed by a * or / (that would start a comment)
    (?:\\.|\[(?:\\.|.)\]|.)*?   match any sequence of escaped/unescaped text, or a regex character class
    \/                          ... until the closing slash
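Likewise, a sketch of the regex-literal alternative on its own, run against an illustrative line. Note that `\[(?:\\.|.)\]` allows exactly one character (or one escape) between the brackets, so `[/]` is recognised but a longer class such as `[^/]` is not.

```javascript
// "Regex literals" alternative on its own (it has no capture groups).
const regexLiteral = /\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\//g;

// Source text: a = /[/]/.test(s); b = /Code\/*/; // not a regex
const src = "a = /[/]/.test(s); b = /Code\\/*/; // not a regex";

// The trailing "//" is rejected: the first slash fails the (?![*\/]) check,
// and the second never finds a closing slash on the line.
console.log(src.match(regexLiteral));
```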

The part to remove

|\/\/.*?$              <-- line comments
|\/\*[\s\S]*?\*\/      <-- inline comments

Line comments

    \/\/         match two forward slashes
    .*?$         then everything until the end of the line

Inline comments

    \/\*         match /*
    [\s\S]*?     then as few as possible of anything, see note below
    \*\/         match */

I had to use [\s\S] instead of . because, unfortunately, JavaScript doesn't support the regex s modifier (single-line mode, which lets . match newlines as well). ES2018 has since added it as the s (dotAll) flag.
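A small sketch contrasting the options on a made-up input containing one two-line comment: lazy `[\s\S]*?` versus greedy `[\s\S]*`, and `.` with and without the `s` flag. The `s` flag assumes an ES2018-capable engine, such as a current Node.js.

```javascript
// A block comment spanning two lines, then code, then a second comment.
const src = "/* one\n   still one */ keep(); /* two */";

// Lazy: stops at the first */, so each comment is removed separately.
const lazy = src.replace(/\/\*[\s\S]*?\*\//g, "");
// Greedy: runs to the last */ and swallows keep() as well.
const greedy = src.replace(/\/\*[\s\S]*\*\//g, "");
// Without s, . cannot cross the newline, so the two-line comment is missed.
const dotNoFlag = src.replace(/\/\*.*?\*\//g, "");
// With s (dotAll), . also matches newlines and behaves like [\s\S].
const dotAll = src.replace(/\/\*.*?\*\//gs, "");

console.log(JSON.stringify({ lazy, greedy, dotNoFlag, dotAll }));
```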

This regex will work in the following corner cases:

  • Regex patterns containing / in character classes: /[/]/
  • Escaped newlines in string literals
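Both corner cases can be checked with the full regex. The input below is an illustrative assumption: a `/` inside a character class on the first line, and a string continued with a backslash-newline across the next two.

```javascript
const pattern = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Line 1: "/" inside a character class. Lines 2-3: one string literal
// continued with a backslash-newline. Each line ends with a comment.
const src = [
  "var r = /[/]/; // c1",
  'var s = "line one \\',
  'line two"; // c2',
].join("\n");

// \\[\s\S] lets the string branch step over the escaped newline.
console.log(src.replace(pattern, "$1"));
```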

Final boss fight

And just for the fun of it... here's the eye-bleeding hardcore version:

/((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm

This adds the following twisted edge case (fiddle, regex101):

Code = /* Comment */ /Code regex/g  ; // Comment
Code = Code / Code /* Comment */ /g  ; // Comment    
Code = /Code regex/g /* Comment */  ; // Comment

This is highly heuristic code; you probably shouldn't use it (even less so than the previous regex) and should just let that edge case blow up.
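To see what the extra heuristics buy, here is a sketch running both regexes from this answer over the second edge-case line, where `/` is division. The expected behaviour for this one input is noted in the comments; it is an illustration, not a test suite.

```javascript
// First regex from the answer and the "final boss" variant, copied verbatim.
const basic = /((["'])(?:\\[\s\S]|.)*?\2|\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/)|\/\/.*?$|\/\*[\s\S]*?\*\//gm;
const boss = /((["'])(?:\\[\s\S]|.)*?\2|(?:[^\w\s]|^)\s*\/(?![*\/])(?:\\.|\[(?:\\.|.)\]|.)*?\/(?=[gmiy]{0,4}\s*(?![*\/])(?:\W|$)))|\/\/.*?$|\/\*[\s\S]*?\*\//gm;

// Both "/" before the block comment are division, not regex delimiters.
const division = "Code = Code / Code /* Comment */ /g  ; // Comment";

// basic reads "/ Code /" as a regex literal, so the block comment survives.
console.log(division.replace(basic, "$1"));
// boss requires a non-word, non-space character before a regex literal, so both comments go.
console.log(division.replace(boss, "$1"));
```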

First off, I suggest doing this with a proper JavaScript parser instead. Check out this previous Q&A: JavaScript parser in JavaScript

For the input you've provided¹, here is a solution that might work:

Match the pattern:

/("(?:[^\r\n\\"]|\\.)*"|'(?:[^\r\n\\']|\\.)*'|\/[^*\/]([^\\\/]|\\.)*\/[gm]*)|\/\/[^\r\n]*|\/\*[\s\S]*?\*\//g

Here's a breakdown of the pattern:

/
  (                                     # start match group 1
      "(?:[^\r\n\\"]|\\.)*"             #   match a double quoted string
    | '(?:[^\r\n\\']|\\.)*'             #   match a single quoted string
    | \/[^*\/]([^\\\/]|\\.)*\/[gm]*     #   match a regex literal
  )                                     # end match group 1
  | \/\/[^\r\n]*                        # match a single-line comment
  | \/\*[\s\S]*?\*\/                    # match a multi-line comment
/g

and replace it with $1 (match group 1). The trick here is that strings and regex literals, which may contain comment-like text, are matched in group 1 and get replaced with themselves, while comments fall outside group 1 and get replaced with an empty string.
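A sketch of the same replacement in JavaScript, using a few lines taken from the question's test array (the choice of subset is arbitrary).

```javascript
// Pattern from this answer; group 1 holds strings and regex literals to keep.
const pattern = /("(?:[^\r\n\\"]|\\.)*"|'(?:[^\r\n\\']|\\.)*'|\/[^*\/]([^\\\/]|\\.)*\/[gm]*)|\/\/[^\r\n]*|\/\*[\s\S]*?\*\//g;

const input = [
  "var test = [",
  '    "// Code",',
  "    '/* Code */',",
  '    //" Comment",',
  "    /* Comment */",
  "    /Code\\/*/,",
  '    "Code */"',
  "]",
].join("\n");

// Comments match outside group 1, so "$1" replaces them with "".
console.log(input.replace(pattern, "$1"));
```

The two comment lines are left as indentation only, which matches the blank lines in the demo output below.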

Here's a regexr demo that shows the following replacement:

  var test = [
      "// Code",
      '// Code',
      "'// Code",
      '"// Code',




       "Code",
      "Code",
      "/* Code */",
      "/* Code",
      "Code */",
      '/* Code */',
      '/* Code',
      'Code */',
       "Code",
      /Code\/*/,
      "Code */"
  ]

¹ Again, a parser is the way to go, since regex literals might be confused with the division operator. If you have an assignment like var x = a / b / g; in your source, the solution above will break!
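To make the footnote concrete, a sketch with a hypothetical line in which both `/` characters are division and a block comment sits between them. The regex-literal branch reads `/ b /` as a regex literal, which swallows the opening slash of the comment, so the comment is never matched and passes through unchanged.

```javascript
const pattern = /("(?:[^\r\n\\"]|\\.)*"|'(?:[^\r\n\\']|\\.)*'|\/[^*\/]([^\\\/]|\\.)*\/[gm]*)|\/\/[^\r\n]*|\/\*[\s\S]*?\*\//g;

// Hypothetical input: two divisions with a block comment in between.
const division = "var x = a / b /* half */ / 2;";

// "/ b /" and then "/ /" are taken as regex literals; "/* half */" survives.
console.log(division.replace(pattern, "$1"));
```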

I suggest parsing the JavaScript with a parser written in JavaScript itself and then using the parser's API to strip out what you don't want. I have not personally done this, but regular expressions should be limited to regular content, and I doubt JavaScript falls into that category.

Here are some good places to look.

JavaScript parser in JavaScript

test.replace(/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm, '');
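A sketch of the one-line replace above, assuming its intended pattern is `/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm` (this reading of the escaping is an assumption). Because the pattern has no branch for strings, a `//` inside a string literal is taken as the start of a comment, which is exactly the "comments within strings" problem the question raises.

```javascript
// Assumed reading of the one-liner: block comments OR // to end of line.
const naive = /(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm;

// Illustrative input with "//" inside a string literal.
const source = 'var url = "http://example.com"; // site';

// Everything from the "//" in the URL to the end of the line is deleted.
console.log(source.replace(naive, ""));
```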

Is there any regular expression that can remove the comments

No. You cannot build a regex that will match a comment (so that you can simply replace the match with the empty string), because without lookbehind it is impossible to determine whether //" is a comment or the end of a string literal.

You could use a regex as a tokenizer (you "only" need to take care of string literals, regex literals, and the two types of comments), but I'd recommend using a full-blown JavaScript parser; they are freely available.
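A rough sketch of the tokenizer approach, assuming a small hand-written token set: comments, single- and double-quoted strings, regex literals, identifiers/numbers, and single characters. `TOKENS`, `matchAt` and `stripComments` are hypothetical names. Whether `/` opens a regex literal is guessed from the previous significant token; template literals and keyword cases such as `return /re/` are not handled, which is why a real parser remains the safer choice.

```javascript
// Sticky (y) regexes only match at exactly lastIndex, so tokens are read left to right.
const TOKENS = {
  comment: /\/\/[^\n]*|\/\*[\s\S]*?\*\//y,
  string: /"(?:\\[\s\S]|[^"\\\n])*"|'(?:\\[\s\S]|[^'\\\n])*'/y,
  regex: /\/(?![*\/])(?:\\.|\[(?:\\.|[^\]\\\n])*\]|[^\/\\\n])+\/[a-z]*/y,
  word: /[\w$]+/y,
};

function matchAt(re, source, pos) {
  re.lastIndex = pos;
  const m = re.exec(source);
  return m === null ? null : m[0];
}

function stripComments(source) {
  let out = "";
  let pos = 0;
  // Heuristic: "/" starts a regex literal unless the previous significant
  // token was an operand (word, string, regex) or a closing ) or ].
  let regexAllowed = true;

  while (pos < source.length) {
    const comment = matchAt(TOKENS.comment, source, pos);
    if (comment !== null) {
      pos += comment.length; // drop it; comments do not change regexAllowed
      continue;
    }
    let token = matchAt(TOKENS.string, source, pos);
    if (token === null && regexAllowed) token = matchAt(TOKENS.regex, source, pos);
    if (token === null) token = matchAt(TOKENS.word, source, pos);
    const isOperand = token !== null || /[)\]]/.test(source[pos]);
    if (token === null) token = source[pos]; // one operator, bracket or whitespace char
    out += token;
    pos += token.length;
    if (!/^\s$/.test(token)) regexAllowed = !isOperand; // whitespace is neutral
  }
  return out;
}

console.log(stripComments("var x = a / b /* half */ / 2; // done"));
```

For that sample line the output is `var x = a / b  / 2; `: after the operands `a` and `b`, each `/` is read as division, so the block comment is recognised and removed.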
