
javascript regex to select quoted string but not escape quotes - Stack Overflow


Original string:

some text "some \"string\"right here "

Want to get:

"some \"string\"right here"

I am using the following regex:

/\"(.*?)\"/g

asked Jul 25, 2016 at 8:55 by user6634092; edited Jul 25, 2016 at 9:05 by Wiktor Stribiżew
  • I would do a preliminary pass over the string replacing \" with some "magic" string such as ESCAPED_QUOTE, then find the things inside quotes, then change the magic string back to escaped quotes. Or, you could write an impenetrable, ununderstandable, unreadable, unmaintainable regexp with dozens of backslashes. Your choice. – user663031, commented Jul 25, 2016 at 12:12
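
The placeholder idea from the comment above could be sketched roughly as follows. This is only an illustration: the helper name extractQuoted and the PLACEHOLDER value are assumptions made here, and the sketch does not handle an escaped backslash directly before a quote (\\").

```javascript
// Hypothetical placeholder; it must never occur in the input text.
const PLACEHOLDER = "\u0000ESCAPED_QUOTE\u0000";

// Hypothetical helper: returns every double-quoted part, quotes included.
function extractQuoted(text) {
  // 1. Hide every escaped quote (\") behind the placeholder.
  const masked = text.split('\\"').join(PLACEHOLDER);
  // 2. With no escaped quotes left, a simple pattern finds the quoted parts.
  const found = masked.match(/"[^"]*"/g) || [];
  // 3. Put the escaped quotes back.
  return found.map((part) => part.split(PLACEHOLDER).join('\\"'));
}

console.log(extractQuoted('some text "some \\"string\\"right here "'));
```

The three steps stay readable, at the cost of two extra passes over the string.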

5 Answers


Parsing the string correctly with a parser

With a JavaScript regex, it is impossible to start matching at the correct double quote. You will either match an escaped one, or you will fail to match the correct double quote after a literal \ before a quote. Thus, the safest way is to use a parser. Here is a sample one:

var s = "some text \\\"extras\" some \\\"string \\\" right\" here \"";
console.log("Incorrect (with regex): ", s.match(/"([^"\\]*(?:\\.[^"\\]*)*)"/g));
var res = [];
var tmp = "";
var in_quotes = false;
var in_entity = false;
for (var i=0; i<s.length; i++) {
  if (s[i] === '\\' && in_entity  === false) { 
     in_entity = true;
     if (in_quotes === true) {
       tmp += s[i];
     }
  } else if (in_entity === true) { // char right after a backslash is escaped: keep it, never treat it as a delimiter
      in_entity = false;
      if (in_quotes === true) {
         tmp += s[i];
      }
  } else if (s[i] === '"' && in_quotes === false) { // start a new match
      in_quotes = true;
      tmp += s[i];
  } else if (s[i] === '"'  && in_quotes === true) { // append char to match and add to results
      tmp += s[i];
      res.push(tmp);
      tmp = "";
      in_quotes = false;
  } else if (in_quotes === true) { // append a char to the match
     tmp += s[i];
  } 
}
console.log("Correct results: ", res);

Not-so-safe regex approach

It is not possible to match the string you need with a lazy dot-matching pattern, since .*? stops at the first " it meets, escaped or not. If you know your string will never have an escaped quote before a quoted substring, and if you are sure there are no literal \ before double quotes (these conditions are very strict, and the regex is only safe when they hold), you can use

/"([^"\\]*(?:\\.[^"\\]*)*)"/g

See the regex demo

  • " - match a quote
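  • ([^"\\]*(?:\\.[^"\\]*)*) - capturing group 1 (the text between the quotes), made of:
    • [^"\\]* - 0+ characters other than \ and "
    • (?:\\.[^"\\]*)* - zero or more sequences of
      • \\. - any escaped character (a \ followed by any character except a line break)
      • [^"\\]* - 0+ characters other than \ and "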
  • " - trailing quote

JS demo:

var re = /"([^"\\]*(?:\\.[^"\\]*)*)"/g;
var str = `some text "some \\"string\\"right here " some text "another \\"string\\"right here "`;
var res = [];
var m; // declare the match variable so it does not leak as an implicit global
while ((m = re.exec(str)) !== null) {
   res.push(m[1]); // group 1: the text between the quotes
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, null, 4) + "</pre>"; // Just for demo (browser only)
console.log(res); // or another result demo

Safe regex approach

Complementing @WiktorStribiżew's answer, there is a technique to start matching at the correct double quote using regex. It consists of matching both quoted and unquoted text in the form:

/"(quoted)"|unquoted/g

As you can see, only the quoted text is captured by a group, so we'll only consider matches where match[1] is set and ignore the rest.

Regex

/"([^"\\]*(?:\\.[^"\\]*)*)"|[^"\\]*(?:\\.[^"\\]*)*/g

Code

var regex = /"([^"\\]*(?:\\.[^"\\]*)*)"|[^"\\]*(?:\\.[^"\\]*)*/g;
var s = "some text \\\"extras\" some \\\"string \\\" right\" here \"";
var match;
var res = [];

while ((match = regex.exec(s)) !== null) {
    if (match.index === regex.lastIndex)
        regex.lastIndex++;

    if( match[1] != null )
        res.push(match[1]); //Append to result only group 1
}

console.log("Correct results (regex technique): ",res)

Universal solution:

  • quote types: single, double, backticks
  • detect each quoted part and quote type
  • allows escaped quotes to be inside quoted parts
  • results in two groups: <qType> (quote type), <inQuotes>

(?<qType>["'`])(?<inQuotes>(?:\\\1|.)*?)\1

or, without group naming:

(["'`])((?:\\\1|.)*?)\1
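
As a rough usage sketch of the named-group pattern above (the g flag, the helper name findQuotedParts and the sample string are assumptions added for illustration; named groups and matchAll need a reasonably modern JavaScript engine):

```javascript
// Pattern from the answer above; the g flag is added here to iterate over all matches.
const quotedRe = /(?<qType>["'`])(?<inQuotes>(?:\\\1|.)*?)\1/g;

// Hypothetical helper: returns one { qType, inQuotes } object per quoted part.
function findQuotedParts(text) {
  return Array.from(text.matchAll(quotedRe), (m) => ({
    qType: m.groups.qType,       // which quote character opened the part
    inQuotes: m.groups.inQuotes, // text between the quotes, escapes left as-is
  }));
}

// Sample text: say "a \"b\" c" and 'it\'s' and `tick`
const sample = "say \"a \\\"b\\\" c\" and 'it\\'s' and `tick`";
console.log(findQuotedParts(sample));
```

Note that . does not match line breaks without the s flag, and that a quote with no closing partner can make the pattern fall back to ending at an escaped quote.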

You can use this regex:

/[^\\](\".*?[^\\]\")/g

[^\\] matches any character other than \, so \" will not be taken as the start or end of your match. The quoted string ends up in group 1.
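
A small sketch of how this pattern behaves, assuming a hypothetical helper quotedParts that collects group 1. It also shows one limitation: because [^\\] has to consume a character before the opening quote, a quoted string at the very start of the input is not found.

```javascript
// Pattern from the answer above; group 1 holds the quoted string.
const re = /[^\\](\".*?[^\\]\")/g;

// Hypothetical helper: collect group 1 of every match.
function quotedParts(text) {
  return Array.from(text.matchAll(re), (m) => m[1]);
}

// The question's input: the quoted part, escaped quotes included, is found.
console.log(quotedParts('some text "some \\"string\\"right here "'));

// Nothing precedes the opening quote here, so nothing is matched.
console.log(quotedParts('"abc"'));
```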

In order to match from quote to quote while ignoring any simple escaped quotes (\"):

(?:[^\\]|^)(\"(?:.*?[^\\]){0,1}\")

Meaning:

  • (?: starts a group with no extraction; inside it, [^\\] matches one char that is not a backslash, and after the | the alternative ^ matches the beginning of the string.
  • ( starts the extraction group.
  • \" finds a quote (one that follows a non-backslash char or the start of the string).
  • (?:.*?[^\\] matches the shortest substring ending with a non-backslash char.
  • ){0,1} repeats that group zero times or once, which means either that substring or an empty one.
  • \" is the closing quote mark that must follow.

Edit: Wiktor Stribiżew correctly pointed out that my initial answer fails in some further cases. For example, \\" (an escaped backslash followed by a quote) should be matched like a plain " in your case. To avoid this specific issue you can use

(?:[^\\]|^)((?:\\\\)*\"(?:.*?[^\\]){0,1}(?:\\\\)*\")

But for actual regex compatibility you will need to refer to Wiktor's answer.
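
For illustration only, here is a sketch of the first pattern of this answer in use, written with (?:...) non-capturing groups as the explanation describes, so that the quoted string lands in group 1. The helper name findQuoted and the sample inputs are assumptions.

```javascript
// First pattern of this answer, with non-capturing groups; group 1 is the quoted string.
const quoted = /(?:[^\\]|^)("(?:.*?[^\\]){0,1}")/g;

// Hypothetical helper: collect group 1 of every match.
function findQuoted(text) {
  return Array.from(text.matchAll(quoted), (m) => m[1]);
}

// The ^ alternative lets a quoted string at the start of the input match too.
console.log(findQuoted('"abc" and "d \\"e\\" f"'));

// {0,1} allows an empty pair of quotes.
console.log(findQuoted('x "" y'));
```

As the edit above notes, inputs containing an escaped backslash right before a quote still need the longer pattern or a parser.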
