Original string:
some text "some \"string\"right here "
Want to get:
"some \"string\"right here"
I am using the following regex:
/\"(.*?)\"/g
edited Jul 25, 2016 at 9:05
Wiktor Stribiżew
asked Jul 25, 2016 at 8:55
user6634092
I would do a preliminary pass over the string replacing \" with some "magic" string such as ESCAPED_QUOTE, then find the things inside quotes, then change the magic string back to escaped quotes. Or, you could write an impenetrable, ununderstandable, unreadable, unmaintainable regexp with dozens of backslashes. Your choice. – user663031 Commented Jul 25, 2016 at 12:12
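A minimal sketch of the placeholder idea from the comment above, assuming the placeholder text never occurs in the input. PLACEHOLDER and extractQuoted are names made up here for illustration, not something from the question:

```javascript
// Hypothetical placeholder; assumed never to appear in real input.
var PLACEHOLDER = "\u0000ESCAPED_QUOTE\u0000";

function extractQuoted(input) {
  // 1. Hide every escaped quote (\") behind the placeholder.
  var masked = input.split('\\"').join(PLACEHOLDER);
  // 2. With no escaped quotes left, a simple quote-to-quote match is enough.
  var found = masked.match(/"[^"]*"/g) || [];
  // 3. Put the escaped quotes back into each match.
  return found.map(function (part) {
    return part.split(PLACEHOLDER).join('\\"');
  });
}

var sample = 'some text "some \\"string\\"right here "';
console.log(extractQuoted(sample));
```

Note that this sketch treats every \" as an escaped quote, so it would still go wrong on input where a literal backslash (\\) sits directly before a real closing quote.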
5 Answers
Parsing the string correctly with a parser
With a JavaScript regex, it is impossible to start matching at the correct double quote. You will either match an escaped one, or you will fail to match the correct double quote after a literal \ before a quote. Thus, the safest way is to use a parser. Here is a sample one:
var s = "some text \\\"extras\" some \\\"string \\\" right\" here \"";
console.log("Incorrect (with regex): ", s.match(/"([^"\\]*(?:\\.[^"\\]*)*)"/g));
var res = [];
var tmp = "";
var in_quotes = false;
var in_entity = false; // true right after an unescaped backslash
for (var i = 0; i < s.length; i++) {
  if (s[i] === '\\' && in_entity === false) { // a backslash starts an escape sequence
    in_entity = true;
    if (in_quotes === true) {
      tmp += s[i];
    }
  } else if (in_entity === true) { // escaped char: it never opens or closes a match
    in_entity = false;
    if (in_quotes === true) {
      tmp += s[i];
    }
  } else if (s[i] === '"' && in_quotes === false) { // start a new match
    in_quotes = true;
    tmp += s[i];
  } else if (s[i] === '"' && in_quotes === true) { // append char to match and add to results
    tmp += s[i];
    res.push(tmp);
    tmp = "";
    in_quotes = false;
  } else if (in_quotes === true) { // append a char to the match
    tmp += s[i];
  }
}
console.log("Correct results: ", res);
Not-so-safe regex approach
It is not possible to match the string you need with a lazy dot matching pattern, since it will stop at the first " it finds, even an escaped one. If you know your string will never have an escaped quote before a quoted substring, and if you are sure there are no literal \ before double quotes (these conditions are very strict, and both must hold for the regex to be safe), you can use
/"([^"\\]*(?:\\.[^"\\]*)*)"/g
See the regex demo
- " - match a quote
- ([^"\\]*(?:\\.[^"\\]*)*) - a capturing group containing:
  - [^"\\]* - 0+ chars that are neither \ nor "
  - (?:\\.[^"\\]*)* - zero or more sequences of:
    - \\. - any escaped symbol
    - [^"\\]* - 0+ chars that are neither \ nor "
- " - trailing quote
JS demo:
var re = /"([^"\\]*(?:\\.[^"\\]*)*)"/g;
var str = `some text "some \\"string\\"right here " some text "another \\"string\\"right here "`;
var res = [];
var m; // declared explicitly so it does not leak as an implicit global
while ((m = re.exec(str)) !== null) {
  res.push(m[1]);
}
document.body.innerHTML = "<pre>" + JSON.stringify(res, null, 4) + "</pre>"; // Just for demo (browser only)
console.log(res); // or another result demo
Safe regex approach
Complementing @WiktorStribiżew's answer, there is a technique to start matching at the correct double quote using regex. It consists of matching both quoted and unquoted text in the form:
/"(quoted)"|unquoted/g
As you can see, the quoted text is matched by a capturing group, so we'll only consider the text captured in match[1].
Regex
/"([^"\\]*(?:\\.[^"\\]*)*)"|[^"\\]*(?:\\.[^"\\]*)*/g
Code
var regex = /"([^"\\]*(?:\\.[^"\\]*)*)"|[^"\\]*(?:\\.[^"\\]*)*/g;
var s = "some text \\\"extras\" some \\\"string \\\" right\" here \"";
var match;
var res = [];
while ((match = regex.exec(s)) !== null) {
  if (match.index === regex.lastIndex) {
    regex.lastIndex++; // avoid an infinite loop on zero-length matches
  }
  if (match[1] != null) {
    res.push(match[1]); // Append to result only group 1
  }
}
console.log("Correct results (regex technique): ", res);
Universal solution:
- quote types: single, double, backticks
- detect each quoted part and quote type
- allows escaped quotes to be inside quoted parts
- results in two groups: <qType> (quote type), <inQuotes>
(?<qType>["'`])(?<inQuotes>(?:\\\1|.)*?)\1
or, without group naming:
(["'`])((?:\\\1|.)*?)\1
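For illustration, here is a rough sketch of how the named-group version could be used from JavaScript. It assumes an engine with named capture groups and String.prototype.matchAll (ES2018/ES2020); findQuoted and the sample string are made up for this example:

```javascript
// The named-group pattern from above, as a global regex literal.
var quotedRe = /(?<qType>["'`])(?<inQuotes>(?:\\\1|.)*?)\1/g;

// Hypothetical helper: returns one {qType, inQuotes} object per quoted part.
function findQuoted(text) {
  return Array.from(text.matchAll(quotedRe), function (m) {
    return { qType: m.groups.qType, inQuotes: m.groups.inQuotes };
  });
}

// Text is: say "a \"b\" c" and 'it\'s' here
var sample = "say \"a \\\"b\\\" c\" and 'it\\'s' here";
console.log(findQuoted(sample));
```

As with the other short patterns on this page, this one only skips a backslash that is directly followed by the same quote character, so an escaped quote before the opening quote or a literal \\ right before the closing quote can still throw it off. The dot also does not match line breaks unless the s flag is added.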
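You can use this regex:
/[^\\](\".*?[^\\]\")/g
[^\\] matches any character other than \, so \" will not be treated as the start or end of your match. The quoted text ends up in group 1, and the pattern needs at least one character before the opening quote.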
In order to match from quote to quote while ignoring any simple escaped quotes (\"):
(?:[^\\]|^)(\"(?:.*?[^\\]){0,1}\")
Meaning:
- (?: starts a group with no extraction (non-capturing)
- [^\\] matches one char that is not a backslash
- | means either the previous alternative or ^, which is the beginning of the string
- ( starts the extraction (capturing) group
- \" finds a quote (one that follows a non-backslash or the start of the string)
- (?:.*?[^\\] matches the shortest substring ending with a non-backslash
- ){0,1} zero times or one, which effectively means one time or an empty substring
- \" the closing quote mark
Edit:
Wiktor Stribiżew correctly pointed out that my initial answer fails in some further cases, for example \\" (an escaped backslash followed by a quote), which should be matched the same way as " in your case. To avoid this specific issue you can use
(?:[^\\]|^)((?:\\\\)*\"(?:.*?[^\\]){0,1}(?:\\\\)*\")
But for actual regex compatibility you will need to refer to Wiktor's answer.
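A quick sketch of that last pattern in use, written with (?: ... ) non-capturing groups as the explanation describes, so that group 1 holds the quoted text including its quotes. quotedParts and lastAnswerRe are names invented for this example:

```javascript
// Last pattern from this answer; only group 1 is a capturing group.
var lastAnswerRe = /(?:[^\\]|^)((?:\\\\)*"(?:.*?[^\\]){0,1}(?:\\\\)*")/g;

// Hypothetical helper: collects group 1 of every match.
function quotedParts(text) {
  var out = [];
  var m;
  lastAnswerRe.lastIndex = 0; // reset, since the regex is global and reused
  while ((m = lastAnswerRe.exec(text)) !== null) {
    out.push(m[1]);
  }
  return out;
}

// Text is: some text "some \"string\"right here "
var sample = 'some text "some \\"string\\"right here "';
console.log(quotedParts(sample));
```

For the string from the question this should give the single quoted part with its escaped quotes intact. It is still only a sketch, so the caveats about harder inputs from the parser answer apply here too.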