I want to search for a searchWord and replace it only when it is OUTSIDE quotes.
Currently I break the string into lines and use a pattern like this:
(["'])(?:\\\\.|[^\\\\])*?\\b${searchValue}\\b(?:\\\\.|[^\\\\])*?\\1
I then use NOT logic to replace only the search words that fall outside the matches.
The issue shows up with a string like
This "searchword" searchword "searchword"
It matches the second search word too, since it sits between the closing quote of the first string and the opening quote of the second string.
Asked Mar 31 at 14:20 by Phalaksha C G; edited Apr 1 at 4:01.
2 Answers
It sounds like you're using JS, and the question was edited to ask for matching these search values outside of either single or double quotes. The problem with trying to match something inside or outside quotes directly is that the quotes can easily get out of balance because of backtracking: a closing quote can be picked up as if it were an opening one.
To get around this, first match all quoted and unquoted parts, but capture the unquoted parts in a capture group so you can tell what is inside and what is outside:
"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|([^"']+)
See this demo at regex101
The pattern contains three options in the alternation: 1. double quoted, 2. single quoted (both considering escaped quotes) and 3. capture non-quotes into the first group.
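To make the three alternatives concrete, here is a minimal sketch that only lists what the pattern matches and whether group 1 was filled. The helper name `splitQuoted` and the sample input are made up for illustration; they are not part of the answer.

```javascript
// Same pattern as above: double-quoted | single-quoted | (unquoted run)
const partsRegex = /"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|([^"']+)/g;

// Hypothetical helper: list every match and flag the ones that are outside quotes
function splitQuoted(text) {
  return Array.from(text.matchAll(partsRegex), (m) => ({
    text: m[0],
    // group 1 is only set by the third alternative (unquoted text)
    outside: m[1] !== undefined,
  }));
}

for (const part of splitQuoted(`a "b c" d 'e'`)) {
  console.log(part.outside ? "outside:" : "quoted: ", JSON.stringify(part.text));
}
```

Only the entries flagged as outside are candidates for replacement; the quoted entries are passed through unchanged.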
Now in JS use replace() and a callback function on the matches of group 1:
function highlightValue(text, searchValue)
{
  // match all quoted strings but capture non-quoted text in the first group
  const regex = /"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|([^"']+)/g;
  // either return quoted parts (m0) or replace in the non-quoted (m1)
  return text.replace(regex, (m0, m1) =>
  {
    // m1: outside quotes (first capture group) -> replace
    if (m1) {
      // note: searchValue is inserted as-is, regex metacharacters are not escaped
      const innerRegex = new RegExp(`\\b${searchValue}\\b`, 'gi');
      // $& re-inserts the matched text, so its original casing is kept
      return m1.replace(innerRegex, '<mark>$&</mark>');
    }
    // m0: inside quotes (no group) -> do not modify
    return m0;
  });
}
// test
let text = `searchword outside "inside searchword" searchword 'searchword'`;
let search = `searchword`;
console.log(highlightValue(text, search));
Here is the JS demo at tio.run
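One thing the function above leaves open is a searchValue that contains regex metacharacters such as `.` or `(`. A possible way to handle that, sketched under the assumption that the search value is meant as literal text, is to escape it before building the inner regex. `escapeRegExp` and `buildWordRegex` are hypothetical helper names, not part of the answer.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: whole-word, case-insensitive regex for a literal search value
function buildWordRegex(searchValue) {
  return new RegExp(`\\b${escapeRegExp(searchValue)}\\b`, "gi");
}

// Without escaping, "a.b" would also match "axb"; with escaping it does not
console.log("a.b axb".replace(buildWordRegex("a.b"), "<mark>$&</mark>"));
```

Note that `\b` only behaves as a whole-word check when the search value starts and ends with a word character, so values like `(x)` would need a different boundary test.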
And for completeness, the opposite version, which matches searchValue inside quotes:
function highlightValue(text, searchValue)
{
  // match single and double quoted strings and replace inside the matches
  const regex = /"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'/g;
  return text.replace(regex, (m) =>
  {
    // note: searchValue is inserted as-is, regex metacharacters are not escaped
    const innerRegex = new RegExp(`\\b${searchValue}\\b`, 'gi');
    // $& re-inserts the matched text, so its original casing is kept
    return m.replace(innerRegex, '<mark>$&</mark>');
  });
}
// test
let text = `searchword outside "inside searchword" searchword 'searchword'`;
let search = `searchword`;
console.log(highlightValue(text, search));
This version is shorter because it does not require a capture group. JS demo here.
This pattern will match and capture $searchValue inside the quotes when $searchValue is surrounded by a word boundary \b on both sides, along with everything before and after \b$searchValue\b within the quotes. This will be stored in Group 1 or Group 2. NOTE: If an acceptable searchValue is not found inside the quotes, the pattern will still match the quoted string but not capture anything. This way we do not lose track of the quotes and start capturing strings willy-nilly. (For reference, check out this article on the pattern: https://www.rexegg/regex-best-trick.php.)
REGEX PATTERN (PCRE2 Flavor):
"([^"]*\b${searchValue}\b[^"]*)"|'([^']*\b${searchValue}\b[^']*)'|"[^"]*"|'[^']*'
Regex demo: https://regex101/r/FX3WX4/4
DEMO REGULAR EXPRESSION ($searchValue = 'searchword'):
"([^"]*\bsearchword\b[^"]*)"|'([^']*\bsearchword\b[^']*)'|"[^"]*"|'[^']*'
TEST STRING:
This "searchword1". searchword. "searchword"
This 'searchword2'. searchword. "searchword"
This 'searchword 2'. searchword. "Hello, searchword!"
MATCH / GROUP   POSITION   TEXT
MATCH 1         5-18       "searchword1"
MATCH 2         33-45      "searchword"
  GROUP 1       34-44      searchword
MATCH 3         51-64      'searchword2'
MATCH 4         79-91      "searchword"
  GROUP 1       80-90      searchword
MATCH 5         97-111     'searchword 2'
  GROUP 2       98-110     searchword 2
MATCH 6         126-146    "Hello, searchword!"
  GROUP 1       127-145    Hello, searchword!
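Although the pattern is labelled PCRE2, it uses only syntax that JavaScript also supports, so the match/capture behaviour can be checked with a short sketch like the one below. `findQuotedHits` is a hypothetical helper name, and the search value is assumed to contain no regex metacharacters.

```javascript
// Hypothetical helper: run the four-alternative pattern and report what was captured
function findQuotedHits(text, searchValue) {
  const word = `\\b${searchValue}\\b`;
  const regex = new RegExp(
    `"([^"]*${word}[^"]*)"|'([^']*${word}[^']*)'|"[^"]*"|'[^']*'`,
    "g"
  );
  return Array.from(text.matchAll(regex), (m) => ({
    match: m[0],
    // Group 1 (double quotes) or Group 2 (single quotes); null when only matched
    captured: m[1] ?? m[2] ?? null,
  }));
}

const hits = findQuotedHits(`This "searchword1". searchword. "searchword"`, "searchword");
for (const hit of hits) {
  console.log(hit.match, "->", hit.captured);
}
```

Quoted strings without a whole-word hit still appear in the result, but with nothing captured, which is what keeps the quotes paired up.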
REGEX NOTES:
- First Alternative: "([^"]*\b${searchValue}\b[^"]*)"
  - " Match literal double quote.
  - ( Begin capture into Group 1:
    - [^"]* Negated character class [^...]. Match any character except a literal double quote, 0 or more (*) times.
    - \b Match word boundary.
    - ${searchValue} Variable. Match string here.
    - \b Match word boundary.
    - [^"]* Negated character class [^...]. Match any character except a literal double quote, 0 or more (*) times.
  - ) End capture Group 1.
  - " Match literal double quote.
- | OR
- Second Alternative: '([^']*\b${searchValue}\b[^']*)'
  - ' Match literal single quote.
  - ( Begin capture into Group 2:
    - [^']* Negated character class [^...]. Match any character except a literal single quote, 0 or more (*) times.
    - \b Match word boundary.
    - ${searchValue} Variable. Match string here.
    - \b Match word boundary.
    - [^']* Negated character class [^...]. Match any character except a literal single quote, 0 or more (*) times.
  - ) End capture Group 2.
  - ' Match literal single quote.
- | OR
- Third Alternative: "[^"]*"
  - " Match literal double quote.
  - [^"]* Negated character class [^...]. Match any character except a literal double quote, 0 or more (*) times.
  - " Match literal double quote.
- | OR
- Fourth Alternative: '[^']*'
  - ' Match literal single quote.
  - [^']* Negated character class [^...]. Match any character except a literal single quote, 0 or more (*) times.
  - ' Match literal single quote.

" or ' (what was captured into Group 1) from both the [^\\\\] parts. I'd add (?!\\1) right in front of them. -> (["'])(?:\\\\.|(?!\\1)[^\\\\])*?\\b${searchValue}\\b(?:\\\\.|(?!\\1)[^\\\\])*?\\1
– Wiktor Stribiżew, commented Mar 31 at 14:28
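To see what the (?!\\1) guard from this comment changes, the sketch below builds both the question's pattern and the guarded variant and runs them on a made-up sample. `buildPatterns` and the sample string are assumptions for illustration only.

```javascript
// Hypothetical helper: build the question's pattern and the variant from the comment
function buildPatterns(searchValue) {
  const original = new RegExp(
    `(["'])(?:\\\\.|[^\\\\])*?\\b${searchValue}\\b(?:\\\\.|[^\\\\])*?\\1`,
    "g"
  );
  // (?!\\1) stops the "any character" branch from stepping over the quote in Group 1
  const tempered = new RegExp(
    `(["'])(?:\\\\.|(?!\\1)[^\\\\])*?\\b${searchValue}\\b(?:\\\\.|(?!\\1)[^\\\\])*?\\1`,
    "g"
  );
  return { original, tempered };
}

const { original, tempered } = buildPatterns("searchword");
const sample = `"a" "searchword"`;
console.log(sample.match(original)); // [ '"a" "searchword"' ]
console.log(sample.match(tempered)); // [ '"searchword"' ]
```

The guard keeps a single match from running across several quoted strings. It does not pair quotes by itself: on an input such as `"foo" searchword "bar"`, matching can still begin at the closing quote of "foo", which is the imbalance the first answer avoids by consuming every quoted string.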