
Regex to match a search word outside quotes unlike a string (edge case) - Stack Overflow


I wish to search for a searchWord and replace it only if it is OUTSIDE quotes.

Currently I break the string into lines and use a pattern like this:
(["'])(?:\\\\.|[^\\\\])*?\\b${searchValue}\\b(?:\\\\.|[^\\\\])*?\\1

Then I use NOT logic to replace only the searchWords outside quotes.

The issue arises with a string like:

This "searchword" searchword "searchword"

It matches the second searchword too, since it lies between the closing quote of the first quoted string and the opening quote of the second.
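
To see why quote pairing matters, here is a sketch of a naive check (hypothetical, not the asker's exact pattern) that treats a word as "inside quotes" whenever some quote appears before it and some quote appears after it, with no quote in between. On the example string it flags all three occurrences, including the middle one:

```javascript
// Hypothetical naive check (not the asker's exact pattern): a word counts as
// "inside quotes" if some quote precedes it and some quote follows it, with
// no quote in between. Which quotes actually pair up is never checked.
const naiveInside = /(?<="[^"]*)\bsearchword\b(?=[^"]*")/g;

const text = 'This "searchword" searchword "searchword"';
const hits = [...text.matchAll(naiveInside)].map(m => m.index);

// The middle word (index 18) is flagged as well: it sits between the closing
// quote of the first string and the opening quote of the second.
console.log(hits); // [ 6, 18, 30 ]
```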


asked Mar 31 at 14:20 by Phalaksha C G, edited Apr 1 at 4:01
  • Well, I can't repro, but... it seems to me you should exclude the " or ' (what was captured into Group 1) from both the [^\\\\] parts. I'd add (?!\\1) right in front of them. -> (["'])(?:\\\\.|(?!\\1)[^\\\\])*?\\b${searchValue}\\b(?:\\\\.|(?!\\1)[^\\\\])*?\\1 – Wiktor Stribiżew Commented Mar 31 at 14:28
  • Wouldn't that only prevent a match spanning from the first quote to the last one? The case I mentioned would still occur. – Phalaksha C G Commented Mar 31 at 14:30
  • Please share some more real-life test cases; I can't see the issue with the current one. – Wiktor Stribiżew Commented Mar 31 at 14:31
  • Regular expressions aren't good at telling inside from outside, because they can't easily distinguish the odd and even quotes. – Barmar Commented Mar 31 at 14:40
  • Please provide enough code so others can better understand or reproduce the problem. – Community Bot Commented Mar 31 at 18:58

2 Answers

So it sounds like you're using JS and have edited the question to match these search values outside of either single or double quotes. The problem is that when you try to match something inside or outside quotes, the quotes can easily fall out of balance because of backtracking: the engine may start a match at a closing quote and treat it as an opening one.

To get around this, first match every quoted and unquoted part, but capture the unquoted parts into a group so you can tell what is inside quotes from what is outside:

"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|([^"']+)

See this demo at regex101

The pattern has three alternatives: 1. a double-quoted string, 2. a single-quoted string (both allowing backslash-escaped quotes), and 3. a run of non-quote characters, captured into the first group.
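
A quick way to check how the alternation splits a string is to list every match and whether group 1 took part. The sketch below assumes a made-up sample containing backslash-escaped quotes (the names tokenRegex, sample and tokens are only for illustration); the escaped quotes stay inside the quoted token, so only the final searchword is reported as outside:

```javascript
// The answer's tokenizer: quoted strings (with backslash escapes) match
// without a group; runs of non-quote characters are captured into group 1.
const tokenRegex = /"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|([^"']+)/g;

// Hypothetical sample; String.raw keeps the backslashes, so the text
// really contains \" escapes inside the double-quoted part.
const sample = String.raw`say "a \"searchword\" b" searchword`;

// Label each piece by whether group 1 participated in the match.
const tokens = [...sample.matchAll(tokenRegex)].map(m =>
  [m[1] !== undefined ? 'outside' : 'quoted', m[0]]
);

for (const [kind, piece] of tokens) {
  console.log(kind.padEnd(8), JSON.stringify(piece));
}
```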

Now, in JS, use replace() with a callback function that only modifies the group 1 matches:

function highlightValue(text, searchValue)
{
  // match all quoted strings but capture non-quoted runs into the first group
  const regex = /"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|([^"']+)/g;

  // escape regex metacharacters in searchValue, then match it as a whole word (case-insensitive)
  const escaped = searchValue.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const innerRegex = new RegExp(`\\b${escaped}\\b`, 'gi');

  // either return quoted parts (m0) unchanged or replace in the non-quoted part (m1)
  return text.replace(regex, (m0, m1) =>
  {
    // m1: outside quotes (first capture group) -> wrap each hit, keeping its original case ($&)
    if (m1) {
      return m1.replace(innerRegex, '<mark>$&</mark>');
    }

    // m0: inside quotes (group 1 did not participate) -> do not modify
    return m0;
  });
}

// test
let text = `searchword outside "inside searchword" searchword 'searchword'`;
let search = `searchword`;

console.log(highlightValue(text, search));

Here is the JS demo at tio.run


And for completeness, the opposite version, which matches searchValue inside quotes:

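function highlightValue(text, searchValue)
{
  // match single and double quoted strings (allowing escaped quotes)
  const regex = /"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'/g;

  // escape regex metacharacters in searchValue, then match it as a whole word (case-insensitive)
  const escaped = searchValue.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const innerRegex = new RegExp(`\\b${escaped}\\b`, 'gi');

  // replace only inside each quoted match, keeping the matched text's case ($&)
  return text.replace(regex, (m) => m.replace(innerRegex, '<mark>$&</mark>'));
}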

// test
let text = `searchword outside "inside searchword" searchword 'searchword'`;
let search = `searchword`;

console.log(highlightValue(text, search));

This version is shorter because it does not need a capture group; JS demo here.

This pattern matches and captures the quoted text containing $searchValue when $searchValue is surrounded by a word boundary \b on both sides, together with everything before and after \b$searchValue\b inside the quotes. The captured text is stored in Group 1 (double quotes) or Group 2 (single quotes). NOTE: If no acceptable searchValue is found inside a pair of quotes, the pattern still matches the quoted string but does not capture it. This way we do not lose track of the quotes and start capturing strings willy-nilly. (For reference, see this article on the pattern: https://www.rexegg.com/regex-best-trick.php.)

REGEX PATTERN (PCRE2 Flavor):

"([^"]*\b${searchValue}\b[^"]*)"|'([^']*\b${searchValue}\b[^']*)'|"[^"]*"|'[^']*'

Regex demo: https://regex101.com/r/FX3WX4/4

DEMO REGULAR EXPRESSION ($searchValue = 'searchword'):

"([^"]*\bsearchword\b[^"]*)"|'([^']*\bsearchword\b[^']*)'|"[^"]*"|'[^']*'

TEST STRING:

This "searchword1". searchword.  "searchword"
This 'searchword2'. searchword.  "searchword"
This 'searchword 2'. searchword.  "Hello, searchword!"

MATCH / GROUP (character offsets, start-end with the end exclusive)

MATCH 1  5-18   "searchword1"

MATCH 2  33-45  "searchword"
GROUP 1 34-44   searchword

MATCH 3 51-64   'searchword2'

MATCH 4 79-91   "searchword"
GROUP 1 80-90   searchword

MATCH 5 97-111  'searchword 2'
GROUP 2 98-110  searchword 2

MATCH 6 126-146 "Hello, searchword!"
GROUP 1 127-145 Hello, searchword!
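
The pattern uses only alternation, negated classes and \b, so it should also run unchanged in JavaScript. The sketch below rebuilds the test string above and keeps only the matches where Group 1 or Group 2 captured; the helper name quotedWithWord is hypothetical, and it assumes searchValue contains no regex metacharacters:

```javascript
// Hypothetical helper: returns the captured text of every quoted string that
// contains searchValue as a whole word. Assumes searchValue has no regex
// metacharacters (escape it first if it might).
function quotedWithWord(text, searchValue) {
  const re = new RegExp(
    `"([^"]*\\b${searchValue}\\b[^"]*)"|'([^']*\\b${searchValue}\\b[^']*)'|"[^"]*"|'[^']*'`,
    'g'
  );
  // Every quoted string is matched, which keeps the quotes paired up, but
  // only those containing the word fill group 1 (double) or group 2 (single).
  return [...text.matchAll(re)]
    .filter(m => m[1] !== undefined || m[2] !== undefined)
    .map(m => m[1] ?? m[2]);
}

const testString = [
  `This "searchword1". searchword.  "searchword"`,
  `This 'searchword2'. searchword.  "searchword"`,
  `This 'searchword 2'. searchword.  "Hello, searchword!"`,
].join('\n');

// returns ['searchword', 'searchword', 'searchword 2', 'Hello, searchword!']
const found = quotedWithWord(testString, 'searchword');
console.log(found);
```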

REGEX NOTES:

  • First Alternative: "([^"]*\b${searchValue}\b[^"]*)"
    • " Match literal double quote ".
    • ( Begin capture into Group 1:
      • [^"]* Negated character class [^...]. Match any character except literal double quote, ", 0 or more (*) times.
      • \b Match word boundary.
      • ${searchValue} Variable. Match string here.
      • \b Match word boundary.
      • [^"]* Negated character class [^...]. Match any character except literal double quote, ", 0 or more (*) times.
    • ) End capture Group 1.
    • " Match literal double quote, ".
  • | OR
  • Second Alternative: '([^']*\b${searchValue}\b[^']*)'
    • ' Match literal single quote '.
    • ( Begin capture into Group 2:
      • [^']* Negated character class [^...]. Match any character except literal single quote, ', 0 or more (*) times.
      • \b Match word boundary.
      • ${searchValue} Variable. Match string here.
      • \b Match word boundary.
      • [^']* Negated character class [^...]. Match any character except literal single quote, ', 0 or more (*) times.
    • ) End capture Group 2.
    • ' Match literal single quote, '.
  • | OR
  • Third Alternative: "[^"]*"
    • " Match literal double quote ".
    • [^"]* Negated character class [^...]. Match any character except literal double quote, ", 0 or more (*) times.
    • " Match literal double quote, ".
  • | OR
  • Fourth Alternative: '[^']*'
    • ' Match literal single quote '.
    • [^']* Negated character class [^...]. Match any character except literal single quote, ', 0 or more (*) times.
    • ' Match literal single quote, '.
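
The article linked above describes this match-and-discard idea in general terms. Applied to the original question (replace only outside quotes), a sketch could match quoted strings without a group and capture only the bare word, then decide in a replace callback. The helper name markOutsideQuotes is hypothetical, and searchValue is assumed to contain no regex metacharacters:

```javascript
// Hypothetical helper using the same trick: quoted strings are matched and
// returned unchanged; only a bare whole-word hit fills the capture group.
// Assumes searchValue contains no regex metacharacters.
function markOutsideQuotes(text, searchValue) {
  const re = new RegExp(`"[^"]*"|'[^']*'|\\b(${searchValue})\\b`, 'g');
  return text.replace(re, (whole, word) =>
    word !== undefined ? `<mark>${word}</mark>` : whole
  );
}

console.log(markOutsideQuotes(`This "searchword" searchword "searchword"`, 'searchword'));
// This "searchword" <mark>searchword</mark> "searchword"
```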