javascript - Split string with regex skipping brackets [] - Stack Overflow

I have a string and need to split it by whitespace, but any words inside brackets should be kept together rather than split.

For example,

input: 'tree car[tesla BMW] cat color[yellow blue] dog'

output: ['tree', 'car[tesla BMW]', 'cat', 'color[yellow blue]', 'dog']

If I use a simple .split(' '), it also splits inside the brackets and returns an incorrect result.

Also, I've tried to write a regex, but unsuccessfully :(

My latest regex looks like this: .split(/(?:(?<=\[).+?(?=\])| )+/), and it returns ["tree", "car[", "]", "cat", "color[", "]", "dog"]

I would be really grateful for any help.

asked May 21, 2021 at 12:50 by MarkMark
  • /\w+(?:[.+?])?/g – bel3atar Commented May 21, 2021 at 13:07

4 Answers

This is easier with match:

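const input = 'tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog'

// Match a run of non-bracket, non-space characters, optionally followed by one [...] group
const output = input.match(/[^[\]\s]+(\[.+?\])?/g)

console.log(output)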

With split you need a lookahead like this:

const input = 'tree car[tesla BMW] cat color[yellow blue] dog'

// Split on a space unless a ] is reached before the next [
const output = input.split(/ (?![^[]*\])/)

console.log(output)

Both snippets only work if brackets are not nested; otherwise you'd need a parser rather than a regexp.
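For the nested case, a small hand-written scanner is usually enough. The following is only a sketch under stated assumptions: splitTopLevel is a hypothetical helper name, a stray ] with no opener is treated as an ordinary character, an unclosed [ keeps the rest of the string together, and runs of spaces collapse instead of producing empty strings.

```javascript
// Hypothetical helper (not from the answer): split on spaces that sit
// outside every [...] group, tracking nesting depth with a counter.
function splitTopLevel(input) {
  const parts = [];
  let current = "";
  let depth = 0;
  for (const ch of input) {
    if (ch === "[") {
      depth += 1;
    } else if (ch === "]" && depth > 0) {
      // Assumption: a stray "]" at depth 0 is kept as a normal character.
      depth -= 1;
    }
    if (ch === " " && depth === 0) {
      // Top-level space: close the current part, skipping empty ones.
      if (current !== "") parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  if (current !== "") parts.push(current);
  return parts;
}

console.log(splitTopLevel("tree car[tesla [model s] BMW] cat dog"));
// → ["tree", "car[tesla [model s] BMW]", "cat", "dog"]
```

The counter is what a single regexp of this kind cannot express: it remembers how many brackets are still open, so spaces at any nesting depth stay inside their group.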

You could split on a space while asserting that what follows to the right is one or more non-whitespace characters other than square brackets, optionally followed by a group running from an opening to a closing square bracket, and then a whitespace boundary.

[ ](?=[^\][\s]+(?:\[[^\][]*])?(?!\S))

Explanation

  • [ ] Match a space (square brackets only for clarity)
  • (?= Positive lookahead
    • [^\][\s]+ Match 1+ times any char except ] [ or a whitespace char
    • (?:\[[^\][]*])? Optionally match from [ to ] with no brackets in between
    • (?!\S) A whitespace boundary to the right
  • ) Close lookahead

Regex demo

const regex = / (?=[^\][\s]+(?:\[[^\][]*])?(?!\S))/g;
[
  "tree car[tesla BMW] cat color[yellow blue] dog",
  "tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog",
  "tree:test car[tesla BMW]",
  "tree car[tesla BMW] cat color yellow blue] dog",
  "tree car[tesla BMW] cat color[yellow blue dog"
].forEach(s => console.log(s.split(regex)));
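The unbalanced inputs in that list are where this pattern behaves differently from the shorter lookahead / (?![^[]*\])/ shown earlier in the thread. As a rough comparison (both patterns are copied from the answers; the variable names are only illustrative), here is what each does with a stray closing bracket:

```javascript
// Patterns copied from the answers above; names are illustrative only.
const simpleSplit = / (?![^[]*\])/;
const strictSplit = / (?=[^\][\s]+(?:\[[^\][]*])?(?!\S))/g;

// Input with a closing bracket that has no matching opener.
const unbalanced = "tree car[tesla BMW] cat color yellow blue] dog";

const simpleResult = unbalanced.split(simpleSplit);
const strictResult = unbalanced.split(strictSplit);

console.log(simpleResult);
// → ["tree", "car[tesla BMW] cat color yellow blue]", "dog"]
console.log(strictResult);
// → ["tree", "car[tesla BMW]", "cat", "color", "yellow blue]", "dog"]
```

The shorter pattern refuses to split on any space that can reach a ] without passing a [, so one stray bracket glues several words together. The stricter pattern only protects spaces inside a well-formed [...] group that is directly attached to a word.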

Here is one option based on a regex find-all:

var input = 'tree car[tesla BMW] cat color[yellow blue] dog';
var matches = input.match(/\[.*?\]|[ ]|\b\w+\b/g);
var output = [];
var idx1 = 0;
var idx2 = 0;

do {
    if (matches[idx1] === " ") {
        ++idx1;
        continue;
    }

    do {
        output[idx2] = output[idx2] ? output[idx2] + matches[idx1] : matches[idx1];
        ++idx1;
    } while(matches[idx1] != " " && idx1 < matches.length);
    ++idx2;
} while(idx1 < matches.length);

console.log(output);

To explain the regex: the [...] terms, which might contain spaces, are handled by trying to match them first. Next, we look for space separators, and finally we look for standalone words. Here is the regex:

\[.*?\]   find a [...] term
|         OR
[ ]       find a space
|         OR
\b\w+\b   find a word

This gives us the following intermediate array:

["tree", " ", "car", "[tesla BMW]", " ", "cat", " ", "color", "[yellow blue]", " ", "dog"]

Then we iterate and join all non-space entries into an output array, using the actual spaces to indicate where the real separations should happen.
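The same tokenise-then-merge idea can also be written as a single pass with for...of. This is just a sketch, not the answer's own code: splitOutsideBrackets is a hypothetical name, and the regex is the one from the answer.

```javascript
// Hypothetical helper: same approach as above, with one loop instead of two.
function splitOutsideBrackets(input) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const tokens = input.match(/\[.*?\]|[ ]|\b\w+\b/g) ?? [];
  const output = [];
  let current = "";
  for (const token of tokens) {
    if (token === " ") {
      // A space token ends the current entry.
      if (current !== "") output.push(current);
      current = "";
    } else {
      // Word and [...] tokens between two spaces are glued together.
      current += token;
    }
  }
  if (current !== "") output.push(current);
  return output;
}

console.log(splitOutsideBrackets('tree car[tesla BMW] cat color[yellow blue] dog'));
// → ["tree", "car[tesla BMW]", "cat", "color[yellow blue]", "dog"]
```

One limitation carries over from the regex itself: characters that are neither \w, a space, nor part of a [...] group are never matched, so an input such as xml:cat comes back as xmlcat.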

If you insist on using a regex, I recommend you look at this page. The writer splits by comma, but I believe you are smart enough to change it to a space.