I have a string that I need to split on whitespace, but any whitespace inside square brackets must be skipped.
For example,
input: 'tree car[tesla BMW] cat color[yellow blue] dog'
output: ['tree', 'car[tesla BMW]', 'cat', 'color[yellow blue]', 'dog']
If I use a simple .split(' '), it also splits inside the brackets and returns an incorrect result.
I've also tried to write a regex, but without success :(
My last regex looks like this: .split(/(?:(?<=\[).+?(?=\])| )+/)
and it returns ["tree", "car[", "]", "cat", "color[", "]", "dog"]
I would be really grateful for any help.
asked May 21, 2021 at 12:50 by MarkMark
Comment: /\w+(?:[.+?])?/g – bel3atar, May 21, 2021 at 13:07
4 Answers
This is easier with match:
input = 'tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog'
output = input.match(/[^[\]\s]+(\[.+?\])?/g)
console.log(output)
With split you need a lookahead like this:
input = 'tree car[tesla BMW] cat color[yellow blue] dog'
output = input.split(/ (?![^[]*\])/)
console.log(output)
Both snippets only work if brackets are not nested, otherwise you'd need a parser rather than a regexp.
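For the nested case mentioned above, a small hand-written scanner is usually enough. The following is only a minimal sketch, not part of the original answer: the helper name splitOutsideBrackets is hypothetical, and it assumes that an unmatched ] is treated as an ordinary character and that an unclosed [ swallows the rest of the string.

```javascript
// Hypothetical helper: split on whitespace, but only while we are
// outside every [...] group. Handles nested brackets by counting depth.
function splitOutsideBrackets(text) {
  const parts = [];
  let current = '';
  let depth = 0; // number of currently open '[' brackets

  for (const ch of text) {
    if (ch === '[') {
      depth++;
    } else if (ch === ']' && depth > 0) {
      depth--; // an unmatched ']' at depth 0 is kept as a plain character
    }

    if (depth === 0 && /\s/.test(ch)) {
      // Top-level whitespace ends the current token (if there is one).
      if (current !== '') parts.push(current);
      current = '';
    } else {
      current += ch;
    }
  }

  if (current !== '') parts.push(current);
  return parts;
}

console.log(splitOutsideBrackets('tree car[tesla [BMW i3]] cat color[yellow blue] dog'));
```

Because it counts depth instead of pattern matching, the same function also covers the non-nested input from the question.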
You could split on a space, asserting that what follows to the right is 1 or more non-whitespace chars other than square brackets, optionally followed by a part running from an opening to a closing square bracket, and then a whitespace boundary.
[ ](?=[^\][\s]+(?:\[[^\][]*])?(?!\S))
Explanation
[ ]                Match a space (square brackets only for clarity)
(?=                Positive lookahead
[^\][\s]+          Match 1+ times any char except ], [ or a whitespace char
(?:\[[^\][]*])?    Optionally match from [...]
(?!\S)             A whitespace boundary to the right
)                  Close lookahead
Regex demo
const regex = / (?=[^\][\s]+(?:\[[^\][]*])?(?!\S))/g;
[
"tree car[tesla BMW] cat color[yellow blue] dog",
"tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog",
"tree:test car[tesla BMW]",
"tree car[tesla BMW] cat color yellow blue] dog",
"tree car[tesla BMW] cat color[yellow blue dog"
].forEach(s => console.log(s.split(regex)));
Here is one option that uses a regex to find all the pieces and then joins them:
var input = 'tree car[tesla BMW] cat color[yellow blue] dog';
var matches = input.match(/\[.*?\]|[ ]|\b\w+\b/g);
var output = [];
var idx1 = 0;
var idx2 = 0;
do {
    if (matches[idx1] === " ") {
        ++idx1;
        continue;
    }
    do {
        output[idx2] = output[idx2] ? output[idx2] + matches[idx1] : matches[idx1];
        ++idx1;
    } while(matches[idx1] != " " && idx1 < matches.length);
    ++idx2;
} while(idx1 < matches.length);
console.log(output);
To explain the regex: we deal with the [...] terms, which might contain spaces, by eagerly trying to match them first. Next, we look for space separators, and finally we look for standalone words. Here is the regex:
\[.*?\] find a [...] term
| OR
[ ] find a space
| OR
\b\w+\b find a word
This gives us the following intermediate array:
["tree", " ", "car", "[tesla BMW]", " ", "cat", " ", "color", "[yellow blue]", " ", "dog"]
Then we iterate and join together all non-space entries in an output array, using the actual spaces to indicate where the real separations should happen.
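The same join step can also be written more compactly. This is just an alternative sketch of the idea described above, not the answer's own code; the function name joinTokens is hypothetical, and it assumes the same regex and the same rule that a single space token marks a separation.

```javascript
// Hypothetical helper: glue consecutive non-space tokens together and
// start a new entry every time a space token is seen.
function joinTokens(tokens) {
  return tokens
    .reduce((acc, token) => {
      if (token === ' ') {
        acc.push('');                  // a space starts a new, empty entry
      } else {
        acc[acc.length - 1] += token;  // anything else extends the last entry
      }
      return acc;
    }, [''])
    .filter(part => part !== '');      // drop entries left empty by repeated spaces
}

const sample = 'tree car[tesla BMW] cat color[yellow blue] dog';
const tokens = sample.match(/\[.*?\]|[ ]|\b\w+\b/g);
console.log(joinTokens(tokens));
```

The final filter only matters when the input has leading, trailing or repeated spaces, which would otherwise leave empty strings in the result.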
If you insist on using a regex, I recommend you take a look at this page.
The author splits on commas, but I believe you are smart enough to change it to spaces.