I'm not good at regex, and I'm trying to write two regexes.
Regex1:
All specified words in any order but nothing else. (repetition allowed).
Regex2:
All specified words in any order but nothing else. (repetition not allowed).
Words:
aaa, bbb, ccc
Strings:
aaa ccc bbb
aaa ccc
aaa bbb ddd ccc
bbb aaa bbb ccc
Regex1 should evaluate the strings above as:
true -> all words present in any order
false -> bbb is missing
false -> unknown word 'ddd'
true -> all words present in any order and repetition is allowed
Regex2 should evaluate the strings above as:
true -> all words present in any order
false -> bbb is missing
false -> unknown word 'ddd'
false -> repetition not allowed
My Attempt
/^(?=.*\baaa\b)(?=.*\bbbb\b)(?=.*\bccc\b).*$/
I'm asking for learning purposes, so please explain your answer in detail.
asked Mar 12, 2019 at 7:52 by shajji, edited Mar 13, 2019 at 5:57
- So some chars like spaces are allowed to exist between words? What else could be there? – revo Commented Mar 12, 2019 at 8:07
- only spaces, newline, tabs are allowed. – shajji Commented Mar 12, 2019 at 8:17
- Are you sure about newlines to exist between words? – revo Commented Mar 12, 2019 at 8:21
- Please check this regex101.com/r/Olu2kI/1 – revo Commented Mar 12, 2019 at 8:40
- Just because you can use a regex doesn't mean you should.
var input = "ccc aaa ccc bbb"; var words = input.split(" "); var uniqueWords = Array.from(new Set(words)); console.log(uniqueWords.sort().join(" ") === "aaa bbb ccc");
– Eric Duminil Commented Mar 12, 2019 at 10:36
4 Answers
Without repetition (regex101):
^(?:(aaa|bbb|ccc)(?!.*?\b\1) ?\b){3}$
And with repetition (regex101):
^(?=.*?\baaa)(?=.*?\bbbb)(?=.*?\bccc)(?:(aaa|bbb|ccc) ?\b)+$
Two more ideas. Regex explanation at regex101 on the right side.
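As a quick check, both patterns above can be run against the four strings from the question. This is only a minimal sketch for Node.js or a browser console; the names noRepeat, withRepeat and inputs are made up for illustration and are not part of the answer.

```javascript
// The two patterns from above, unchanged.
const noRepeat = /^(?:(aaa|bbb|ccc)(?!.*?\b\1) ?\b){3}$/;
const withRepeat = /^(?=.*?\baaa)(?=.*?\bbbb)(?=.*?\bccc)(?:(aaa|bbb|ccc) ?\b)+$/;

// The four test strings from the question.
const inputs = ['aaa ccc bbb', 'aaa ccc', 'aaa bbb ddd ccc', 'bbb aaa bbb ccc'];

// Evaluate every string with each pattern.
const noRepeatResults = inputs.map((s) => noRepeat.test(s));
const withRepeatResults = inputs.map((s) => withRepeat.test(s));

console.log(noRepeatResults);   // [ true, false, false, false ]
console.log(withRepeatResults); // [ true, false, false, true ]
```

Only the last string differs: it contains bbb twice, so the pattern without repetition rejects it.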
For Regex 1:
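// Lookaheads require each word; the rest of the pattern allows only those words.
var re = /^(?=.*?\baaa\b)(?=.*?\bbbb\b)(?=.*?\bccc\b)\b(?:aaa|bbb|ccc)\b(?: +\b(?:aaa|bbb|ccc)\b)*$/;
var res = document.getElementById('result');
res.innerText += re.test('aaa ccc bbb'); // true
res.innerText += ', ' + re.test('aaa ccc ddd'); // false: bbb missing, ddd unknown
res.innerText += ', ' + re.test('aaa ddd bbb'); // false: ccc missing, ddd unknown
res.innerText += ', ' + re.test('ccc bbb ccc'); // false: aaa missing
<div id="result"></div>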
Your attempt already does part of the job. Its positive lookaheads check that all the words appear somewhere, but not that they are the only words present. To enforce that, the pattern is anchored with the circumflex (^) at the start of the string, followed by the non-capturing group \b(?:aaa|bbb|ccc)\b to match the first word.
This is followed by any number of further words, each preceded by at least one space: (?: +\b(?:aaa|bbb|ccc)\b)* is basically the same pattern with a space and a + in front, wrapped in a *. (Use \s+ instead to accept other whitespace, such as tabs, between words.) Finally, the dollar sign $ requires the string to end there.
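To make the structure of that pattern easier to see, here is a rough sketch that assembles it from a word list. The helper name buildAllWordsRegex is hypothetical, and the sketch assumes every word consists of word characters only, so nothing needs escaping.

```javascript
// Hypothetical helper: builds the "all words, any order, repetition allowed"
// pattern for an arbitrary word list.
// Assumption: every word consists of word characters only (letters, digits,
// underscore), so no regex escaping is needed.
function buildAllWordsRegex(words) {
  // One positive lookahead per word: the word must occur somewhere.
  const lookaheads = words.map((w) => `(?=.*?\\b${w}\\b)`).join('');
  // Matches exactly one of the allowed words as a whole word.
  const anyWord = `\\b(?:${words.join('|')})\\b`;
  // First word, then any number of "spaces + word", anchored at both ends.
  return new RegExp(`^${lookaheads}${anyWord}(?: +${anyWord})*$`);
}

const allWords = buildAllWordsRegex(['aaa', 'bbb', 'ccc']);

console.log(allWords.test('aaa ccc bbb'));     // true
console.log(allWords.test('aaa ccc'));         // false: bbb is missing
console.log(allWords.test('aaa bbb ddd ccc')); // false: ddd is not allowed
console.log(allWords.test('bbb aaa bbb ccc')); // true: repetition is allowed
```

For the three words from the question, this produces the same pattern as the literal regex in the snippet above.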
For Regex 2:
The basic strategy is the same. You additionally check, with a negative lookahead, that no matched word occurs a second time:
//var re = /^(?=.*?\baaa\b)(?!.*?\baaa\b.*?\baaa\b)(?=.*?\bbbb\b)(?!.*?\bbbb\b.*?\bbbb\b)(?=.*?\bccc\b)(?!.*?\bccc\b.*?\bccc\b)\b(?:aaa|bbb|ccc)\b(?:\s+\b(?:aaa|bbb|ccc)\b)*$/;
// optimized version, see comments
var re = /^(?=.*?\baaa\b)(?=.*?\bbbb\b)(?=.*?\bccc\b)(?!.*?\b(\w+)\b.*?\b\1\b)\b(?:aaa|bbb|ccc)\b(?: +\b(?:aaa|bbb|ccc)\b)*$/;
var res = document.getElementById('result');
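res.innerText += re.test('aaa ccc bbb'); // true
res.innerText += ', ' + re.test('aaa ccc ddd'); // false: bbb missing, ddd unknown
res.innerText += ', ' + re.test('aaa bbb aaa'); // false: ccc missing, aaa repeated
res.innerText += ', ' + re.test('aaa ccc bbb ccc'); // false: ccc repeated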
<div id="result"></div>
First, we have the positive lookahead (?=.*?\bword\b) to check that the word exists. In the first (commented-out) version, it is followed by a negative lookahead such as (?!.*?\baaa\b.*?\baaa\b) to check that the word does not occur more than once. Repeat for all words. Presto!
Update: Instead of checking that the specific words aren't repeated, we can also check that NO word is repeated by using the (?!.*?\b(\w+)\b.*?\b\1\b) construct. This makes the regex more concise. Thanks to @revo for pointing it out.
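To see what the backreference lookahead contributes, here is a small sketch that first runs it on its own and then runs the full pattern against the four strings from the question. The variable names are made up for illustration.

```javascript
// The "no word occurs twice" check on its own, anchored at the start.
// (\w+) captures a whole word; \1 then looks for the same word later on.
const noDuplicateWord = /^(?!.*?\b(\w+)\b.*?\b\1\b)/;

console.log(noDuplicateWord.test('aaa bbb ccc')); // true: no word repeats
console.log(noDuplicateWord.test('aaa bbb aaa')); // false: aaa repeats
console.log(noDuplicateWord.test('aaa bbb ddd')); // true: it does not care WHICH words occur

// The full pattern from the answer: required words + no duplicates + only allowed words.
const allWordsOnce = /^(?=.*?\baaa\b)(?=.*?\bbbb\b)(?=.*?\bccc\b)(?!.*?\b(\w+)\b.*?\b\1\b)\b(?:aaa|bbb|ccc)\b(?: +\b(?:aaa|bbb|ccc)\b)*$/;

// The four test strings from the question.
const questionStrings = ['aaa ccc bbb', 'aaa ccc', 'aaa bbb ddd ccc', 'bbb aaa bbb ccc'];
const onceResults = questionStrings.map((s) => allWordsOnce.test(s));

console.log(onceResults); // [ true, false, false, false ]
```

The lookahead by itself only rules out duplicates, which is why it is combined with the other lookaheads and the list of allowed words.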
Why do you need a regex for this, though? You could achieve what you want easily by first splitting the input string on its delimiter (whitespace in your examples). You can then create a dictionary object with the words you are looking for as keys and every value defaulted to -1.
The variant that allows repetition can be implemented by looping through the input words and checking that each one exists as a key in the dictionary, setting its value to 1 when it is matched; the input passes if no unknown word was found and every value ends up as 1. The variant that forbids repetition works the same way, except that when a word is visited whose value is already 1, a false match is returned.
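The dictionary approach could look roughly like the sketch below. The function name hasExactlyWords and its allowRepetition parameter are hypothetical, and the sketch assumes the input words are separated by whitespace.

```javascript
// Hypothetical helper implementing the dictionary idea described above.
// Assumption: the words in `input` are separated by whitespace.
function hasExactlyWords(input, words, allowRepetition) {
  // Every required word starts out as "not seen yet" (-1).
  const seen = new Map(words.map((w) => [w, -1]));

  for (const token of input.trim().split(/\s+/)) {
    // Unknown word: reject immediately.
    if (!seen.has(token)) return false;
    // Word already seen: reject if repetition is not allowed.
    if (seen.get(token) === 1 && !allowRepetition) return false;
    seen.set(token, 1);
  }

  // Every required word must have been seen at least once.
  return [...seen.values()].every((v) => v === 1);
}

const wordList = ['aaa', 'bbb', 'ccc'];
const samples = ['aaa ccc bbb', 'aaa ccc', 'aaa bbb ddd ccc', 'bbb aaa bbb ccc'];

const repetitionAllowed = samples.map((s) => hasExactlyWords(s, wordList, true));
const repetitionForbidden = samples.map((s) => hasExactlyWords(s, wordList, false));

console.log(repetitionAllowed);   // [ true, false, false, true ]
console.log(repetitionForbidden); // [ true, false, false, false ]
```

A Map is used here instead of a plain object so that words such as "constructor" cannot collide with inherited properties.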
Do not use regex to check uniqueness.
But to match whole, separate words in a regex, you can use the word boundary \b.
Example: /\b(word1|word2|word3)\b/
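A short sketch of what \b does in practice, using the words from the question in place of word1, word2 and word3. The variable names are only for illustration.

```javascript
// \b matches at the edge of a word, so only whole words are matched.
const wholeWord = /\b(aaa|bbb|ccc)\b/;

console.log(wholeWord.test('xx aaa yy')); // true: aaa stands alone
console.log(wholeWord.test('xxaaayy'));   // false: aaa is only part of a longer word

// With the g flag, match() returns every whole-word hit.
const found = 'aaa bbbb ccc'.match(/\b(?:aaa|bbb|ccc)\b/g);
console.log(found); // [ 'aaa', 'ccc' ]  (bbbb is not bbb)
```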