I am a total regex noob and have spent hours trying to solve this puzzle. I think I have to use some kind of optional non-capturing groups or alternation.
I want to match the following strings:
Neuer Film a von 1000
Neuer Film a von 1000 mit b
Neuer Film a von 1000 mit b und c
Neuer Film a von 1000 mit b und c und d
Neuer Film a mit b
Neuer Film a mit b und c
Neuer Film a mit b und c und d
My regex looks like this:
var regex = /(?:[nN]euer [Ff]ilm\s?)(.*)(?:[vV]on).(\d{4}).(?:[Mm]it)(.*)(?:[uU]nd)(.*)/g;
The problem is that it only matches strings 3 and 4. It also does not handle the last two "und" separately: the text ends up in group 3 rather than group 4.
Can someone please help with my regex (which is not very user-friendly at all ;)?
asked Apr 11, 2017 at 19:48 by TrantSteel
1 Answer
You do need optional non-capturing groups (like (?:...)?), but you also need anchors (^ to match the start of the string and $ to match the end of the string) and lazy dot matching patterns (.*?, which matches as few characters as possible).
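To illustrate why both the lazy quantifier and the end anchor matter, here is a small sketch on one of the sample strings. The three patterns are simplified fragments written only for this comparison, not the final regex.

```javascript
const input = 'Neuer Film a mit b und c und d';

// Greedy: (.*) grabs as much as possible and only backtracks to the LAST "und",
// so group 1 swallows the first "und".
const greedy = /mit\s*(.*)\s*und\s*(.*)$/.exec(input);
console.log(greedy[1]); // "b und c " (note the trailing space)
console.log(greedy[2]); // "d"

// Lazy: (.*?) expands one character at a time and stops at the FIRST "und".
const lazy = /mit\s*(.*?)\s*und\s*(.*)$/.exec(input);
console.log(lazy[1]); // "b"
console.log(lazy[2]); // "c und d"

// Lazy without an anchor: nothing forces (.*?) to expand, so it matches nothing.
const unanchored = /mit\s*(.*?)/.exec(input);
console.log(JSON.stringify(unanchored[1])); // ""
```

The last case is the reason a lazy group at the end of a pattern needs $ (or some other following subpattern) to be useful.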
You may use
/^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/
See the regex demo. In the demo, the /gm modifiers are necessary since the input is a multiline string.
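The same situation can come up outside the demo. As a rough sketch, assuming the strings arrive as one newline-separated block (the text and rows variables are made up for illustration), the g and m flags let ^ and $ match at every line:

```javascript
// Hypothetical input: two of the sample strings joined into one multiline string.
const text = [
  'Neuer Film a von 1000',
  'Neuer Film a mit b und c',
].join('\n');

// Same pattern as above, with g (find all matches) and m (^/$ match per line).
const rxMulti = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/gm;

// matchAll requires the g flag; slice(1) keeps only the four capture groups.
const rows = [...text.matchAll(rxMulti)].map(m => m.slice(1));

for (const [title, year, first, rest] of rows) {
  console.log({ title, year, first, rest });
}
```

Groups that did not participate in a match come back as undefined. Keep in mind that \s also matches line breaks, so with single strings (as in the question) the plain pattern without flags is the safer choice.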
Pattern details:
^ - start of string anchor
[nN]euer [Ff]ilm - Neuer Film / Neuer film / neuer Film / neuer film
\s* - zero or more whitespaces
(.*?) - Group 1: any 0+ chars other than line break chars, as few as possible (that is, up to the leftmost occurrence of the subsequent subpatterns)
(?:\s*[vV]on\s+(\d{4}))? - 1 or 0 occurrences of:
    \s* - 0+ whitespaces
    [vV]on - von or Von
    \s+ - 1+ whitespaces
    (\d{4}) - Group 2: 4 digits
(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)? - an optional non-capturing group matching 1 or 0 occurrences of:
    \s+ - 1+ whitespaces
    [Mm]it - Mit or mit
    \s* - 0+ whitespaces
    (.*?) - Group 3: any 0+ chars other than line break chars, as few as possible
    (?:\s*[uU]nd\s*(.*))? - an optional non-capturing group matching:
        \s*[uU]nd\s* - und or Und enclosed with 0+ whitespaces
        (.*) - Group 4: any 0+ chars other than line break chars, as many as possible
$ - end of string.
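If keeping track of group numbers gets tedious, the same pattern can be written with named capture groups (available in ES2018+ engines). This is only a sketch: the names title, year, first and rest are invented here and are not part of the original pattern.

```javascript
// Same structure as the pattern above; only the capturing groups are named.
const named = /^[nN]euer [Ff]ilm\s*(?<title>.*?)(?:\s*[vV]on\s+(?<year>\d{4}))?(?:\s+[Mm]it\s*(?<first>.*?)(?:\s*[uU]nd\s*(?<rest>.*))?)?$/;

const { title, year, first, rest } =
  named.exec('Neuer Film a von 1000 mit b und c und d').groups;

console.log(title); // "a"
console.log(year);  // "1000"
console.log(first); // "b"
console.log(rest);  // "c und d"
```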
// The seven sample strings from the question
var strs = [
  'Neuer Film a von 1000',
  'Neuer Film a von 1000 mit b',
  'Neuer Film a von 1000 mit b und c',
  'Neuer Film a von 1000 mit b und c und d',
  'Neuer Film a mit b',
  'Neuer Film a mit b und c',
  'Neuer Film a mit b und c und d'
];
// No g/m flags here: each string is tested on its own
var rx = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;
for (var s of strs) {
  var m = rx.exec(s);
  if (m) {
    console.log('-- ' + s + ' ---');
    console.log('Group 1: ' + m[1]);
    // Groups 2-4 are undefined when their optional part did not match
    if (m[2]) console.log('Group 2: ' + m[2]);
    if (m[3]) console.log('Group 3: ' + m[3]);
    if (m[4]) console.log('Group 4: ' + m[4]);
  }
}
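Note that for inputs with two "und", Group 4 holds everything after the first one (for example "c und d"). If the individual items are wanted separately, one possible follow-up step is to split that group again. The parseFilm helper and the field names below are hypothetical, a sketch under that assumption rather than part of the solution above.

```javascript
const filmRx = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;

// Hypothetical helper: returns null for non-matching input, otherwise an object
// with the title, the optional year, and every name that follows "mit".
function parseFilm(line) {
  const m = filmRx.exec(line);
  if (!m) return null;
  const names = [];
  if (m[3]) names.push(m[3]);
  // Group 4 may still contain further "und" separators, so split it once more.
  if (m[4]) names.push(...m[4].split(/\s+[uU]nd\s+/));
  return { title: m[1], year: m[2] || null, names: names };
}

console.log(parseFilm('Neuer Film a mit b und c und d'));
// { title: 'a', year: null, names: [ 'b', 'c', 'd' ] }
console.log(parseFilm('Neuer Film a von 1000'));
// { title: 'a', year: '1000', names: [] }
```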