javascript - Regex optional non-capturing groups - Stack Overflow

I am a total regex noob and have spent hours trying to solve this puzzle. I think I have to use some kind of optional non-capturing groups or alternation.

I want to match the following strings:

  1. Neuer Film a von 1000

  2. Neuer Film a von 1000 mit b

  3. Neuer Film a von 1000 mit b und c

  4. Neuer Film a von 1000 mit b und c und d

  5. Neuer Film a mit b

  6. Neuer Film a mit b und c

  7. Neuer Film a mit b und c und d

My regex looks like this:

var regex = /(?:[nN]euer [Ff]ilm\s?)(.*)(?:[vV]on).(\d{4}).(?:[Mm]it)(.*)(?:[uU]nd)(.*)/g;

The problem is that it only matches strings 3 and 4. It also does not split at each "und": in string 4, "b und c" ends up packed into group 3 and only the part after the last "und" lands in group 4.

Can someone please help with my regex (which is not very user-friendly at all ;)

asked Apr 11, 2017 at 19:48 by TrantSteel

1 Answer

You do need optional non-capturing groups (of the form (?:...)?), but you also need anchors (^ to match the start of the string and $ to match the end of the string) and lazy dot patterns (.*?, which match as few characters as possible).
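To make the diagnosis concrete, here is a small sketch that runs the pattern from the question against the seven sample strings. The g flag is dropped for this illustration (a global regex keeps lastIndex state between exec calls), and the variable names are illustrative only.

```javascript
// The asker's pattern, without the g flag so every string is tested from index 0.
const original = /(?:[nN]euer [Ff]ilm\s?)(.*)(?:[vV]on).(\d{4}).(?:[Mm]it)(.*)(?:[uU]nd)(.*)/;

const samples = [
  'Neuer Film a von 1000',
  'Neuer Film a von 1000 mit b',
  'Neuer Film a von 1000 mit b und c',
  'Neuer Film a von 1000 mit b und c und d',
  'Neuer Film a mit b',
  'Neuer Film a mit b und c',
  'Neuer Film a mit b und c und d'
];

// Collect the 1-based numbers of the samples that match and print their groups.
const matched = [];
samples.forEach((s, i) => {
  const m = original.exec(s);
  if (m) {
    matched.push(i + 1);
    console.log(i + 1, JSON.stringify(m.slice(1)));
  }
});
console.log('matched:', matched);
```

Every part of that pattern is mandatory, so a string without "von", "mit" or "und" cannot match. The greedy (.*) after "mit" also runs up to the last "und", which is why "b und c" ends up in group 3 for string 4.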

You may use

/^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/

See the regex demo. In the demo, /gm modifiers are necessary since the input is a multiline string.
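As a rough sketch of that multiline setup, the snippet below joins the sample strings with newlines and applies the same pattern with the g and m flags through String.prototype.matchAll. The property names title, year, first and rest are labels chosen for this illustration, not part of the answer.

```javascript
// All seven samples as one multiline string, as in the demo.
const input = [
  'Neuer Film a von 1000',
  'Neuer Film a von 1000 mit b',
  'Neuer Film a von 1000 mit b und c',
  'Neuer Film a von 1000 mit b und c und d',
  'Neuer Film a mit b',
  'Neuer Film a mit b und c',
  'Neuer Film a mit b und c und d'
].join('\n');

// m makes ^ and $ match at line boundaries; g finds every line, not just the first.
const rxMultiline = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/gm;

// matchAll requires the g flag and yields one match object per line here.
const rows = [...input.matchAll(rxMultiline)].map(m => ({
  title: m[1],
  year: m[2],   // undefined when the optional "von" part is absent
  first: m[3],  // undefined when the optional "mit" part is absent
  rest: m[4]    // undefined when the optional "und" part is absent
}));

console.log(rows.length);
console.log(rows);
```

Note that \s also matches line breaks, so on other multiline input a \s* or \s+ can reach into the next line; with these samples every line starts with "Neuer", so that does not change the result.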

Pattern details:

  • ^ - start of a string anchor
  • [nN]euer [Ff]ilm - Neuer Film / Neuer film / neuer Film / neuer film
  • \s* - zero or more whitespaces
  • (.*?) - Group 1: any 0+ chars other than line break chars, as few as possible (that is, up to the leftmost occurrence of the subsequent subpatterns)
  • (?:\s*[vV]on\s+(\d{4}))? - 1 or 0 occurrences of:
    • \s* - 0+ whitespaces
    • [vV]on - von or Von
    • \s+ - 1+ whitespaces
    • (\d{4}) - Group 2: 4 digits
  • (?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)? - an optional non-capturing group matching 1 or 0 occurrences of:
    • \s+ - 1+ whitespaces
    • [Mm]it - Mit or mit
    • \s* - 0+ whitespaces
    • (.*?) - Group 3 matching any 0+ chars other than line break chars, as few as possible
    • (?:\s*[uU]nd\s*(.*))? - an optional non-capturing group matching 1 or 0 occurrences of:
      • \s*[uU]nd\s* - und or Und enclosed with 0+ whitespaces
      • (.*) - Group 4 matching any 0+ chars other than line break chars, as many as possible
  • $ - end of string.
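The effect of the lazy quantifier in Group 1 can be seen by comparing the pattern with a variant in which only that group is made greedy. The variant is a deliberately broken copy made up for this comparison, and the names lazy and greedyFirst are illustrative.

```javascript
// The pattern from the answer: Group 1 is lazy (.*?).
const lazy = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;

// Hypothetical variant for comparison only: Group 1 is greedy (.*).
const greedyFirst = /^[nN]euer [Ff]ilm\s*(.*)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;

const sample = 'Neuer Film a von 1000 mit b und c';

// slice(1) drops the full match and keeps the four capture groups.
const lazyGroups = lazy.exec(sample).slice(1);
const greedyGroups = greedyFirst.exec(sample).slice(1);

console.log(lazyGroups);   // groups 1-4: 'a', '1000', 'b', 'c'
console.log(greedyGroups); // group 1 holds 'a von 1000 mit b und c'; groups 2-4 are undefined
```

With the greedy variant, Group 1 consumes the rest of the string, the optional groups then match zero times, and $ still succeeds, so the string matches but nothing is split out.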

// The seven sample strings from the question
var strs = ['Neuer Film a von 1000','Neuer Film a von 1000 mit b','Neuer Film a von 1000 mit b und c','Neuer Film a von 1000 mit b und c und d','Neuer Film a mit b','Neuer Film a mit b und c','Neuer Film a mit b und c und d'];
var rx = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;
for (var s of strs) {
  var m = rx.exec(s);
  if (m) {
    console.log('-- ' + s + ' ---');
    console.log('Group 1: ' + m[1]);
    // Groups 2 to 4 are undefined when their optional part did not take part in the match
    if (m[2]) console.log('Group 2: ' + m[2]);
    if (m[3]) console.log('Group 3: ' + m[3]);
    if (m[4]) console.log('Group 4: ' + m[4]);
  }
}
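For inputs such as string 4, Group 4 holds everything after the first "und" ("c und d"). If each name is wanted separately, one possible follow-up step is to split that group afterwards. The sketch below assumes that this is the desired result; parseFilm and the returned property names are hypothetical and not part of the answer.

```javascript
const filmRx = /^[nN]euer [Ff]ilm\s*(.*?)(?:\s*[vV]on\s+(\d{4}))?(?:\s+[Mm]it\s*(.*?)(?:\s*[uU]nd\s*(.*))?)?$/;

// Hypothetical helper: returns null for non-matching input, otherwise the
// title, the optional year, and a flat list of the names after "mit".
function parseFilm(line) {
  const m = filmRx.exec(line);
  if (!m) return null;
  const [, title, year, first, rest] = m;
  const people = [];
  if (first) people.push(first);
  // Group 4 may still contain further "und" separators, so split it here.
  if (rest) people.push(...rest.split(/\s+[uU]nd\s+/));
  return { title, year, people };
}

console.log(parseFilm('Neuer Film a von 1000 mit b und c und d'));
console.log(parseFilm('Neuer Film a mit b'));
console.log(parseFilm('Neuer Film a von 1000'));
```

Splitting after the match keeps the regex itself unchanged and copes with any number of "und" parts, which a fixed number of capture groups cannot do.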
