
c# - Match but break on +|- - Stack Overflow


I have the following regex "(.*)\s*(?:([+-])\s*([1-9]\d{0,2})([dmyw]))?$"

And I'm trying to parse things like

friday+
friday+1
friday+1d (works correctly)
2/23/25
2/23/25+
2/23/25+1d (works correctly)
March 23
March 23 + 1
March 23 + 1d

In ALL cases I need group 1 to always be either friday or 2/23/25 or March 23

When the string is complete I correctly get

Group 2 = +|-
Group 3 = number
Group 4 = D|M|Y|W

For an incomplete string I need group 1 to contain everything up to the +|- and group 2 to be everything else.

asked Apr 1 at 18:52 by Michael T
  • 1 Please note that, for MANY parsing problems, a regular expression is not the right answer. And you have to adapt yourself to the groups that are returned. You don't get to dictate that. – Tim Roberts Commented Apr 1 at 18:56
  • I think I have done it with this /([^+|-]+)\s*(?:([+-])\s*([1-9]\d{0,2})([dmyw]))?|(.*)$/ – Michael T Commented Apr 1 at 19:30
  • But this [^+|-]+ matches any character except + | - and now everything is optional so you also match empty strings and you have 5 groups where some groups can also have empty matches. – The fourth bird Commented Apr 1 at 19:35
  • 1 @MichaelT Also try e.g. ^([^+-]+)\s*(?:([+-])\s*(?:([1-9]\d{0,2})([dmwy])?)?)?$ (if you're parsing single lines, the demo is adjusted to multiline input) – bobble bubble Commented Apr 1 at 20:51
  • @bobblebubble - This looks to be perfect - thanks. – Michael T Commented Apr 1 at 23:43

6 Answers


In your current regex you are using .* which is greedy and after it you put the optional group. The problem is that the first group consumes everything and your optional group won't match.
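The effect is easy to reproduce. Below is a minimal sketch in Python rather than C#; for this particular pattern the greedy behaviour is expected to be the same in both engines, but only the Python run is shown here. The names ORIGINAL and original_groups are illustrative.

```python
import re

# The pattern from the question, unchanged.
ORIGINAL = re.compile(r"(.*)\s*(?:([+-])\s*([1-9]\d{0,2})([dmyw]))?$")

def original_groups(text):
    """Return the four capture groups the question's pattern produces."""
    return ORIGINAL.match(text).groups()

# Greedy (.*) runs to the end of the string first. The optional group can
# then match nothing, $ succeeds, and groups 2-4 stay unset.
for sample in ["friday+", "friday+1", "friday+1d"]:
    print(repr(sample), original_groups(sample))
```

Every sample ends up entirely in group 1, which is why the tail never reaches groups 2 to 4.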

As you found out yourself, you can use a negated character class [^+-] to stop before the first occurrence of one of the listed characters. Btw. a character class is a defined set of characters. If you don't want to explicitly exclude the vertical bar, don't put it inside.

@The fourth bird already analyzed in his answer how your optional tokens depend on each other from left to right: Match and capture [+-] optionally, match and capture [1-9]\d{0,2} only if [+-] matched before, match [dmwy] only if the previous option matched before. So you could nest these in groups like (a(b(c)?)?)? inside an optional non-capturing group.

^([^+-]+)\s*(?:([+-])\s*(?:([1-9]\d{0,2})([dmwy])?)?)?$

See this demo at regex101 (the \n and [ \t] in the demo are just for multiline showcase)
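As a rough illustration, the nested pattern can be tried against the question's inputs in Python. This is a sketch, not a C# port: the helper name parse_offset is made up here, and the strip() call is an added assumption to deal with the whitespace that [^+-]+ picks up before the sign.

```python
import re

# Pattern from the answer above; group 1 stops at the first + or -.
NESTED = re.compile(r"^([^+-]+)\s*(?:([+-])\s*(?:([1-9]\d{0,2})([dmwy])?)?)?$")

def parse_offset(text):
    """Return (base, sign, number, unit); parts that are absent are None."""
    m = NESTED.match(text)
    if m is None:
        return None
    base, sign, number, unit = m.groups()
    # [^+-]+ is greedy and also matches spaces, so "March 23 + 1d" leaves
    # a trailing space in group 1. Trim it here.
    return base.strip(), sign, number, unit

for sample in ["friday", "friday+", "friday+1", "2/23/25+1d", "March 23 + 1d"]:
    print(repr(sample), parse_offset(sample))
```

In C# the equivalent of the strip() step would be a Trim() on the value of group 1.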


Update: Oh, I had previously overlooked:

... I need group 1 to contain everything up to the +|- and group 2 to be everything else.

If you want only two groups, group 1 up to [+-] and everything else in group 2:

^([^+-]+)\s*([+-](?:\s*[1-9]\d{0,2}[dmwy]?)?)?$

Here is an updated demo at regex101 using the same functionality but fewer capture groups.
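The two-group variant can be checked the same way. A small Python sketch follows; split_offset is a hypothetical helper name, and trimming group 1 is again an assumption on top of the pattern itself.

```python
import re

# Two-group pattern from the update above: base in group 1, the sign and
# whatever follows it in group 2.
TWO_GROUPS = re.compile(r"^([^+-]+)\s*([+-](?:\s*[1-9]\d{0,2}[dmwy]?)?)?$")

def split_offset(text):
    """Return (base, rest), or None when the tail is not a valid offset."""
    m = TWO_GROUPS.match(text)
    if m is None:
        return None
    base, rest = m.groups()
    return base.strip(), rest

for sample in ["2/23/25", "friday+", "friday+1d", "March 23 + 1", "friday+0"]:
    print(repr(sample), split_offset(sample))
```

Note that the pattern still validates the tail: "friday+0" is rejected because the number must start with 1-9.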

If there are optional spaces allowed between all parts, you might use an alternation in group 1, and make group 2, 3 and 4 optional in a nested way:

^([a-z]+|\d{1,2}/\d{1,2}/\d{1,2}|[A-Z][a-z]+[\p{Zs}\t]*\d+)(?:[\p{Zs}\t]*([+-])(?:[\p{Zs}\t]*(\d+)(?:[\p{Zs}\t]*([dmyw]))?)?)?

The regex matches:

  • ^ Start of string (if necessary), else use \b to prevent a partial match
  • ( Capture group 1 (this should always be present)
    • [a-z]+ Match 1+ chars a-z
    • | Or
    • \d{1,2}/\d{1,2}/\d{1,2} Match a date like format (this does not validate a date)
    • | Or
    • [A-Z][a-z]+[\p{Zs}\t]*\d+ Match a single char A-Z, 1+ chars a-z, optional spaces and tabs and 1+ digits
  • ) Close group 1
  • (?: Non capture group
    • [\p{Zs}\t]* Match optional spaces or tabs
    • ([+-]) Capture group 2, match either + or -
    • (?: Non capture group
      • [\p{Zs}\t]* Match optional spaces or tabs
      • (\d+) Capture group 3, match 1+ digits
      • (?: Non capture group
        • [\p{Zs}\t]* Match optional spaces or tabs
        • ([dmyw]) Capture group 4, match one of the listed characters
      • )? Close the non capture group and make it optional for 4
    • )? Close the non capture group and make it optional for 3
  • )? Close the non capture group and make it optional for 2

Instead of using a character class [a-z] you could also extend the pattern to match the week days and the months to make it more specific.

See a regex101 demo.
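A sketch of how this pattern behaves, written in Python. One substitution is an assumption of this example: Python's built-in re module has no \p{...} classes, so [ \t] stands in for [\p{Zs}\t]. The helper name strict_groups is illustrative.

```python
import re

# Same structure as the pattern above, with [ \t] replacing [\p{Zs}\t].
STRICT = re.compile(
    r"^([a-z]+|\d{1,2}/\d{1,2}/\d{1,2}|[A-Z][a-z]+[ \t]*\d+)"
    r"(?:[ \t]*([+-])(?:[ \t]*(\d+)(?:[ \t]*([dmyw]))?)?)?"
)

def strict_groups(text):
    """Return groups 1-4, or None if group 1 cannot match at the start."""
    m = STRICT.match(text)
    return m.groups() if m else None

for sample in ["friday+", "2/23/25", "2/23/25+1", "March 23 + 1d"]:
    print(repr(sample), strict_groups(sample))
```

Because group 1 is spelled out explicitly, it never swallows the space in front of the sign, so no trimming is needed. The pattern has no end anchor, so trailing text after a valid prefix is simply left unmatched.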

You're going to have to check if group 3 or 4 matched to know if group 2 contains
the complete match. Might as well just add another group to act as a flag.
That would be group 5.

If group 5 Not matched, complete : groups 1,2,3,4 are valid.
If group 5 Matched, then incomplete : groups 1, 5 are valid.

^(.*?)\s*(?:([+-])(?:\s*([1-9]\d{0,2})([dmyw]))|([+-].*))$

https://regex101/r/7kerqj/1

^ 
( .*? )                    # (1)
\s* 
(?:
   ( [+-] )                # (2)
   (?:
      \s* 
      ( [1-9] \d{0,2} )    # (3)
      ( [dmyw] )           # (4)
   )
 | 
   ( [+-] .* )             # (5)
)
$
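The flag-group idea can be sketched in Python as follows. The function name classify and the dictionary keys are made up for this example; only the pattern comes from the answer.

```python
import re

# Pattern from above: groups 2-4 for a complete offset, group 5 as the
# "incomplete" flag holding everything from the sign onward.
FLAGGED = re.compile(
    r"^(.*?)\s*(?:([+-])(?:\s*([1-9]\d{0,2})([dmyw]))|([+-].*))$"
)

def classify(text):
    m = FLAGGED.match(text)
    if m is None:
        # Both alternatives start with [+-], so a string without a sign
        # (for example "friday") does not match at all.
        return None
    if m.group(5) is None:
        return {"complete": True, "base": m.group(1), "sign": m.group(2),
                "number": m.group(3), "unit": m.group(4)}
    return {"complete": False, "base": m.group(1), "rest": m.group(5)}

for sample in ["friday", "friday+", "friday+1", "friday+1d", "March 23 + 1"]:
    print(repr(sample), classify(sample))
```

Inputs with no + or - need separate handling, since the alternation in this pattern is not optional.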

Here is a regular expression that matches your sequence. The 1st, the 6th, and the 7th groups have the information you need. The surrounding code is in Python, but the regular expression here should work.

import re

tests = [
"friday",
"friday+1",
"friday+1d",
"2/23/25",
"2/23/25+",
"2/23/25+1d",
"March 23",
"March 23 + 1",
"March 23 + 1d"
]

# So, we have ((string of letters optionally followed by space digits) OR (dd/dd/dd)) optionally 
# followed by space + space digits [dmwy].

match = re.compile(r"(([A-Za-z][a-z]+(\s\d{1,2})?)|(\d+/\d+/\d+))((\s*\+\s*\d+)([dmwy]?))?")

for t in tests:
    g = match.match(t)
    print(t,"\t",g.groups())

Output:

friday   ('friday', 'friday', None, None, None, None, None)
friday+1     ('friday', 'friday', None, None, '+1', '+1', '')
friday+1d    ('friday', 'friday', None, None, '+1d', '+1', 'd')
2/23/25      ('2/23/25', None, None, '2/23/25', None, None, None)
2/23/25+     ('2/23/25', None, None, '2/23/25', None, None, None)
2/23/25+1d   ('2/23/25', None, None, '2/23/25', '+1d', '+1', 'd')
March 23     ('March 23', 'March 23', ' 23', None, None, None, None)
March 23 + 1     ('March 23', 'March 23', ' 23', None, ' + 1', ' + 1', '')
March 23 + 1d    ('March 23', 'March 23', ' 23', None, ' + 1d', ' + 1', 'd')

I was not planning on answering this question. However, since I worked out a solution for this problem, I thought I would share what I have. I have added weekday and month names and a capture for the year, none of which were part of the question, but these patterns can easily be replaced as needed. The best way to see how the pattern works is the regex demo (link below).

Every string (or line, with the multiline flag m) and every part of a string is captured in one of the groups.

This regex has the following named capture groups:

  • Group1:
    Captures one of the following in this order:
    * Weekday (by full name or three letter abbreviation)
    * Month (by name or three letter abbreviation)
    * Date (format 2/23/25, 02/01/2059, 2025/01/29, etc.)
    * Any letters before the first + or -.
  • Group2: +|-
  • Group3: number (with 1 or more digits)
  • Group4: Captures d|m|y|w.
  • extra_at_end:
    • Captures any remaining unmatched characters that follow.
  • no_partial_match_or_plus_minus:
    • Captures the entire string (or line) if no partial matches and no + or - sign in the string.

REGEX PATTERN (.NET regex flavor (C#); Flags: gmi)

^(?:(?<Group1>(?:(?:(?:mon|tue(?:s)?|wed(?:nes)?|thu(?:rs)?|fri|sat(?:ur)?|sun)(?:day)?)\b|(?:(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|jul(?:y)?|aug(?:ust)?|sep(?:tember)?|nov(?:ember)|dec(?:ember)?)[\p{Zs}\t]*(?:(?:\d{1,2}[\p{Zs}\t]*)?\d{2,4})?)|(?:\d{1,4}/\d{1,2}/\d{1,4})|(?:[^+\n-]+(?=[+-]))))[\p{Zs}\t]*(?<Group2>[+-])?[\p{Zs}\t]*(?:(?:(?<Group3>\d+)?[\p{Zs}\t]*(?<Group4>(?:d|m|y|w)\b)?[\p{Zs}\t]*)(?<extra_at_end>[^\n]+)?))[\p{Zs}\t]*$|^(?<no_partial_match_or_plus_minus>[^\n]+)$

Regex demo: https://regex101/r/U9MAoZ/9

REGEX NOTES:

  • ^ Match beginning of string (or start of line with multiline (m) flag)

  • (?: Begin FIRST LARGE non-capture group, (?:...). This groups any full or partial matches.

    • (?<Group1> Begin named capture group (?<group-name-here>...). This group will capture weekday; OR month followed by a one-two digit day, possibly followed by a two-four digit year; OR date (2/23/25, 02/23/2025); OR any string before the first + or -.

      • (?: Begin SECOND LARGE non-capture group. This groups characters from the beginning of the string to the first + or -. Includes alternation ...|...|...|....
        • (?: Begin THIRD non-capture group. Groups the possible names for weekdays.
          • (?: Begin non-capture group. Groups options for weekday abbreviations, and part of the weekday names before the literal day.
            mon
            |
            tue(?:s)?
            |
            wed(?:nes)?
            |
            thu(?:rs)?
            |
            fri
            |
            sat(?:ur)?
            |
            sun
            
          • )
          • (?:
            • day Match literal day
          • )? Match group 0 or 1 times (?), (?:...)?.
        • ) End THIRD.
        • \b Match word boundary.
        • | OR
        • (?: Begin FOURTH non-capture group. Groups month name, or three letter month abbreviation, with possible 1-2 digit day, followed by optional 2-4 digit year, including spaces.
          • (?: Begin non-capture group. Include options, including abbreviations for months.
            jan(?:uary)?
            |
            feb(?:ruary)?
            |
            mar(?:ch)?
            |
            apr(?:il)?
            |
            may
            |
            jun(?:e)?
            |
            jul(?:y)?
            |
            aug(?:ust)?
            |
            sep(?:tember)?
            |
            nov(?:ember)
            |
            dec(?:ember)?
            
          • )
          • [\p{Zs}\t]* Match 0 or more (*) white spaces characters [\p{Zs}\t].
          • (?: Begin non-capture group. Groups an optional 1-2 digit day followed by optional whitespace followed by 1-4 digit day or year.
            • (?:
              • \d{1,2} Match 1 or 2 digits.
              • [\p{Zs}\t]* Match optional spaces or tabs.
            • )? Match group 0 or 1 times (?), (?:...)?.
            • \d{2,4} Match possible year 2-4 digits, for a day or year.
          • )?
        • )
        • | OR
        • (?: Begin non-capture group. Match date-like strings, eg. 2/23/25, 02/01/2059, 2025/01/29, etc.
          • \d{1,4} Match 1-4 ({1,4}) digits for a date, year or month.
          • /\d{1,2} Match literal forward slash / followed by 1-2 ({1,2})digits, for a date or a month.
          • /\d{1,4} Match literal forward slash / followed by 1-4 ({1,4}) digits, for a year or date.
        • )
        • | OR
        • (?: Begin non-capture group. Groups the characters from the beginning of the string up to the first + or -.
          • [^+\n-]+ Negated character class [^...]. Match any character that is not a literal +, a literal -, or a newline character \n.
          • (?= Positive lookahead (?=...). [+-] Matches if the current position is followed by a literal + or -.
          • )
        • ) End FOURTH.
      • ) End SECOND LARGE.
    • ) End Group1 capture.

    • [\p{Zs}\t]* Match optional spaces or tabs.

    • (?<Group2> Begin Group2, named capture group (?<group-name-here>...). Matches optional literal + or -.

      • [+-]
    • )? Optional, match group 0 or 1 times (?).

    • [\p{Zs}\t]* Match optional spaces or tabs.

    • (?: Begin non-capture group.

      • (?: Begin non-capture group.

        • (?<Group3> Begin Group3, named capture group (?<group-name-here>...).

          • \d+ Match 1 or more (+) digits.
        • )? Optional, match group 0 or 1 times (?).

        • [\p{Zs}\t]* Match optional spaces or tabs.

        • (?<Group4> Begin Group4, named capture group (?<group-name-here>...).

          • (?:
            • d|m|y|w Alternation. Matches one character, a literal d, m, y, or w.
          • )
          • \b Match word boundary.
        • )? Optional, match group 0 or 1 times (?).

        • [\p{Zs}\t]* Match optional spaces or tabs.

      • )

      • (?<extra_at_end> Begin extra_at_end, named capture group (?<group-name-here>...). Captures any characters after (d|m|y|w)\b[\p{Zs}\t]* to the end of the string (or line if multiline m).

        • [^\n]+ Negated character class. Matches any character that is not a newline character, 1 or more times.
      • )?

    • )

  • ) End FIRST LARGE non-capture group.

  • [\p{Zs}\t]* Match optional spaces or tabs.

  • $ Match end of string (or line in multiline m).

  • | OR

  • ^ Match beginning of string (or line if multiline flag m).

  • (?<no_partial_match_or_plus_minus> Begin no_partial_match_or_plus_minus group, named capture group (?<group-name-here>...). This group will capture the entire string (or line), in the case we do not have any desired matches and no + or - character in the string.

    • [^\n]+ Negated character class [^...]. Match any character that is not newline character \n 1 or more (+) times.
  • )

  • $ Match end of string (or line in multiline m).
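To show how the named groups are read, here is a heavily reduced sketch in Python. It is not the full pattern above: the weekday, month and date alternatives and the no_partial_match_or_plus_minus fallback are left out, Python's (?P<name>...) syntax replaces .NET's (?<name>...), and [ \t] stands in for [\p{Zs}\t] because Python's re has no \p{...} classes. The names REDUCED and named_parts are hypothetical.

```python
import re

# Reduced sketch: a lazy Group1 up to the sign, then the optional sign,
# number, unit, and whatever is left over on the line.
REDUCED = re.compile(
    r"^(?P<Group1>[^+\n-]+?)[ \t]*"
    r"(?:(?P<Group2>[+-])[ \t]*"
    r"(?P<Group3>\d+)?[ \t]*"
    r"(?P<Group4>[dmyw]\b)?[ \t]*"
    r"(?P<extra_at_end>[^\n]+)?)?$",
    re.IGNORECASE,
)

def named_parts(text):
    """Return a dict of the named groups, or None if nothing matches."""
    m = REDUCED.match(text)
    return m.groupdict() if m else None

for sample in ["friday", "friday+1d", "March 23 + 1dyr", "Sat + 3w Hello there!"]:
    print(repr(sample), named_parts(sample))
```

The word boundary after the unit is what pushes "dyr" into extra_at_end instead of accepting "d" as a unit, matching the "March 23 + 1dyr" row in the results below.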

TEST STRING:


friday+
friday+1
friday+1d
2/23/25
2/23/25+
2/23/25+1d
March 23 2024
April 23
November 25 2025 +100w
March 23 + 1
March 23 + 1d
March 23 + 1dyr
monday
friday+
wed
fri+1Y
number + 1Y
moonady
WHAT IS THIS STRING? - IT IS NOT MATCHING
This thing?! It does not match!
HELLO - WORLD
SAATURDAY + 1M
Sat + 3w Hello there!
Sat + 3 Hello there!
Sat + Hello there!
Sat Hello there!
sat +       


MATCHES AND GROUPS (see Regex Demo, link above for detail):

1-8 friday+
1-7 friday
7-8 +
9-17    friday+1
9-15    friday
15-16   +
16-17   1
18-27   friday+1d
18-24   friday
24-25   +
25-26   1
26-27   d
28-35   2/23/25
28-35   2/23/25
36-44   2/23/25+
36-43   2/23/25
43-44   +
45-55   2/23/25+1d
45-52   2/23/25
52-53   +
53-54   1
54-55   d
56-69   March 23 2024
56-69   March 23 2024
70-78   April 23
70-78   April 23
79-101  November 25 2025 +100w
79-95   November 25 2025
96-97   +
97-100  100
100-101 w
102-114 March 23 + 1
102-110 March 23
111-112 +
113-114 1
115-128 March 23 + 1d
115-123 March 23
124-125 +
126-127 1
127-128 d
129-144 March 23 + 1dyr
129-137 March 23
138-139 +
140-141 1
141-144 dyr
145-151 monday
145-151 monday
152-159 friday+
152-158 friday
158-159 +
160-163 wed
160-163 wed
164-170 fri+1Y
164-167 fri
167-168 +
168-169 1
169-170 Y
171-182 number + 1Y
171-178 number 
178-179 +
180-181 1
181-182 Y
183-190 moonady
183-190 moonady
191-232 WHAT IS THIS STRING? - IT IS NOT MATCHING
191-212 WHAT IS THIS STRING? 
212-213 -
214-232 IT IS NOT MATCHING
233-264 This thing?! It does not match!
233-264 This thing?! It does not match!
265-278 HELLO - WORLD
265-271 HELLO 
271-272 -
273-278 WORLD
279-293 SAATURDAY + 1M
279-289 SAATURDAY 
289-290 +
291-292 1
292-293 M
294-315 Sat + 3w Hello there!
294-297 Sat
298-299 +
300-301 3
301-302 w
303-315 Hello there!
316-336 Sat + 3 Hello there!
316-319 Sat
320-321 +
322-323 3
324-336 Hello there!
337-355 Sat + Hello there!
337-340 Sat
341-342 +
343-355 Hello there!
356-372 Sat Hello there!
356-359 Sat
360-372 Hello there!
373-385 sat +       
373-376 sat
377-378 +

(.*) - $1 matches anything 0..m

\s* - whitespace 0..m

(?:

([+-]) - $2 Matches + or - once

\s* - whitespace 0..m

([1-9]\d{0,2}) - $3 Matches 1-9 then any digit 0to9 0 to 2 times

([dmyw]) - $4 Matches d or m or y or w once

)? - the whole non-capturing group is optional

$ - matches at the end of the string

> For an incomplete string I need group 1 to contain everything up to the +|- and group 2 to be everything else.

Zero-width positive lookahead assertion up to the [+-] then a Zero-width negative lookahead assertion after.

(.*(?=.+[+-].+))(.*(?!.+[+-].+))
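Tried in Python's re module, the pattern as posted splits "friday+1d" one character early, into "frida" and "y+1d", because the .+ inside the lookahead needs at least one character in front of the sign. The sketch below shows that, next to a hypothetical variant (AT_SIGN, not part of the answer) that puts the lookahead directly in front of the sign. It assumes single-line input.

```python
import re

# The pattern from the answer above, unchanged.
AS_POSTED = re.compile(r"(.*(?=.+[+-].+))(.*(?!.+[+-].+))")

# Hypothetical variant: lazy group 1, then a lookahead that pins the
# split point directly in front of the first + or -.
AT_SIGN = re.compile(r"^(.*?)\s*(?:(?=[+-])(.*))?$")

def split_at_sign(text):
    """Return (base, rest); rest is None when there is no + or -."""
    return AT_SIGN.match(text).groups()

print(AS_POSTED.match("friday+1d").groups())
for sample in ["friday", "friday+", "friday+1d", "March 23 + 1"]:
    print(repr(sample), split_at_sign(sample))
```

The variant only splits the string; it does not validate the number or the unit after the sign.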
