Confusing behavior of a capturing group in a positive lookbehind in a Java regex with Pattern.matcher - Stack Overflow

The following issue is observed only in Java and not in other regex flavors (e.g., PCRE).

I have the following regex: (?:(?<=([A-Za-z\d]))|\b)(MyString). There's a capturing group on [A-Za-z\d] in the lookbehind.

I'm matching it against the string string.MyString through Pattern.matcher (to be precise, I'm calling replaceAll on the resulting Matcher).

In PCRE, this matches MyString, and it is the second group in the match. In Java, however, the match reports the g in string as group 1 and MyString as group 2.

  1. Why does Java do that? To me this regex implies that a character matching [A-Za-z\d] should be captured only if it directly precedes MyString, which is not the case here.
  2. How can I avoid capturing this g? I want to keep the capturing group for inputs such as stringMyString, where I do need that g as group 1.
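
A minimal reproduction of the setup described above might look like the following sketch. The class name `LookbehindCaptureRepro` and the helper `firstMatchGroups` are made up for illustration; the regex and the two inputs are the ones from the question. It prints both groups for each input so the reported behavior can be checked on a given JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindCaptureRepro {
    // Regex from the question: group 1 sits inside the lookbehind, group 2 is the literal.
    static final Pattern P = Pattern.compile("(?:(?<=([A-Za-z\\d]))|\\b)(MyString)");

    // Hypothetical helper: returns {group 1, group 2} of the first match, or null if none.
    static String[] firstMatchGroups(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        for (String input : new String[] { "string.MyString", "stringMyString" }) {
            String[] g = firstMatchGroups(input);
            System.out.println(input + " -> group 1 = " + g[0] + ", group 2 = " + g[1]);
        }
    }
}
```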

asked Mar 28 at 11:40 by wouldnotliketo
  • Looks like the Java regex engine does not reset Group 1 contents upon a failed match, and once the match is found, the submatch is returned with the match. Looks like a bug to me, but it is probably related to regex-specific Java functions. – Wiktor Stribiżew, Mar 28 at 11:54

1 Answer

The java.util.regex.Pattern documentation contains this passage:

The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.
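
The documentation's own "aba" example can be checked with a few lines. This is only a sketch; the class name `RetainedGroupExample` and the helper `groupTwo` are illustrative, while the pattern and input come from the quoted passage.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RetainedGroupExample {
    // Matches the whole input against (a(b)?)+ and returns group 2, or null if there is no match.
    static String groupTwo(String input) {
        Matcher m = Pattern.compile("(a(b)?)+").matcher(input);
        if (!m.matches()) {
            return null;
        }
        return m.group(2);
    }

    public static void main(String[] args) {
        // The second pass through the quantified group matches only "a",
        // yet group 2 keeps the "b" captured on the first pass.
        System.out.println(groupTwo("aba")); // b, as stated in the Pattern documentation
    }
}
```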

I think this line explains the behavior:

  • If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails.

And the last line says when a retained value is cleared:

  • All captured input is discarded at the beginning of each match.

So if you have this string:

string.MyString.srting.MyString

And this regex:

(?:(?<=([tr]))|\b)(MyString)

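Group 1 has a different value in each of the two matches, because all captured input is discarded at the beginning of each match. Within a single match, however, whatever the lookbehind captured at an earlier candidate position seems to be retained, which would explain the g in the question.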
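
To look at this in Java rather than on a regex tester, a sketch along these lines lists group 1 and group 2 for every match. The class name `PerMatchGroups` and the helper `allMatches` are hypothetical; the string and regex are the ones shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerMatchGroups {
    static final Pattern P = Pattern.compile("(?:(?<=([tr]))|\\b)(MyString)");

    // Collects {group 1, group 2} for each successive find() on the input.
    static List<String[]> allMatches(String input) {
        List<String[]> result = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            // Each find() starts a new match, so captures from the previous match are gone.
            result.add(new String[] { m.group(1), m.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] g : allMatches("string.MyString.srting.MyString")) {
            System.out.println("group 1 = " + g[0] + ", group 2 = " + g[1]);
        }
    }
}
```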

See an example on regex101
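
As for the second part of the question, one possible guard (a sketch, not something the Pattern documentation prescribes) is to trust group 1 only when it ends exactly where group 2 starts. The class name `AdjacentGroupGuard` and the helper `precedingChar` are assumptions made for illustration; `Matcher.start(int)` and `Matcher.end(int)` are the standard accessors, and `start(1)` returns -1 when the group did not take part in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdjacentGroupGuard {
    // Regex from the question.
    static final Pattern P = Pattern.compile("(?:(?<=([A-Za-z\\d]))|\\b)(MyString)");

    // Returns the character directly before MyString if the lookbehind really captured it
    // for this match position, or null when group 1 is unset or left over from elsewhere.
    static String precedingChar(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        boolean adjacent = m.start(1) != -1 && m.end(1) == m.start(2);
        return adjacent ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(precedingChar("string.MyString")); // null
        System.out.println(precedingChar("stringMyString"));  // g
    }
}
```

A replacement string passed to replaceAll cannot apply this check, so using it for replacements would probably mean an explicit find() loop with Matcher.appendReplacement and Matcher.appendTail.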
