
android - RegEx fails - Looking for subtitle missing quotes - Stack Overflow


I have this block of text. It comes from a subtitle file.

1[p]00:06:48,564 --> 00:06:50,814[p]Chúng ta đâu cần bận tâm vì bị đuổi khỏi trường.[pp]2[p]00:06:50,864 --> 00:06:53,914[p]Chiến tranh có thể xảy ra bất cứ lúc nào. Và rồi chúng ta cũng sẽ" phải rời trường thôi.[pp]3[p]00:06:53,954 --> 00:06:55,794[p]Chiến tranh'! Không tuyệt sao, Scarlett?[pp]4[p]00:06:55,844 --> 00:06:57,764[p]Cậu biết không bọn miền Bắc thực sự muốn chiến tranh?[pp]5[p]00:06:57,824 --> 00:07:00,104[p]- Ta sẽ cho b'ọn chúng biết tay.[n][-] Fiddle-dee-dee![pp]6[p]00:07:00,134 --> 00:07:01,544[p]Chiến tranh, "lúc nào" cũng chiến tranh![pp]7[p]00:07:01,584 --> 00:07:04,524[p]Chuyện chiến" tranh "vớ vẩn làm hỏng hết các cuộc vui trong suốt mùa xuân này.[pp]

In the text above, the text between [p] and [pp] is a subtitle line. I want to use a regex to match the text between a [p] and a [pp] that contains exactly one quote character; in other words, I want to find subtitle lines that have a missing quote. I built the regex below and used it with the search function of the QuickEdit app for Android, but it has a problem.

(?<=\[p\])(?!(?:\d{2}\:\d{2}\:\d{2},\d{3} --> \d{2}\:\d{2}\:\d{2},\d{3}))([^\"]+?\"[^\"]+?)(?=\[pp\])

My question is: why does my regex not only select the correct section containing one quote character, but also include the [pp] string and the text of the previous subtitle line? How can I fix this? Thank you.


asked Feb 17 at 2:42 by MaydayUniversal

1 Answer

The pattern [^\"]+? matches any character except a quote, so it does not exclude the delimiters [p] and [pp]. As a result, a pattern like

(?<=\[p\])[^\"]+?(?=\[pp\])
          ^^^^^^^ <- potentially captures [p] or [pp]

is not guaranteed to capture only a single [p]...[pp] section.
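The overrun can be reproduced with a minimal Python sketch that runs the pattern from the question against the first two entries of the sample text. Python's re module stands in for QuickEdit's search engine here, which is an assumption: the app may use a different regex flavour, although this pattern relies only on widely supported syntax. The names text, original and overrun are illustrative.

```python
import re

# First two entries of the sample text from the question.
text = (
    '1[p]00:06:48,564 --> 00:06:50,814[p]Chúng ta đâu cần bận tâm vì bị đuổi khỏi trường.[pp]'
    '2[p]00:06:50,864 --> 00:06:53,914[p]Chiến tranh có thể xảy ra bất cứ lúc nào. '
    'Và rồi chúng ta cũng sẽ" phải rời trường thôi.[pp]'
)

# The pattern from the question, unchanged.
original = re.compile(
    r'(?<=\[p\])(?!(?:\d{2}\:\d{2}\:\d{2},\d{3} --> \d{2}\:\d{2}\:\d{2},\d{3}))'
    r'([^\"]+?\"[^\"]+?)(?=\[pp\])'
)

match = original.search(text)
overrun = match.group(1)

# The match starts in entry 1 (which has no quote) and runs across the
# delimiters and the timestamp until it finds the quote in entry 2.
print('[pp]2[p]' in overrun)   # True
print(overrun)
```

The search starts right after the [p] that opens entry 1, and because [^\"] also accepts the characters of [pp], the digits and the next [p], the lazy quantifier keeps growing until it reaches the first quote, which sits in entry 2.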

To fix it, replace each [^\"]+? with a version that uses a negative lookahead to exclude the delimiters:

(?:(?!\[pp?\])[^\"])+?

Before it matches each [^\"] character, it first checks that the text at that position does not start with [p] or [pp] (that is, \[pp?\]), so the match can never run across a delimiter.

Here's the full regex:

(?<=\[p\])(?!(?:\d{2}\:\d{2}\:\d{2},\d{3} --> \d{2}\:\d{2}\:\d{2},\d{3}))((?:(?!\[pp?\])[^\"])+?\"(?:(?!\[pp?\])[^\"])+?)(?=\[pp\])
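As a rough check of the full regex, the following Python sketch rebuilds the sample text from the question and lists every match. It assumes Python's re module behaves like the QuickEdit search for this pattern, which has not been verified against the app; the names entries, text, fixed and hits are illustrative.

```python
import re

# Sample entries from the question as (timestamp, subtitle text) pairs.
entries = [
    ("00:06:48,564 --> 00:06:50,814", "Chúng ta đâu cần bận tâm vì bị đuổi khỏi trường."),
    ("00:06:50,864 --> 00:06:53,914",
     'Chiến tranh có thể xảy ra bất cứ lúc nào. Và rồi chúng ta cũng sẽ" phải rời trường thôi.'),
    ("00:06:53,954 --> 00:06:55,794", "Chiến tranh'! Không tuyệt sao, Scarlett?"),
    ("00:06:55,844 --> 00:06:57,764", "Cậu biết không bọn miền Bắc thực sự muốn chiến tranh?"),
    ("00:06:57,824 --> 00:07:00,104", "- Ta sẽ cho b'ọn chúng biết tay.[n][-] Fiddle-dee-dee!"),
    ("00:07:00,134 --> 00:07:01,544", 'Chiến tranh, "lúc nào" cũng chiến tranh!'),
    ("00:07:01,584 --> 00:07:04,524",
     'Chuyện chiến" tranh "vớ vẩn làm hỏng hết các cuộc vui trong suốt mùa xuân này.'),
]

# Rebuild the single-line form shown in the question: N[p]time[p]text[pp]
text = "".join(
    f"{n}[p]{ts}[p]{body}[pp]" for n, (ts, body) in enumerate(entries, start=1)
)

# The full regex from the answer, unchanged.
fixed = re.compile(
    r'(?<=\[p\])(?!(?:\d{2}\:\d{2}\:\d{2},\d{3} --> \d{2}\:\d{2}\:\d{2},\d{3}))'
    r'((?:(?!\[pp?\])[^\"])+?\"(?:(?!\[pp?\])[^\"])+?)'
    r'(?=\[pp\])'
)

# findall returns the contents of the single capturing group for each match.
hits = fixed.findall(text)

# Only entry 2 has exactly one double quote, so it is the only hit.
print(hits == [entries[1][1]])   # True
for hit in hits:
    print(hit)
```

Entries 6 and 7 contain two quotes each and are skipped, and the apostrophes in entries 3 and 5 are ignored because the pattern only looks for the double quote. One limitation worth noting: both +? quantifiers require at least one character on each side of the quote, so a lone quote at the very start or very end of a subtitle line would not be reported; changing +? to *? should cover those cases.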

Check the test case
