最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

How to match balanced delimiters in javascript regex? - Stack Overflow

programmeradmin3浏览0评论

I would have thought this problem to be impossible; as far as I know, Javascript's regex flavor has niether recursive interpolation nor the nifty .NET balancing groups feature. Yet there it is, as problem 12 on regex.alf.nu: match balanced pairs of < and >. Unless there's some other pattern in the sets I'm not getting.

So... is this possible? If so, how?

NOTES:

  1. I know that this is impossible for true regular expressions, but based on the challenge it seems that it must be possible in Javascript's flavor (which is at least irregular enough to have backreferences). I just don't know of any feature that would let them do this.

  2. No other code - the form allows entry of a single regex, which is evaluated against the test strings on the page. I could try to crack the page to break out of the regex and into raw JS, I suppose, but that doesn't seem to be in the spirit of this challenge.

Since David asked, here are the test strings. Longer ones have been truncated with a character count, but the problem is entitled "Balance" and the ones that are plete certainly support the hypothesis that the "match" column has balanced pairs of < and > while the "not" column doesn't.

Match all of these…

<<<<<>><<>>><<... [62 chars]
<<<<<>><>><<><... [110 chars]
<<<<<>><>><>><... [102 chars]
<<<<<>><>>>><<... [88 chars]
<<<<<>>><<<>><... [58 chars]
<<<<<>>><<><>>... [152 chars]
<<<<<>>><<>><<... [42 chars]
<<<<<>>><>><<<>>>><<>>
<<<<<>>>><<<<>... [102 chars]
<<<<<>>>><<<><... [30 chars]
<<<<<>>>><><<<... [66 chars]
<<<<<>>>><><<<... [124 chars]
<<<<<>>>><>><<>>
<<<<><<>>><<<>... [34 chars]
<<<<>><<<>>>><... [92 chars]
<<<<>>><<<<>><>><<<>>>>>
<<<<>>><<<><<>>><><<>>>><<>>
<<<<>>><<><<<>... [84 chars]
<<<<>>>><<<><<... [52 chars]
<<<><<<>>>><<<... [50 chars]
<<<><<><>>>>
<<<><>><<<>>>>
<<<>><<<><<>>>... [44 chars]
<<<>><><<<><>>... [48 chars]
<<<>>><<><<<<>>>><<><<<>>>>>
<<><<<<>><>>>>... [60 chars]
<<>>
<<>><<<<<>>>>>... [54 chars]
<<>><<<<>><<<>... [74 chars]
<>
<><>

and none of these…

<
<<<<<<>>><<><>>>>>><<>
<<<<<>>><>>><<<>>>><>>
<<<<<>>>>>>
<<<<>><<<<<><<>><><<<<
<<<>><<<<><><><><
<<<>>>><><<<><>
<<><<<<><<><<>>><<
<<><<<>>>>><<
<<>>><<<>>
<><<<>><<>>><<>
<><<>>><<<><>><<<>>><<>>>><
<><<>>><><<<>
<><>><>>><><<<... [36 chars]
<>><><<<><>
<>>>>>><<<>><<>><><
<>>>>>>><<<
>
><
><<<>><><<<><<
><<<>>>><><<<<><>>><<><><<
><<><<<<><<<<>>>><
><><><<<>>>>>
><><>>><>><>
><><>>>><>>>>>>><>>><>>
><>><<<<<>>
><>><><><<>><<>>><<
><>>><>>>>><<><<<><>><>><<<
>><<<><<<<<<><>><<
>><>>><<<><>>><><<>><<><><<
>>>><>><>>>><>>><>><><
>>>>><<<>>>

I would have thought this problem to be impossible; as far as I know, Javascript's regex flavor has niether recursive interpolation nor the nifty .NET balancing groups feature. Yet there it is, as problem 12 on regex.alf.nu: match balanced pairs of < and >. Unless there's some other pattern in the sets I'm not getting.

So... is this possible? If so, how?

NOTES:

  1. I know that this is impossible for true regular expressions, but based on the challenge it seems that it must be possible in Javascript's flavor (which is at least irregular enough to have backreferences). I just don't know of any feature that would let them do this.

  2. No other code - the form allows entry of a single regex, which is evaluated against the test strings on the page. I could try to crack the page to break out of the regex and into raw JS, I suppose, but that doesn't seem to be in the spirit of this challenge.

Since David asked, here are the test strings. Longer ones have been truncated with a character count, but the problem is entitled "Balance" and the ones that are plete certainly support the hypothesis that the "match" column has balanced pairs of < and > while the "not" column doesn't.

Match all of these…

<<<<<>><<>>><<... [62 chars]
<<<<<>><>><<><... [110 chars]
<<<<<>><>><>><... [102 chars]
<<<<<>><>>>><<... [88 chars]
<<<<<>>><<<>><... [58 chars]
<<<<<>>><<><>>... [152 chars]
<<<<<>>><<>><<... [42 chars]
<<<<<>>><>><<<>>>><<>>
<<<<<>>>><<<<>... [102 chars]
<<<<<>>>><<<><... [30 chars]
<<<<<>>>><><<<... [66 chars]
<<<<<>>>><><<<... [124 chars]
<<<<<>>>><>><<>>
<<<<><<>>><<<>... [34 chars]
<<<<>><<<>>>><... [92 chars]
<<<<>>><<<<>><>><<<>>>>>
<<<<>>><<<><<>>><><<>>>><<>>
<<<<>>><<><<<>... [84 chars]
<<<<>>>><<<><<... [52 chars]
<<<><<<>>>><<<... [50 chars]
<<<><<><>>>>
<<<><>><<<>>>>
<<<>><<<><<>>>... [44 chars]
<<<>><><<<><>>... [48 chars]
<<<>>><<><<<<>>>><<><<<>>>>>
<<><<<<>><>>>>... [60 chars]
<<>>
<<>><<<<<>>>>>... [54 chars]
<<>><<<<>><<<>... [74 chars]
<>
<><>

and none of these…

<
<<<<<<>>><<><>>>>>><<>
<<<<<>>><>>><<<>>>><>>
<<<<<>>>>>>
<<<<>><<<<<><<>><><<<<
<<<>><<<<><><><><
<<<>>>><><<<><>
<<><<<<><<><<>>><<
<<><<<>>>>><<
<<>>><<<>>
<><<<>><<>>><<>
<><<>>><<<><>><<<>>><<>>>><
<><<>>><><<<>
<><>><>>><><<<... [36 chars]
<>><><<<><>
<>>>>>><<<>><<>><><
<>>>>>>><<<
>
><
><<<>><><<<><<
><<<>>>><><<<<><>>><<><><<
><<><<<<><<<<>>>><
><><><<<>>>>>
><><>>><>><>
><><>>>><>>>>>>><>>><>>
><>><<<<<>>
><>><><><<>><<>>><<
><>>><>>>>><<><<<><>><>><<<
>><<<><<<<<<><>><<
>><>>><<<><>>><><<>><<><><<
>>>><>><>>>><>>><>><><
>>>>><<<>>>
Share edited Dec 22, 2013 at 6:35 Mark Reed asked Dec 22, 2013 at 6:13 Mark ReedMark Reed 95.5k17 gold badges145 silver badges188 bronze badges 6
  • 1 Are you allowed to use other JavaScript code? If so, just implement a stack of 'start' matches then unwind them as you find the 'end' matches. – David-SkyMesh Commented Dec 22, 2013 at 6:19
  • 1 The language you describe is not regular. – 000 Commented Dec 22, 2013 at 6:27
  • How about you paste the set of [matche these] & [but not these] into your question? – David-SkyMesh Commented Dec 22, 2013 at 6:30
  • 1 @JoeFrambach: True, but JavaScript "regular expressions", despite the name, do allow you to express certain non-regular languages. For example, the language given by /^(a+)b\1$/ is not regular. – ruakh Commented Dec 22, 2013 at 6:31
  • For those of us not familiar with the challenge, can you fix up the example to not include the [] stuff that isn't supposed to get matched? – brandonscript Commented Dec 22, 2013 at 6:43
 |  Show 1 more ment

1 Answer 1

Reset to default 7

I do not believe that this is possible in JavaScript, though it's hard to prove. For example, Java and PHP do not have the features that you mention (recursive interpolation, balancing groups), but this fascinating Stack Overflow answer shows how to match anbn using regexes in those languages. (Adapting that answer to the present case, the Java regex ^(?:(?:<(?=<*(\1?+>)))+\1)*$ should work. Correction: no, it's not that easily adapted.) But that answer depends on Java's support for the possessive quantifier ?+ (like ? except that you can't backtrack into it), and JavaScript doesn't have that.

That said, you can solve the referenced puzzle by writing this:

^(?:<(?:<(?:<(?:<(?:<(?:<(?:<>)*>)*>)*>)*>)*>)*>)*$

which matches up to seven levels of nesting. That's the most that any of the strings has, so it's all you need. (Several of the other puzzles at that page advise you to cheat because they're asking for something technically impossible; so while an elegant solution would obviously be more appealing, there's no reason to assume that one exists.)

发布评论

评论列表(0)

  1. 暂无评论