javascript - Is there a case where "[^xy]" is not equal to "(?!x|y)."?

I'm working on my own JavaScript library to support new metacharacters and features for regular expressions, and I'd like to find a case where [^xy] is not equivalent to (?!x|y). (or more specifically (?:(?!x|y).)).

Take the example text: "abc\n"

Say I want to emulate a Perl regex: /\A.{3}\Z/s

With the singleline flag, the JavaScript regex should be equivalent to: /^[\s\S]{3}\n*$(?!\s)/ (\A becomes ^, . becomes [\s\S], \Z becomes \n*$(?!\s))

Now, /^.{3}$/ would fail, but /^[\s\S]{3}\n*$(?!\s)/ would match "abc" plus the trailing newline (the Perl regex matches "abc").
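
A quick way to check this first scenario is to run both patterns against "abc\n". This is only a sketch; the variable names are made up for illustration. Note that the emulated pattern consumes the newline through \n*, so the matched text includes it.

```javascript
// First scenario: the text "abc\n" from the question.
const firstText = "abc\n";

// Plain JavaScript dot: cannot cross the newline, and $ (no m flag)
// only matches at the very end of the input.
const plainMatch = /^.{3}$/.exec(firstText);

// The question's emulation of Perl's /\A.{3}\Z/s.
const emulatedMatch = /^[\s\S]{3}\n*$(?!\s)/.exec(firstText);

console.log(plainMatch);                        // null
console.log(JSON.stringify(emulatedMatch[0]));  // "abc\n"
```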

Since \Z contains more than just a metacharacter, emulating [^\Z] would seem to be more difficult.

Take the example text: "abcabc\n"

The proposed JavaScript regex for the Perl regex /.{3}[^\Za]/g would be /.{3}(?:(?!\n*$(?!\s)|a).)/g

Both will match "bcab"
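
The JavaScript half of that claim can be checked directly. A minimal sketch, with illustrative variable names, covering only the proposed JavaScript pattern (not the Perl one):

```javascript
// Second scenario: the text "abcabc\n" from the question.
const secondText = "abcabc\n";

// The proposed JavaScript translation of /.{3}[^\Za]/g.
const proposed = /.{3}(?:(?!\n*$(?!\s)|a).)/g;

// With the g flag, String.prototype.match returns every match (or null).
const found = secondText.match(proposed);

console.log(found); // [ 'bcab' ]
```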

So, finally, I pose the question again. Is there a case where [^xy] is not equivalent to (?:(?!x|y).) with such a scenario, perhaps in a more complex regular expression where a lookahead would change the scenario?

asked Jun 27, 2013 at 20:31, edited Jun 27, 2013 at 21:01, by Joey Schooley

/^[\s\S]{3}\n*$(?!\s)/.exec("abcabc\n") does not match for me, and does not give abcabc as you suggest – Eric Commented Jun 27, 2013 at 20:39
Nor does the perl regex /\A.{3}\Z/s match "abcabc\n", as you claim it does... – Eric Commented Jun 27, 2013 at 20:41
Correct. I changed some things around and forgot to edit them. The first scenario uses the text "abc\n" and the second scenario uses the text "abcabc\n". I've made the edit to the main post. – Joey Schooley Commented Jun 27, 2013 at 21:01
[^\Z] is not a thing, because \Z is not a character. – Martin Ender Commented Jun 27, 2013 at 21:06


5 Answers

For input string "x\na", the 2 regexps give different outputs, because . doesn't match newlines.

console.log("x\na".match(/(?:(?!x|y).)/))
["a", index: 2, input: "x↵a"]
console.log("x\na".match(/[^xy]/))
["↵", index: 1, input: "x↵a"]

If you change . to [\s\S], the output is identical in this case:

console.log("x\na".match(/(?:(?!x|y)[\s\S])/))
["↵", index: 1, input: "x↵a"]

I cannot think of any other case right now.
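
One way to hunt for further single-character differences is brute force. The sketch below (variable names are illustrative) tests every UTF-16 code unit against both forms. In JavaScript the dot skips all four line terminators, so the newline is not the only character affected.

```javascript
// Anchored versions of the two forms, each tested on one code unit at a time.
const viaClass = /^[^xy]$/;
const viaLookahead = /^(?:(?!x|y).)$/;

// Collect the code units (in hex) on which the two forms disagree.
const differing = [];
for (let code = 0; code <= 0xffff; code++) {
  const ch = String.fromCharCode(code);
  if (viaClass.test(ch) !== viaLookahead.test(ch)) {
    differing.push(code.toString(16));
  }
}

console.log(differing.join(" ")); // a d 2028 2029
```

Those are LF, CR, LINE SEPARATOR and PARAGRAPH SEPARATOR, all of which [\s\S] covers.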

Is there a case where [^xy] is not equal to (?!x|y).?

Only the one you have already described: The JS dot doesn't match newlines, and needs to be replaced with [\s\S].

\Z becomes \n$(?!\s)

That looks wrong. After the end of the string (\z/$) there will never be anything, regardless of whether it is whitespace or not. AFAIK, \Z is a zero-width assertion (it doesn't consume the newline(s)) and should be equivalent to

(?=\n*$)
//   ^ not sure whether ? or *
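
On the ? versus * doubt: Perl's documentation describes \Z as matching at the end of the string or before a newline at the end, which points to ? (a single newline). Here is a sketch of the zero-width version under that assumption; the name emulatedZ is made up for illustration.

```javascript
// Zero-width emulation of Perl's \Z, assuming it allows at most one
// trailing newline. No m flag, so $ only matches at the very end.
const emulatedZ = /^[\s\S]{3}(?=\n?$)/;

const withNewline = emulatedZ.exec("abc\n");

console.log(JSON.stringify(withNewline[0])); // "abc"
console.log(emulatedZ.test("abc"));          // true
console.log(emulatedZ.test("abc\n\n"));      // false
```

Unlike \n*$(?!\s), the lookahead leaves the newline out of the matched text.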

Since \Z contains more than just a metacharacter, emulating [^\Z] would seem to be more difficult.

What do you mean by "metacharacter"? It's a zero-width-assertion, and doesn't make much sense in a character class. I'd guess it's either a syntax error, or will be interpreted literally (unescaped) as [^Z].
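
For comparison, here is what JavaScript itself does with \Z inside a class. This is a sketch of JavaScript's behaviour only; it says nothing about Perl. Without the u flag the escape is read as a literal Z, and with the u flag the pattern is rejected.

```javascript
// Without the u flag, \Z is an identity escape: a literal "Z".
const legacyClass = /[^\Z]/;
console.log(legacyClass.test("Z")); // false
console.log(legacyClass.test("a")); // true

// With the u flag, the same source is a syntax error.
let unicodeModeError = null;
try {
  new RegExp("[^\\Z]", "u");
} catch (err) {
  unicodeModeError = err;
}
console.log(unicodeModeError instanceof SyntaxError); // true
```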

[^xy] will match \n. (?!x|y). will not match \n by default (because . does not match \n)

I do not believe javascript has a "dotall" or "single-line" modifier, but with new versions of each browser hitting every couple months, I've lost track.
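
For what it's worth, engines that implement ES2018 accept an s (dotAll) flag, which makes the dot match line terminators too. A minimal sketch that feature-detects it and falls back to [\s\S]; supportsDotAll is a made-up helper name, not a built-in.

```javascript
// Hypothetical helper: true if this engine accepts the s (dotAll) flag.
function supportsDotAll() {
  try {
    new RegExp(".", "s");
    return true;
  } catch (err) {
    return false;
  }
}

// Use the s flag where available, otherwise spell the dot as [\s\S].
const anyCharPattern = supportsDotAll()
  ? new RegExp("(?:(?!x|y).)", "s")
  : /(?:(?!x|y)[\s\S])/;

const dotAllMatch = anyCharPattern.exec("x\na");
console.log(dotAllMatch.index, JSON.stringify(dotAllMatch[0])); // 1 "\n"
```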

As the others said, you should use [\s\S] instead of . in the replacement. Otherwise, if you are doing that transformation just via the literal strings, there are a few more things to take care of. In particular, meta characters and escape sequences:

[^*)] => (?!\*|\))[\s\S]

But I guess you'll need to take care of parsing and writing meta-characters specially anyway.
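
A rough sketch of that escaping step, assuming the class members are already parsed into plain literal characters (no ranges, no escape sequences). Both function names are hypothetical and not part of any library.

```javascript
// Hypothetical helper: escape one literal character so that it is safe
// outside a character class.
function escapeOutsideClass(ch) {
  return ch.replace(/[\\^$.*+?()[\]{}|\/]/g, "\\$&");
}

// Hypothetical helper: build the lookahead form of a negated class whose
// members are the literal characters in `members`.
function negatedClassToLookahead(members) {
  const alternatives = Array.from(members, (ch) => escapeOutsideClass(ch));
  // An empty negated class matches any character, so no lookahead is needed.
  if (alternatives.length === 0) return "[\\s\\S]";
  return "(?!" + alternatives.join("|") + ")[\\s\\S]";
}

const lookaheadSource = negatedClassToLookahead("*)");
console.log(lookaheadSource); // (?!\*|\))[\s\S]

const survivors = "a*b)c".match(new RegExp(lookaheadSource, "g"));
console.log(survivors.join("")); // abc
```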

The trickiest one is probably \b though, because it's a character (backspace) in character classes and a word boundary outside. So in the replacement, you'd have to go with an octal or hexadecimal escape:

[^a\b] => (?!a|\10)[\s\S] 
    or => (?!a|\x08)[\s\S]
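
To see why the \b has to be rewritten, compare the three forms on a string that contains an actual backspace. A small sketch with illustrative variable names:

```javascript
// "a", backspace (U+0008), "z"
const backspaceText = "a\bz";

// Inside a class, \b is the backspace character.
const classResult = /[^a\b]/.exec(backspaceText);

// Correct translation: spell the backspace as \x08.
const hexResult = /(?!a|\x08)[\s\S]/.exec(backspaceText);

// Naive translation: outside a class, \b is a word boundary, and every
// position in this string sits on one, so nothing matches.
const naiveResult = /(?!a|\b)[\s\S]/.exec(backspaceText);

console.log(classResult.index, hexResult.index, naiveResult); // 2 2 null
```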

Other than that, the two should always be equivalent.

A case where the format [^xy] is not the same as (?:(?!x|y).) would be where x was a zero-width assertion rather than an actual character, like:

Given this sample text: ab-yz

Regex: [^\by] Example: http://www.rubular.com/r/ERKrqyeAs9

Returns:

[0] => a
[1] => b
[2] => -
[3] => z

Whereas

Regex: (?:(?!\b|y).) Example: http://www.rubular.com/r/V5RdyQEQo5

Returns:

[0] => b
[1] => z

Other non-equivalent expressions; these largely come down to the fact that the same syntax has different meanings inside and outside a character class:

[^^y] yields a,b,-,z is not equal to (?:(?!^|y).) yields b,-,z
[^.y] yields a,b,-,z is not equal to (?:(?!.|y).) yields nothing
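
The same comparisons can be reproduced in JavaScript (the linked examples run on Rubular, which uses Ruby). A sketch, where allMatches is a made-up helper:

```javascript
const sample = "ab-yz";

// Hypothetical helper: every match of a global regex, or [] if none.
const allMatches = (re) => sample.match(re) || [];

const results = {
  classBackspace: allMatches(/[^\by]/g),        // \b is backspace here
  lookBoundary: allMatches(/(?:(?!\b|y).)/g),   // \b is a word boundary here
  classCaret: allMatches(/[^^y]/g),             // second ^ is a literal caret
  lookCaret: allMatches(/(?:(?!^|y).)/g),       // ^ is the start anchor
  classDot: allMatches(/[^.y]/g),               // . is a literal dot
  lookDot: allMatches(/(?:(?!.|y).)/g),         // . matches any character
};

console.log(results.classBackspace.join(","), "vs", results.lookBoundary.join(",")); // a,b,-,z vs b,z
console.log(results.classCaret.join(","), "vs", results.lookCaret.join(","));       // a,b,-,z vs b,-,z
console.log(results.classDot.join(","), "vs", results.lookDot.length);              // a,b,-,z vs 0
```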

Or you could try this Unicode nugget in Perl: http://ideone.com/2xMfkQ

print "\ncapture\n";
@m = ("ss" =~ m/^(?:(?!\xDF|y).)+$/ui ); 
print for @m;

print "\nclass\n";
@m = ("ss" =~ m/^[^\xDFy]+$/ui) ; 
print for @m;

Yields:

capture

class
1

With /i under Unicode rules, \xDF (ß) case-folds to "ss", so the lookahead rejects the string at the first position and the first pattern finds nothing. The negated class matches one character at a time, accepts each "s", and the match returns 1.
