I'm working on my own JavaScript library to support new metacharacters and features for regular expressions, and I'd like to find a case where [^xy] is not equivalent to (?!x|y). (or, more specifically, (?:(?!x|y).)).
Take the example text: "abc\n"
Say I want to emulate a Perl regex: /\A.{3}\Z/s
With the singleline flag, the JavaScript regex should be equivalent to: /^[\s\S]{3}\n*$(?!\s)/
(\A becomes ^, . becomes [\s\S], \Z becomes \n*$(?!\s))
Now, /^.{3}$/ would fail, but /^[\s\S]{3}\n*$(?!\s)/ would match (same as the Perl regex).
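The first scenario can be checked in Node.js or a browser console. This is a small sketch; the variable names are illustrative. Note that the emulated pattern consumes the trailing newline, whereas Perl's \Z is zero-width, so the matched text differs even though both regexes succeed.

```javascript
const text = "abc\n";

// Without the m flag, $ only matches at the very end of the input,
// and . does not match "\n", so this fails.
const plain = /^.{3}$/;

// Proposed emulation of Perl's /\A.{3}\Z/s.
const emulated = /^[\s\S]{3}\n*$(?!\s)/;

console.log(plain.test(text)); // false

const m = emulated.exec(text);
// The \n* part consumes the newline, so the match includes it.
console.log(JSON.stringify(m[0])); // "abc\n"
```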
Since \Z contains more than just a metacharacter, emulating [^\Z] would seem to be more difficult.
Take the example text: "abcabc\n"
The proposed JavaScript regex for the Perl regex /.{3}[^\Za]/g would be /.{3}(?:(?!\n*$(?!\s)|a).)/g. Both will match "bcab".
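The JavaScript side of the second scenario can be verified directly. This sketch only runs the proposed translation; the Perl original has to be run in Perl.

```javascript
const text = "abcabc\n";

// Proposed translation of Perl's /.{3}[^\Za]/g:
// three characters, then one character that is neither "a"
// nor the start of "optional trailing newlines + end of input".
const proposed = /.{3}(?:(?!\n*$(?!\s)|a).)/g;

// match() with the g flag returns every non-overlapping match.
console.log(text.match(proposed)); // [ 'bcab' ]
```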
So, finally, I pose the question again. Is there a case where [^xy] is not equivalent to (?:(?!x|y).) in such a scenario, perhaps in a more complex regular expression where a lookahead would change the result?
/^[\s\S]{3}\n*$(?!\s)/.exec("abcabc\n") does not match for me, and does not give abcabc as you suggest – Eric Commented Jun 27, 2013 at 20:39
Nor does the Perl regex /\A.{3}\Z/s match "abcabc\n", as you claim it does... – Eric Commented Jun 27, 2013 at 20:41
Correct. I changed some things around and forgot to edit them. The first scenario uses the text "abc\n" and the second scenario uses the text "abcabc\n". I've made the edit to the main post. – Joey Schooley Commented Jun 27, 2013 at 21:01
[^\Z] is not a thing, because \Z is not a character. – Martin Ender Commented Jun 27, 2013 at 21:06
5 Answers
For input string "x\na", the two regexps give different outputs, because . doesn't match newlines.
console.log("x\na".match(/(?:(?!x|y).)/))
["a", index: 2, input: "x↵a"]
console.log("x\na".match(/[^xy]/))
["↵", index: 1, input: "x↵a"]
If you change . to [\s\S], the output is identical in this case:
console.log("x\na".match(/(?:(?!x|y)[\s\S])/))
["↵", index: 1, input: "x↵a"]
I cannot think of any other case right now.
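To go beyond a single input, here is a brute-force sketch (the function and variable names are illustrative) that tests every UTF-16 code unit against the three patterns and reports where two of them disagree. It assumes non-Unicode mode (no u flag), where each pattern consumes exactly one code unit. The dot version differs exactly on JavaScript's line terminators (\n, \r, U+2028, U+2029), while the [\s\S] version agrees everywhere.

```javascript
// Returns the code units (0..0xFFFF) on which two fully anchored,
// single-character patterns give different test() results.
function differingCodeUnits(a, b) {
  const diffs = [];
  for (let code = 0; code <= 0xffff; code++) {
    const ch = String.fromCharCode(code);
    if (a.test(ch) !== b.test(ch)) diffs.push(code);
  }
  return diffs;
}

const negatedClass = /^[^xy]$/;
const lookaheadDot = /^(?:(?!x|y).)$/;
const lookaheadAny = /^(?:(?!x|y)[\s\S])$/;

// The dot version disagrees only on line terminators.
console.log(differingCodeUnits(negatedClass, lookaheadDot).map((c) => c.toString(16))); // [ 'a', 'd', '2028', '2029' ]

// The [\s\S] version agrees on every code unit.
console.log(differingCodeUnits(negatedClass, lookaheadAny).length); // 0
```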
Is there a case where [^xy] is not equal to (?!x|y). ?
Only the one you have already described: the JS dot doesn't match newlines and needs to be replaced with [\s\S].
\Z becomes \n$(?!\s)
That looks wrong. After the end of the string (\z / $) there will never be anything, regardless of whether it is whitespace or not. AFAIK, \Z is a zero-width assertion (it doesn't consume the newline(s)) and should be equivalent to
(?=\n*$)   // not sure whether the quantifier should be ? or *
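On the ? vs * question: perlre documents \Z as matching only at the end of the string or before a newline at the end, i.e. before at most one final newline. That points to (?=\n?$), with JavaScript's $ used without the m flag so that it means end of input. A small sketch (names are illustrative) comparing the two choices on inputs where they differ:

```javascript
// Two candidate translations of Perl's /\A.{3}\Z/s.
const withOptional = /^[\s\S]{3}(?=\n?$)/; // \Z as "end, or before one final newline"
const withStar = /^[\s\S]{3}(?=\n*$)/;     // \Z as "before any run of trailing newlines"

for (const input of ["abc", "abc\n", "abc\n\n"]) {
  const a = withOptional.exec(input);
  const b = withStar.exec(input);
  // Both are zero-width at the end, so neither match includes a newline.
  console.log(JSON.stringify(input), a && a[0], b && b[0]);
}
// For "abc\n\n" only the \n? form fails, which is what Perl's \Z does.
```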
Since \Z contains more than just a metacharacter, emulating [^\Z] would seem to be more difficult.
What do you mean by "metacharacter"? It's a zero-width assertion and doesn't make much sense in a character class. I'd guess it's either a syntax error or will be interpreted literally (unescaped) as [^Z].
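For what it's worth, JavaScript itself shows both of the guessed behaviours: without the u flag, \Z is accepted as an identity escape (a literal Z), and with the u flag the same pattern is rejected as a SyntaxError. A short sketch:

```javascript
// Without the u flag, \Z is an identity escape, so [^\Z] behaves like [^Z].
const legacy = /[^\Z]/;
console.log(legacy.test("Z")); // false
console.log(legacy.test("a")); // true

// With the u flag, unknown escapes such as \Z are rejected at construction time.
let unicodeError = null;
try {
  new RegExp("[^\\Z]", "u");
} catch (e) {
  unicodeError = e;
}
console.log(unicodeError instanceof SyntaxError); // true
```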
[^xy] will match \n. (?!x|y). will not match \n by default (because . does not match \n).
I do not believe JavaScript has a "dotall" or "single-line" modifier, but with new versions of each browser coming out every couple of months, I've lost track.
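Since this answer was written, ECMAScript 2018 added exactly such a modifier: the s (dotAll) flag, which lets . match line terminators too. A sketch, assuming an ES2018-capable engine (any current browser or Node.js), using the "x\na" input from the first answer:

```javascript
const input = "x\na";

const lookaheadDot = /(?:(?!x|y).)/;     // . skips "\n"
const lookaheadDotAll = /(?:(?!x|y).)/s; // s flag: . also matches "\n"
const negatedClass = /[^xy]/;

console.log(JSON.stringify(input.match(lookaheadDot)[0]));    // "a"
console.log(JSON.stringify(input.match(lookaheadDotAll)[0])); // "\n"
console.log(JSON.stringify(input.match(negatedClass)[0]));    // "\n"
console.log(lookaheadDotAll.dotAll);                          // true
```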
As the others said, you should use [\s\S] instead of . in the replacement. Beyond that, if you are doing the transformation purely on the literal strings, there are a few more things to take care of, in particular metacharacters and escape sequences:
[^*)] => (?!\*|\))[\s\S]
But I guess you'll need to take care of parsing and writing metacharacters specially anyway.
The trickiest one is probably \b, though, because it's a character (backspace) inside character classes and a word boundary outside. So in the replacement, you'd have to use an octal or hexadecimal escape:
[^a\b] => (?!a|\10)[\s\S]
or     => (?!a|\x08)[\s\S]
Other than that, the two should always be equivalent.
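A minimal sketch of that string-level transformation, assuming the class members have already been parsed into an array of literal characters. The function name negatedClassToLookahead is hypothetical, not part of any library. It escapes characters that are special outside a class and writes a literal backspace as \x08 so it cannot be read as a word boundary:

```javascript
// Hypothetical helper: turns the members of a negated class such as [^*)]
// into the equivalent (?:(?!\*|\))[\s\S]) form.
// `chars` must be an array of single literal characters (already unescaped).
function negatedClassToLookahead(chars) {
  const alternatives = chars.map((ch) => {
    // Inside a class "\b" is backspace; outside it would be a word boundary.
    if (ch === "\b") return "\\x08";
    // Escape everything that is a metacharacter outside a character class.
    return ch.replace(/[\\^$.*+?()[\]{}|\/]/g, "\\$&");
  });
  return new RegExp(`(?:(?!${alternatives.join("|")})[\\s\\S])`);
}

console.log(negatedClassToLookahead(["*", ")"]).source); // (?:(?!\*|\))[\s\S])

// Spot-check against the native class on a few inputs.
const native = /^[^a\b]$/;
const emulated = new RegExp(`^${negatedClassToLookahead(["a", "\b"]).source}$`);
for (const ch of ["a", "\b", "b", "\n", "*"]) {
  console.log(JSON.stringify(ch), native.test(ch) === emulated.test(ch)); // true for each
}
```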
A case where [^xy] is not the same as (?:(?!x|y).) would be where x is a zero-width assertion rather than an actual character, for example:
Given this sample text: ab-yz
Regex: [^\by]
Example: http://www.rubular.com/r/ERKrqyeAs9
Returns:
[0] => a
[1] => b
[2] => -
[3] => z
Whereas
Regex: (?:(?!\b|y).)
Example: http://www.rubular.com/r/V5RdyQEQo5
Returns:
[0] => b
[1] => z
Other non-equivalent expressions; these largely rely on the fact that the same syntax has different meanings inside and outside a character class:
[^^y] yields a, b, -, z, which is not equal to (?:(?!^|y).), which yields b, -, z
[^.y] yields a, b, -, z, which is not equal to (?:(?!.|y).), which yields nothing
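The Rubular links above run Ruby's engine. For these particular patterns on ab-yz, JavaScript gives the same results, since \b is also a backspace inside a JavaScript class, and ^, \b and . keep their usual meanings outside one. A sketch to reproduce them in Node.js or a browser console (the variable names are illustrative):

```javascript
const sample = "ab-yz";

// Each pair: [negated class, lookahead form], both with the g flag.
const pairs = [
  [/[^\by]/g, /(?:(?!\b|y).)/g],  // \b: backspace in the class, word boundary outside
  [/[^^y]/g, /(?:(?!^|y).)/g],    // ^: literal in the class, start of input outside
  [/[^.y]/g, /(?:(?!.|y).)/g],    // .: literal in the class, any character outside
];

for (const [cls, look] of pairs) {
  // match() returns null when there is no match; show that as [].
  console.log(cls.source, sample.match(cls) ?? [], "vs", look.source, sample.match(look) ?? []);
}
```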
Or you could try this Unicode nugget in Perl: http://ideone.com/2xMfkQ
print "\ncapture\n";
@m = ("ss" =~ m/^(?:(?!\xDF|y).)+$/ui );
print for @m;
print "\nclass\n";
@m = ("ss" =~ m/^[^\xDFy]+$/ui) ;
print for @m;
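Yields:
capture
class
1
The lookahead version prints nothing after "capture" (no match), while the character-class version matches and prints 1.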