I was doing some tests in JavaScript with the replace function.
Consider the following examples, executed in a Node REPL.
This is a replace that deletes spaces, hyphens and underscores from a string:
> "call this 9344 5 66 22".replace(/[ _-]/g, '');
'callthis934456622'
That was what I expected: only the spaces were deleted.
However, take a look at this:
> "call this 9344 5 66 22".replace(/[ -_]/g, '');
'callthis'
Why, when I write the character class exactly like this, [ -_] (space, hyphen, underscore), does it delete the numbers in the string?
More tests I did:
[ -] (space, hyphen): does not delete numbers
[ _] (space, underscore): does not delete numbers
[ _-] (space, underscore, hyphen): does not delete numbers
[-_ ] (hyphen, underscore, space): does not delete numbers
[_- ] (underscore, hyphen, space): the REPL blocks??
[ -_] (space, hyphen, underscore): deletes numbers
5 Answers
[ -_] means characters from space (ASCII 32) to _ (ASCII 95), which includes, among other things, numbers and capital letters.
What you are looking for is [ \-_]. Escaping the - makes it match the literal character instead of acting as the range metacharacter.
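A quick way to see the difference is to run both patterns on the string from the question. This is a minimal sketch; the variable names are illustrative only.

```javascript
// The sample string from the question.
const input = "call this 9344 5 66 22";

// Unescaped: " -_" is a range from space (32) to underscore (95),
// which also covers the digits 0-9 (48-57).
const withRange = input.replace(/[ -_]/g, "");

// Escaped: "\-" is a literal hyphen, so the class is just { space, -, _ }.
const withLiteral = input.replace(/[ \-_]/g, "");

console.log(withRange);   // callthis
console.log(withLiteral); // callthis934456622
```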
A hyphen that is not at the start or end of a character class needs to be escaped; otherwise it represents a range.
So this regex:
[ -_]
will match anything from space to underscore, i.e. ASCII 32-95.
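To see exactly what ASCII 32-95 covers, the sketch below tests every ASCII character against the class. It also checks the reversed form [_- ] from the question: JavaScript rejects a range whose start is greater than its end with a SyntaxError, so that pattern is not valid at all. The helper name matchedAscii is hypothetical, chosen for illustration.

```javascript
// Return every ASCII character (codes 0-127) that a pattern matches.
// matchedAscii is an illustrative helper, not a built-in.
function matchedAscii(pattern) {
  let out = "";
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (pattern.test(ch)) out += ch;
  }
  return out;
}

// [ -_] is the range from space (32) to underscore (95).
const covered = matchedAscii(/^[ -_]$/);
console.log(covered.length);         // 64
console.log(/[0-9]/.test(covered));  // true: digits (48-57) are inside the range
console.log(/[A-Z]/.test(covered));  // true: capitals (65-90) are inside the range
console.log(/[a-z]/.test(covered));  // false: lowercase letters (97-122) are not

// Reversed endpoints, as in [_- ], are a syntax error rather than an empty class.
let reversedThrows = false;
try {
  new RegExp("[_- ]");
} catch (e) {
  reversedThrows = e instanceof SyntaxError;
}
console.log(reversedThrows); // true
```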
The - character has special meaning in character classes. When it appears between two characters, it represents a character range; e.g. [a-z] matches any character with a character code between a and z, inclusive.
However, as you've observed, when it's placed at the beginning or end of the character class, it just represents a literal - character. This can also be accomplished by escaping the - within the character class, i.e. [ \-_].
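The three spellings, hyphen first, hyphen last, and escaped hyphen, all describe the same three characters. A small check, assuming a made-up test string that actually contains hyphens and underscores (the question's string has neither):

```javascript
// A hypothetical input containing spaces, hyphens and underscores.
const sample = "call-this_9344 5-66_22";

// Hyphen first, hyphen last, and escaped hyphen: none of them forms a range.
const patterns = [/[-_ ]/g, /[ _-]/g, /[ \-_]/g];
const results = patterns.map((re) => sample.replace(re, ""));

// Every variant removes only spaces, hyphens and underscores.
console.log(results.every((r) => r === "callthis934456622")); // true
```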
"call this 9344 5 66 22".replace(/(\s|-|_)/g, '');
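The alternation works because -, _ and \s are separate alternatives, so no range can form. One difference from the character classes above: \s matches any whitespace (tabs, newlines and so on), not just the space character. A sketch with a hypothetical tab-containing input:

```javascript
// A hypothetical input with a tab between the first two words.
const text = "call\tthis-9344_5 66";

// Alternation: \s removes the tab as well as ordinary spaces.
const viaAlternation = text.replace(/(\s|-|_)/g, "");

// Character class with a literal space: the tab survives.
const viaClass = text.replace(/[ _-]/g, "");

// The same set as the alternation, written as a class without a capturing group.
const viaSpaceClass = text.replace(/[\s_-]/g, "");

console.log(viaAlternation);           // callthis9344566
console.log(JSON.stringify(viaClass)); // "call\tthis9344566"
console.log(viaSpaceClass === viaAlternation); // true
```

Since every alternative is a single character, the class [\s_-] expresses the same set without a capturing group.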
In a class, the dash (-) acts as a range operator ONLY when it doesn't separate clauses; the class is parsed left to right.
Otherwise it is treated like any other literal.
Regular expression parsers don't worry about good form.
So you can put the dash anywhere you want as a literal, as long as it separates clauses (i.e. it's not ambiguous).
Most people put it at the beginning or end, or escape it, so that no conceptual errors occur.
Example of clauses and literal dashes (the clauses are a-z, \p{L}, 0-9, \x00-\x09 and \x20; every other dash is a literal):
[-a-z-\p{L}-0-9-\x00-\x09-\x20-]
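In JavaScript the left-to-right rule can be checked with a simpler pattern. In [a-z-0-9], the dash right after a-z follows a completed range, so it cannot start a new one and is read as a literal. The sketch sticks to plain ranges because engines differ on a dash next to a class escape such as \p{L}; JavaScript with the u flag, for instance, rejects [\p{L}-0] with a SyntaxError.

```javascript
// A dash right after a completed range (a-z) cannot start a new range,
// so here it is a literal, and 0-9 is parsed as its own range.
const cls = /^[a-z-0-9]$/;

console.log(cls.test("-")); // true  (literal dash)
console.log(cls.test("q")); // true  (from a-z)
console.log(cls.test("7")); // true  (from 0-9)
console.log(cls.test("A")); // false (in neither range)

// With the u flag, JavaScript refuses a range endpoint that is a class escape.
let unicodeRangeError = false;
try {
  new RegExp("[\\p{L}-0]", "u");
} catch (e) {
  unicodeRangeError = e instanceof SyntaxError;
}
console.log(unicodeRangeError); // true
```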