I have this JS code:
/^([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/
But what's the meaning of [_|\_|\.]? (JS regexp)
It's nonsense: it's a character class which means "match _ or | or ." and, with the ?, zero or one time. It could be shortened to [|_.]?, but I doubt that is what its writer intended. – HamZa, commented Oct 3, 2013 at 8:31
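A quick way to check this reading is to run both classes over every ASCII character and compare what they accept. The sketch below is plain JavaScript (Node or a browser console); the helper name `acceptedBy` is made up for this illustration.

```javascript
// The class from the question and the shortened form from the comment, anchored
// so that each test() call checks exactly one character.
const original = /^[_|\_|\.]$/;
const shortened = /^[|_.]$/;

// Hypothetical helper: returns every ASCII character the pattern accepts, in code order.
function acceptedBy(re) {
  let accepted = "";
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (re.test(ch)) accepted += ch;
  }
  return accepted;
}

console.log(acceptedBy(original));  // "._|"
console.log(acceptedBy(shortened)); // "._|"
```

Both classes accept exactly the same three characters; the duplicated `_` and `|` add nothing.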
The regex is written terribly and will perform poorly. For example, try it on [email protected]: regex101.com/r/bM1fK8. Besides that, it doesn't support all valid domains (e.g. full Unicode), TLDs (e.g. .museum), or email names (e.g. [email protected]). You can find a better pattern. – Kobi, commented Oct 3, 2013 at 8:41
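The coverage part of this comment can be reproduced directly. In this sketch the pattern is copied unchanged from the question; the sample addresses are hypothetical, chosen only to hit the `[a-zA-Z]{2,3}` TLD rule and the ASCII-only character classes.

```javascript
// The full pattern from the question, unchanged.
const emailRe = /^([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/;

// Hypothetical sample addresses.
const samples = [
  "john.doe@example.com", // accepted
  "user@example.co.uk",   // accepted: several domain labels are fine
  "info@example.museum",  // rejected: the TLD is longer than 3 letters
  "user@bücher.de",       // rejected: "ü" is outside [a-zA-Z0-9]
];

for (const s of samples) {
  console.log(s, emailRe.test(s));
}
```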
4 Answers
If we use a resource like Regexper, we can visualise this regular expression:
From this we can conclude that [_|\_|\.] requires one of either "_", "|" or ".". We can also see that the double declaration of "_" and "|" is unnecessary. As HamZa commented, this segment can be shortened to [_|.] to achieve the same result.
In fact, we can even use resources like Regexper to visualise the entire expression.
regex101 is a very good tool for understanding regular expressions. Its explanation of this part reads:
Char class [_|\_|\.] 0 to 1 times [greedy] matches:
[_|\_|\.] One of the following characters: _|_|.
So [_|\_|\.] requires one of either "_", "|" or ".".
See this regex101 link for an explanation of your expression.
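The "0 to 1 times" part of that explanation comes from the `?` after the class. A small sketch, with the class placed between two hypothetical letters `a` and `b` so its effect is easy to see:

```javascript
// The class from the question, made optional by `?`, between the letters a and b.
const re = /^a[_|\_|\.]?b$/;

for (const s of ["ab", "a_b", "a|b", "a.b", "a__b", "a-b"]) {
  console.log(JSON.stringify(s), re.test(s));
}
// "ab", "a_b", "a|b" and "a.b" match: zero or one character from the class.
// "a__b" (two separators) and "a-b" ("-" is not in the class) do not.
```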
It matches a pipe character, an underscore, or a period.
It is unnecessarily convoluted, however, and could be shortened to this:
[|_.]
[_|\_|\.] is probably meant to match an underscore (_) or a period (.), and should have been written as [_.].
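The practical difference shows up in what the full pattern accepts. This sketch assumes the only change is narrowing each separator class to `[_.]`; the inputs `john|doe@example.com` and `john_doe@example.com` are hypothetical examples.

```javascript
// Pattern as written in the question.
const asWritten = /^([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/;
// Same pattern with both separator classes narrowed to [_.].
const narrowed = /^([a-zA-Z0-9]+[_.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/;

const withPipe = "john|doe@example.com";
const withUnderscore = "john_doe@example.com";

console.log(asWritten.test(withPipe), narrowed.test(withPipe));             // true false
console.log(asWritten.test(withUnderscore), narrowed.test(withUnderscore)); // true true
```

With the class as written, a pipe is accepted in the middle of an address, which is almost certainly not what the author wanted.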
I'm reasonably sure the author is using the pipe (|) to mean "or" (i.e., alternation), which isn't necessary inside a character class. As the other responders said, the pipe actually matches a literal pipe, but I don't believe that was the author's intent. It's a very common beginner's mistake.
The dot (.) is another special character that loses its special meaning when it appears in a character class. There's no need to escape it with a backslash as the author did, though it does no harm. And the underscore never has any special meaning; I won't even try to guess why the author listed it twice, once with a backslash and once without.
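The dot's behaviour inside and outside a class is easy to confirm. One caveat worth adding about the escaped underscore: engines such as V8 accept `\_` as a plain underscore in ordinary mode, but with the `u` (Unicode) flag it is not a valid escape and the pattern is rejected with a SyntaxError. The sketch builds that case with `new RegExp` rather than a literal, because an invalid literal would stop the whole script from parsing.

```javascript
// Outside a class, an unescaped dot matches any single character except line terminators.
console.log(/^.$/.test("x"));                        // true
// Inside a class the dot is literal, whether or not it is escaped.
console.log(/^[.]$/.test("x"));                      // false
console.log(/^[.]$/.test("."), /^[\.]$/.test(".")); // true true

// Without the u flag, \_ is accepted as a plain "_".
console.log(/^[\_]$/.test("_"));                     // true

// With the u flag, \_ is an invalid escape, so constructing the pattern throws.
let unicodeError = null;
try {
  new RegExp("[\\_]", "u");
} catch (e) {
  unicodeError = e;
}
console.log(unicodeError instanceof SyntaxError);    // true
```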
You didn't ask about it, but the ? doesn't belong there either. That's what makes the regex so horribly inefficient, as Kobi remarked. The idea was to match one or more alphanumerics, then optionally match a separator character (dot or underscore), which must be followed by some more alphanumerics, repeating as needed. Here's how I would write that:
[a-zA-Z0-9]+([_.][a-zA-Z0-9]+)*
If it runs out of alphanumerics and the next character is not _ or ., it skips that whole section and tries to match the next part. And if it can't do that, it can bail out immediately because no match is possible. But the way your regex is written, the separator is optional independently of the things it's supposed to separate, which makes it useless. The regex engine has to keep backing up, trying to match characters that it has already consumed in endless, pointless combinations before it can give up. And that, unfortunately, is another common mistake.
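The rewrite above only covers the local part. The sketch below is a hypothetical full pattern that applies the same "alphanumerics, then (separator + alphanumerics) repeated" shape to the domain part too, and keeps the question's `[a-zA-Z]{2,3}` TLD rule unchanged. It first checks that both patterns agree on a few hypothetical addresses, then times each on a near-miss input (a run of letters with no `@`). Timings depend on the machine and engine; on a backtracking engine such as V8's, the original's time should grow roughly exponentially with the length of the run, while the rewrite gives up after a short linear scan.

```javascript
// The question's pattern.
const asWritten = /^([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/;
// Hypothetical rewrite: every separator must be followed by more alphanumerics.
const rewritten = /^[a-zA-Z0-9]+([_.][a-zA-Z0-9]+)*@[a-zA-Z0-9]+([_.][a-zA-Z0-9]+)*\.[a-zA-Z]{2,3}$/;

// Both patterns give the same verdict on these hypothetical addresses.
const samples = [
  "john.doe@example.com",
  "john_doe@example.co.uk",
  "john..doe@example.com",
  "_john@example.com",
  "john.@example.com",
];
for (const s of samples) {
  console.log(s, asWritten.test(s), rewritten.test(s));
}

// Time a single test() call in milliseconds (performance.now() is global in
// browsers and in Node 16+).
function timeTest(re, input) {
  const start = performance.now();
  const matched = re.test(input);
  return { matched, ms: performance.now() - start };
}

// Near-miss input: letters followed by "!", with no "@", so neither pattern matches.
for (const n of [12, 16, 20]) {
  const input = "a".repeat(n) + "!";
  const slow = timeTest(asWritten, input);
  const fast = timeTest(rewritten, input);
  console.log(`n=${n}: as written ${slow.ms.toFixed(2)} ms, rewritten ${fast.ms.toFixed(2)} ms`);
}
```

The only behavioural difference between the two on ordinary input is the stray pipe; the gain is that a failing input no longer sends the engine through every way of splitting a run of letters into groups.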