regex - What is the meaning of [_|\_|\.]? in JavaScript regexps? - Stack Overflow

I have this JS code:

/^([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/

But what is the meaning of [_|\_|\.]? in this JS regexp?

Asked Oct 3, 2013 at 8:28 by JackSun; edited Oct 3, 2013 at 8:30 by Shadow Wizard.
  • It's nonsense: it's a character class, which means match _ or | or . zero or one time. It could be shortened to [|_.]?, but I doubt that was its writer's intention. – HamZa, commented Oct 3, 2013 at 8:31
  • The regex is written terribly and will perform poorly. For example, try it on [email protected]: regex101.com/r/bM1fK8 . Besides that, it doesn't support all valid domains (e.g. full Unicode), TLDs (e.g. .museum), or email names (e.g. [email protected]). You can find a better pattern. – Kobi, commented Oct 3, 2013 at 8:41

4 Answers

If we use a resource like Regexper, we can visualise this regular expression as a diagram.

From this we can conclude that [_|\_|\.] requires one of either "_", "|" or ".". We can also see that the double declaration of "_" and "|" is unnecessary. As HamZa commented, this segment can be shortened to [_|.] to achieve the same result.
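The equivalence described above can be checked directly. The following is a minimal sketch in plain JavaScript (Node or a browser console) that tests every printable ASCII character against the class from the question and against the two shortened forms. matchedChars is a hypothetical helper written only for this illustration, and restricting the check to ASCII 0x20 to 0x7E is an assumption that keeps the output short.

```javascript
// The class from the question, and the two shortened forms suggested for it.
// Each is anchored so that it must match exactly one character.
const questionClass = /^[_|\_|\.]$/;
const shortA = /^[_|.]$/;
const shortB = /^[|_.]$/;

// Hypothetical helper: collect every printable ASCII character (0x20-0x7E)
// that a single-character pattern accepts, in code-point order.
function matchedChars(re) {
  const out = [];
  for (let code = 0x20; code <= 0x7e; code++) {
    const ch = String.fromCharCode(code);
    if (re.test(ch)) out.push(ch);
  }
  return out.join("");
}

console.log(matchedChars(questionClass)); // ._|
console.log(matchedChars(shortA));        // ._|
console.log(matchedChars(shortB));        // ._|
```

All three patterns accept exactly the same three characters, so the repeated _ and | in the original class add nothing.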

In fact, we can even use resources like Regexper to visualise the entire expression.

REGEX101 is a very good tool for understanding regular expressions.

Char class [_|\_|\.] 0 to 1 times [greedy] matches:

[_|\_|\.] One of the following characters: _|_|.

[_|\_|\.] requires one of either "_", "|" or ".".

See this RegEx101 link for an explanation of your expression.
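To make the "0 to 1 times [greedy]" part of that explanation concrete, here is a small sketch (an illustration, not regex101's own output). It anchors the class with ? so the whole string must match, then contrasts the greedy ? with its lazy form ?? on the same input. The variable names are illustrative only.

```javascript
// The class followed by ?, anchored so the whole string must match.
const optionalSep = /^[_|\_|\.]?$/;

// Zero or one occurrence: the empty string and any single _, | or . pass.
const accepted = ["", "_", "|", "."].filter((s) => optionalSep.test(s));

// Two separators in a row, or any character outside the class, fail.
const rejected = ["__", "_.", "-", "a"].filter((s) => !optionalSep.test(s));

// Greedy ?: when a separator is present, it is consumed.
const greedyMatch = "_abc".match(/^[_|\_|\.]?/)[0];

// Lazy ??: prefers to match nothing when the rest of the pattern allows it.
const lazyMatch = "_abc".match(/^[_|\_|\.]??/)[0];

console.log(accepted.length, rejected.length);     // 4 4
console.log(JSON.stringify(greedyMatch));          // "_"
console.log(JSON.stringify(lazyMatch));            // ""
```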

It matches a pipe character, an underscore, or a period.
It is unnecessarily convoluted, however. It could be simpler.

It could be shortened to this:
[|_.]

[_|\_|\.] is probably meant to match an underscore (_) or a period (.), and should have been written as [_.].

I'm reasonably sure the author is using the pipe (|) to mean "or" (i.e., alternation), which isn't necessary inside a character class. As the other responders said, the pipe actually matches a literal pipe, but I don't believe that was the author's intent. It's a very common beginner's mistake.
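A short sketch of the point about the pipe, assuming (as this answer does) that the author only wanted _ and . as separators. Inside a character class | is an ordinary character; outside a class it is alternation. The variable names are illustrative.

```javascript
// As written in the question: | inside the class is a literal pipe.
const asWritten = /^[_|\_|\.]$/;
// What the author most likely meant: only underscore or dot.
const intended = /^[_.]$/;
// The same intent written with real alternation, outside a class.
const alternation = /^(?:_|\.)$/;

for (const ch of ["_", ".", "|"]) {
  console.log(JSON.stringify(ch), asWritten.test(ch), intended.test(ch), alternation.test(ch));
}
// "_" true true true
// "." true true true
// "|" true false false
```

Only the version with the pipe inside the class lets a | through, which is why a literal pipe is accepted between letters by the pattern in the question.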

The dot (.) is another special character that loses its special meaning when it appears in a character class. There's no need to escape it with a backslash as the author did, though it does no harm. And the underscore never has any special meaning; I won't even try to guess why the author listed it twice, once with a backslash and once without.
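The behaviour of the dot and of the escaped underscore can be checked in a few lines. This sketch assumes a modern engine such as Node or a current browser; compiles is a hypothetical helper for the demo. It also shows one caveat: with the u (unicode) flag, escaping a character such as _ that has no special meaning is a SyntaxError, so the \_ in the original only works in non-unicode mode, while \. is accepted either way.

```javascript
const results = {
  dotOutsideClass: /^.$/.test("a"),      // true: outside a class, . is a wildcard
  dotInsideClass: /^[.]$/.test("a"),     // false: inside a class, . is a literal dot
  literalDot: /^[.]$/.test("."),         // true
  escapedDot: /^[\.]$/.test("."),        // true: the backslash is harmless
  escapedUnderscore: /^[\_]$/.test("_"), // true: without u, \_ is just _
};

// Hypothetical helper: does this pattern source compile with these flags?
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// In unicode mode only syntax characters (and /) may be identity-escaped.
results.underscoreEscapeWithU = compiles("[\\_]", "u"); // false
results.dotEscapeWithU = compiles("[\\.]", "u");        // true

console.log(results);
```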

You didn't ask about it, but the ? doesn't belong there either. That's what makes the regex so horribly inefficient, as Kobi remarked. The idea was to match one or more alphanumerics, then optionally match a separator character (dot or underscore), which must be followed by some more alphanumerics, repeating as needed. Here's how I would write that:

[a-zA-Z0-9]+([_.][a-zA-Z0-9]+)*

If it runs out of alphanumerics and the next character is not _ or ., it skips that whole section and tries to match the next part. And if it can't do that, it can bail out immediately because no match is possible. But the way your regex is written, the separator is optional independently of the things it's supposed to separate, which makes it useless. The regex engine has to keep backing up, trying to match characters that it has already consumed in endless, pointless combinations before it can give up. And that, unfortunately, is another common mistake.
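Putting the suggested [a-zA-Z0-9]+([_.][a-zA-Z0-9]+)* shape back into the full email regex gives something like the following sketch. The rewrite is hypothetical: it applies the same shape to the domain as well, uses non-capturing groups (?:...) because nothing needs to be captured, and assumes only _ and . were ever meant as separators, so it deliberately rejects |. It does not address Kobi's other objections (Unicode domains, longer TLDs, other valid address forms).

```javascript
// The pattern from the question, unchanged.
const originalEmail =
  /^([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+@([a-zA-Z0-9]+[_|\_|\.]?)*[a-zA-Z0-9]+\.[a-zA-Z]{2,3}$/;

// Hypothetical rewrite: every separator must be followed by more alphanumerics,
// in both the local part and the domain.
const rewrittenEmail =
  /^[a-zA-Z0-9]+(?:[_.][a-zA-Z0-9]+)*@[a-zA-Z0-9]+(?:[_.][a-zA-Z0-9]+)*\.[a-zA-Z]{2,3}$/;

const samples = [
  "john.smith@example.com",
  "john_smith@mail.example.org",
  "john..smith@example.com",
  "john.@example.com",
  "a|b@example.com",
];

for (const s of samples) {
  console.log(s, originalEmail.test(s), rewrittenEmail.test(s));
}
// john.smith@example.com true true
// john_smith@mail.example.org true true
// john..smith@example.com false false
// john.@example.com false false
// a|b@example.com true false

// A long run of letters with no @ fails quickly with the rewrite, because the
// engine has only one way to split the letters into (separator + letters) groups.
console.log(rewrittenEmail.test("a".repeat(1000))); // false
```

On these samples the two patterns disagree only on the address containing |. Running the original pattern on the same long input without an @ is not recommended: because its separator is optional on its own, the engine can try an exponential number of ways to split the letters before it gives up.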
