
javascript - Why is [\w-+] a valid regex but [\w-+]u invalid? - Stack Overflow


If I type /[\w-+]/ in the Chrome console, it accepts it. I get a regex object I can use to test strings as usual. But if I type /[\w-+]/u, it says VM112:1 Uncaught SyntaxError: Invalid regular expression: /[\w-+]/: Invalid character class.

In Firefox, /[\w-+]/ works fine, but if I type /[\w-+]/u in the console, it just goes to the next line as if I had typed an incomplete statement. If I try to force it to create the regex by running eval('/[\w-+]/u'), it tells me SyntaxError: invalid range in character class.

Why does the u flag make the regex invalid? The MDN RegExp documentation says u enables some Unicode features, but I don't see anything about how it affects ranges in character classes.


asked Jan 15, 2019 at 18:55 by Elias Zamaria
  • The u modifier makes the regex engine parse the pattern more strictly. All chars that do not have to be escaped must not be escaped, and those that need escaping must be escaped. All ambiguity must be avoided. – Wiktor Stribiżew Commented Jan 15, 2019 at 19:11
  • Okay, so ECMA-262, page 570, note 3, says that "a - character can be treated literally or it can denote a range. It is treated literally if it is the first or last character of ClassRanges, the beginning or end limit of a range specification, or immediately follows a range specification". – Blackhole Commented Jan 15, 2019 at 19:23
  • And: "ClassRanges can expand into a single ClassAtom and/or ranges of two ClassAtom separated by dashes. In the latter case the ClassRanges includes all characters between the first ClassAtom and the second ClassAtom, inclusive; an error occurs if either ClassAtom does not represent a single character (for example, if one is \w) or if the first ClassAtom's character value is greater than the second ClassAtom's character value." – Wiktor Stribiżew Commented Jan 15, 2019 at 19:28
  • @WiktorStribiżew, your quote seems to explain why the regex causes an error. But I don't see anything about why the error only happens with the u flag. – Elias Zamaria Commented Jan 15, 2019 at 19:33
  • I hope Mathias Bynens will drop in to share his thoughts. – Wiktor Stribiżew Commented Jan 15, 2019 at 19:54

2 Answers


Within a RegExp character set, a hyphen-minus character (your standard keyboard dash) denotes a range of character codes between the two characters it separates. The exceptions are when it is escaped (\-) or when it does not separate two characters because it is either the final character of the class or it is the first character (after the optional caret that inverts the class).
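A quick way to see these rules in action is to test a few single characters against classes that differ only in where the dash sits. This is a minimal sketch using only built-in RegExp methods; the variable names are illustrative.

```javascript
// Classes that differ only in where the hyphen-minus appears.
const range = /^[a-c]$/;    // '-' between two characters: the range a..c
const leading = /^[-ac]$/;  // '-' first: literal
const trailing = /^[ac-]$/; // '-' last: literal
const escaped = /^[a\-c]$/; // '\-' escaped: literal
const negated = /^[^-a]$/;  // '-' right after the caret: literal, so "anything but '-' or 'a'"

for (const ch of ["a", "b", "c", "-"]) {
  console.log(
    JSON.stringify(ch),
    range.test(ch),
    leading.test(ch),
    trailing.test(ch),
    escaped.test(ch),
    negated.test(ch),
  );
}
```

Only the first class treats "b" as a member; the other dash positions all make "-" itself a member instead.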

Three examples of character ranges: a simple example, an advanced example, and a bug:

  • [a-z] is pretty straightforward because it works the way we expect it to, though only because the lowercase letters happen to have consecutive character codes. Another way of writing this is [\x61-\x7a].
  • [!-~] is not at all straightforward, at least until you look at a character map and learn that ! is the first visible (non-space) ASCII character and ~ is the last (of "lower ASCII"), so this is a way of saying "all visible lower ASCII characters" and it is the equivalent of [\x21-\x7e].
  • [A-z] mixes upper and lower case. You may dislike the fact that this range (which is [\x41-\x7a]) also accepts six non-letter characters, the ones that sit between Z and a: [ \ ] ^ _ and `.
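The equivalences in the list can be checked by brute force. The sketch below compares each range with its hex-escape spelling on every ASCII character, then lists what [A-z] accepts beyond the letters. It uses only standard JavaScript; the names are illustrative.

```javascript
// Every ASCII character, U+0000 to U+007F.
const ascii = Array.from({ length: 128 }, (_, i) => String.fromCharCode(i));

// Each range next to its hex-escape spelling.
const pairs = [
  [/[a-z]/, /[\x61-\x7a]/],
  [/[!-~]/, /[\x21-\x7e]/],
  [/[A-z]/, /[\x41-\x7a]/],
];

// true if both regexes accept exactly the same ASCII characters.
const sameOnAscii = ([a, b]) => ascii.every((ch) => a.test(ch) === b.test(ch));
for (const pair of pairs) {
  console.log(pair[0].source, "vs", pair[1].source, sameOnAscii(pair));
}

// The "bug": what [A-z] accepts that is not an ASCII letter.
const strays = ascii.filter((ch) => /[A-z]/.test(ch) && !/[A-Za-z]/.test(ch));
console.log(strays.join(" ")); // [ \ ] ^ _ `
```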


Now let's examine your regex of /[\w-+]/u. Regex101 has a more informative error:

You can not create a range with shorthand escape sequences

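Since \w is not itself a character (but rather a collection of characters), an abutting dash must either be taken literally or raise an error. Without the u flag, engines follow the web-compatibility grammar in Annex B of ECMA-262, which takes such a dash literally, so /[\w-+]/ matches word characters, - and +. The u flag (fullUnicode) switches those Annex B extensions off, so you get the stricter grammar and therefore an error.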
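A minimal sketch of the difference, assuming a modern engine (Node.js or a current browser). The u version is built with the RegExp constructor so the SyntaxError happens at run time and can be caught; the exact message differs between engines, so only the error type is worth relying on.

```javascript
// Without u: the '-' next to \w is taken as a literal character.
const loose = /[\w-+]/;
const samples = ["a", "_", "-", "+", " "];
console.log(samples.map((ch) => loose.test(ch))); // true for all but the space

// With u: the same source is a SyntaxError. The constructor is used so the
// error is thrown at run time instead of failing the whole script at parse time.
let strictError = null;
try {
  new RegExp("[\\w-+]", "u");
} catch (e) {
  strictError = e;
}
console.log(strictError instanceof SyntaxError); // true
console.log(strictError.message); // wording differs between engines
```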

The error I get from "foo".match(/[\w-+]/u) in Firefox 64.0 is:

SyntaxError: character class escape cannot be used in class range in regular expression

This is slightly more informative than the error you got since it actually tells you the problem is with the escape (though not why it's a problem).

According to the RegExpBuiltinExec() logic in ECMAScript 2015:

  1. If fullUnicode is true, then
     a. e is an index into the Input character list, derived from S, matched by matcher. Let eUTF be the smallest index into S that corresponds to the character at element e of Input. If e is greater than or equal to the length of Input, then eUTF is the number of code units in S.
     b. Let e be eUTF.

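This shows fullUnicode sending the engine down separate logic, although these particular steps map match indices back to UTF-16 code units rather than parse ranges; the range rule itself is the CharacterRange step in ES2015 21.2.2.15.1.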
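The index mapping in those steps is easy to observe with a character outside the Basic Multilingual Plane. In this sketch (plain JavaScript, nothing assumed beyond a modern engine) the same . pattern consumes one UTF-16 code unit without u and a whole surrogate pair with it, which is why lastIndex differs.

```javascript
// U+1F600 is one code point but two UTF-16 code units (a surrogate pair).
const smiley = "\u{1F600}";

// Without u, '.' consumes one code unit: the lone high surrogate.
const legacy = /./g;
const m1 = legacy.exec(smiley);
console.log(m1[0].length, legacy.lastIndex); // 1 1

// With u, '.' consumes the whole code point, and the end index is
// reported in code units of the original string, so lastIndex is 2.
const full = /./gu;
const m2 = full.exec(smiley);
console.log(m2[0].length, full.lastIndex); // 2 2
```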


The solution is to either escape your hyphen-minus or else put it last (or first):

/[\w\-+]/u or /[\w+-]/u or /[-\w+]/u. I personally always put it last.
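A small check that all three spellings compile with the u flag and accept the same characters; the sample characters are arbitrary.

```javascript
// Three spellings that keep '-' literal and are valid with the u flag.
const fixes = [/[\w\-+]/u, /[\w+-]/u, /[-\w+]/u];
const samples = ["a", "Z", "7", "_", "-", "+", " ", "!"];

for (const re of fixes) {
  // Keep only the sample characters the class accepts.
  console.log(re.source, samples.filter((ch) => re.test(ch)).join(""));
}
```

Each line should end with aZ7_-+, since the space and ! are neither word characters nor - or +.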

There is a related report: "V8 implementation: does unicode property escapes behavior in character classes range differ from other classes intentionally?"


I took a look at the V8 source code (regexp-parser) and found this:

if (is_class_1 || is_class_2) {
  // Either end is an escaped character class. Treat the '-' verbatim.
  if (unicode()) {
    // ES2015 21.2.2.15.1 step 1.
    return ReportError(CStrVector(kRangeInvalid));
  }
  // ... (without the u flag, the '-' is kept as a literal)
}

kRangeInvalid is a constant that holds the message "Invalid character class".

That comment points to ES2015 21.2.2.15.1 (the CharacterRange abstract operation), step 1:

If A does not contain exactly one character or B does not contain exactly one character, throw a SyntaxError exception.
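To make the rule concrete, here is a hypothetical model of that step, not V8's code: characterRange, the Set-of-strings representation and the error messages are all illustrative. With unicodeMode true it throws as step 1 requires; with it false it falls back to the Annex B behavior of returning the union of A, B and a literal -, which is why the same source is accepted without the u flag.

```javascript
// Hypothetical model of the CharacterRange step; not V8's implementation.
// A "CharSet" is modeled as a Set of one-character strings.
function characterRange(a, b, unicodeMode) {
  if (a.size !== 1 || b.size !== 1) {
    if (unicodeMode) {
      // Step 1: one side is a class such as \w, so the range is invalid.
      throw new SyntaxError("Invalid character class");
    }
    // Annex B fallback without u: the '-' becomes a literal member.
    return new Set([...a, ...b, "-"]);
  }
  const [lo] = [...a];
  const [hi] = [...b];
  const from = lo.codePointAt(0);
  const to = hi.codePointAt(0);
  if (from > to) {
    // Out-of-order ranges such as [z-a] are errors with or without u.
    throw new SyntaxError("Range out of order in character class");
  }
  const result = new Set();
  for (let cp = from; cp <= to; cp++) result.add(String.fromCodePoint(cp));
  return result;
}

// \w without the i flag is exactly [A-Za-z0-9_].
const word = new Set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_");
const plus = new Set(["+"]);

console.log(characterRange(word, plus, false).has("-")); // true
console.log([...characterRange(new Set(["a"]), new Set(["c"]), true)].join("")); // abc
try {
  characterRange(word, plus, true);
} catch (e) {
  console.log(e.name); // SyntaxError
}
```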
