I'm reading Ionic's source code. I came across this regex, and I'm pretty baffled by it.
([\s\S]+?)
OK, so it captures every character that is either whitespace or non-whitespace?
Why didn't they just do
(.+?)
Am I missing something?
asked Sep 11, 2015 at 22:29 by user133688; edited Nov 9, 2015 at 13:39 by Mike Chamberlain
because the dot doesn't match the newline character \n – Casimir et Hippolyte, commented Sep 11, 2015 at 22:30
5 Answers
The dot (.) matches any character except a newline. Most languages offer a modifier (dotall, singleline) that makes it match newlines too. JavaScript had no such modifier when this was written; the s (dotAll) flag was only added in ES2018.
Thus, a workaround is to use the [\s\S] character class, which matches any character, including a newline, because \s matches all whitespace and \S matches all non-whitespace characters. Similarly, one could use [\d\D] or [\w\W].
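A minimal sketch of the difference, using a made-up input string (the text and variable names here are illustrative assumptions, not taken from Ionic's source). The last pattern assumes an engine that supports the ES2018 s flag:

```javascript
// Hypothetical input with an embedded newline, for illustration only.
const text = "<b>first\nsecond</b>";

// `.` cannot cross the newline, so the pattern never reaches "</b>".
const withDot = /<b>(.+?)<\/b>/.exec(text);

// [\s\S] accepts every character, including "\n".
const withClass = /<b>([\s\S]+?)<\/b>/.exec(text);

// Assumes an ES2018+ engine: the `s` (dotAll) flag lets `.` match newlines.
const withFlag = /<b>(.+?)<\/b>/s.exec(text);

console.log(withDot);                      // null
console.log(JSON.stringify(withClass[1])); // "first\nsecond"
console.log(JSON.stringify(withFlag[1]));  // "first\nsecond"
```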
Also, there is a [^] pattern that matches the same thing in JS, but since it is JavaScript-specific, regexes containing it are not portable between regex flavors.
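As a quick check of the [^] form in JavaScript (the sample string is an assumption for illustration):

```javascript
// Hypothetical two-line input.
const twoLines = "line one\nline two";

// [^] is an empty negated class: "any character not in the empty set".
const emptyNegated = /^[^]+$/.test(twoLines);

// Without the multiline flag, `.+` cannot get past the "\n" to reach the end.
const plainDot = /^.+$/.test(twoLines);

console.log(emptyNegated); // true
console.log(plainDot);     // false
```

Other regex flavors may reject [^] as a syntax error or read it as the start of a longer class, which is why [\s\S] is the more portable spelling.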
The +? lazy quantifier matches one or more characters conforming to the preceding subpattern, but as few as possible. Thus, it will match just one character if used like this, at the end of the pattern.
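The effect of the lazy quantifier can be sketched on a small made-up string. The patterns below are illustrative and not the ones Ionic uses:

```javascript
// Hypothetical input.
const sample = "abcabc";

// Lazy group at the very end of the pattern: it stops after one character.
const lazyAtEnd = /a([\s\S]+?)/.exec(sample);

// Lazy group followed by more pattern: it grows only until "c" can match.
const lazyBounded = /a([\s\S]+?)c/.exec(sample);

// Greedy group for comparison: it grabs as much as possible, then backtracks.
const greedyBounded = /a([\s\S]+)c/.exec(sample);

console.log(lazyAtEnd[1]);     // "b"
console.log(lazyBounded[1]);   // "b"
console.log(greedyBounded[1]); // "bcab"
```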
In many regex implementations, "." doesn't match newlines, so "[\s\S]" is used as a little hack =)
A dot (.) matches everything but the newline character. This is a well-known, documented limitation in JavaScript. The \s (whitespace match) alongside its negation \S (non-whitespace match) provides a dotall match that includes the newline. Thus [\s\S] is commonly used instead of . when a match must be able to span lines.
The regex they used includes more characters (essentially everything).
\s matches any whitespace character.
\S matches any character that is not whitespace.
As Casimir notes: . matches any character except newline (\n).
The dot (.) matches any character except carriage return \r and newline \n.
The shortest way to write [\s\S] (whitespace and non-whitespace) is [^] (not nothing).
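A small sketch that checks this per character. In JavaScript the dot also skips the two Unicode line terminators U+2028 and U+2029, so they are included here; the variable names are illustrative:

```javascript
// The four characters JavaScript treats as line terminators.
const lineTerminators = ["\n", "\r", "\u2028", "\u2029"];

// Without the `s` flag, `.` rejects each of them.
const dotResults = lineTerminators.map((ch) => /^.$/.test(ch));

// [\s\S] and [^] accept each of them.
const classResults = lineTerminators.map((ch) => /^[\s\S]$/.test(ch));
const negatedResults = lineTerminators.map((ch) => /^[^]$/.test(ch));

console.log(dotResults);     // [ false, false, false, false ]
console.log(classResults);   // [ true, true, true, true ]
console.log(negatedResults); // [ true, true, true, true ]
```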