I am learning regex, but I can't understand '\b', which matches a word boundary. There are three situations in which it matches:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
I can't understand the third situation. For example:
var reg = /end\bend/g;
var string = 'wenkend,end,end,endend';
alert( reg.test(string) ) ; //false
'\b' requires a '\w' character on one side and a non-'\w' character on the other. The string 'end,end' should satisfy that rule: the character after the first 'end' is ',' and the character before the last 'end' is ','. So why is the result false? Could you help? Thanks in advance!
============dividing line=============
With your help, I understand it now. In 'end,end', the first 'end' matches and is followed by a boundary, but the next character is ',' rather than 'e', so '/end\bend/' fails.
In other words, the regex '/end\bend/g' and others like it can never match anything. Thanks again.
Asked Oct 28, 2016 at 5:38 by Anan; edited Oct 28, 2016 at 7:25
- `\b` does not capture anything. – Steve, commented Oct 28, 2016 at 5:41
- Your regex should be `/end\b,\bend/g`. – Niyoko, commented Oct 28, 2016 at 5:41
- `\b` doesn't match a character, it matches a spot between characters, a boundary. It's impossible for there to be a word boundary there when the two characters beside the `\b` are both word characters. The regex you're perhaps thinking of is `/end\Wend/g`. – castletheperson, commented Oct 28, 2016 at 5:41
- @Anan: The comma is not a boundary. There are boundaries immediately before and after the comma, but the comma itself is not a boundary. – user2357112, commented Oct 28, 2016 at 5:44
- @Anan Right, `/end\bend/` can't match anything, because it would have to match `endend`, but there isn't a word boundary between the middle `d` and `e`, so the match fails. – castletheperson, commented Oct 28, 2016 at 5:58
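The two alternatives suggested in the comments can be checked against the asker's string with a short sketch. The variable names are only illustrative, and `console.log` stands in for the question's `alert` so the snippet also runs outside a browser:

```javascript
// The test string from the question.
const input = 'wenkend,end,end,endend';

// \b is zero-width, so "end\bend" would need a boundary between "d" and "e",
// which are both word characters. This can never match.
const original = /end\bend/.test(input);

// Suggested in the comments: consume the separating non-word character.
const withNonWord = input.match(/end\Wend/g);

// Also suggested: require the literal comma, with a boundary on each side.
const withComma = input.match(/end\b,\bend/g);

console.log(original);    // false
console.log(withNonWord); // [ 'end,end', 'end,end' ]
console.log(withComma);   // [ 'end,end', 'end,end' ]
```

Both alternatives consume the comma as a real character, which is exactly what `\b` never does.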
3 Answers
`\b` matches a position, not a character. So the regex `/end\bend/g` says that there must be the string `end`, followed by a word boundary. In `end,end` the boundary between `d` and `,` does match, but because `\b` is zero-width the regex engine doesn't advance in the string: it stays at `,`. The next character in your regex is `e`, and `e` doesn't match `,`, so the regex fails. Here is what happens step by step:
-----------------
/end\bend/g, "end,end" (match)
   |            |
-----------------
/end\bend/g, "end,end" (both regex and string position moved - match)
    |            |
------------------
/end\bend/g, "end,end" (the previous match was zero-length, so only regex position moved - not match)
      |          |
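The zero-length step can also be observed directly. The following is a minimal sketch rather than a trace of the engine's internals, and the variable names are made up for illustration:

```javascript
const text = 'end,end';

// /end\b/ matches "end" at index 0; the boundary adds no characters.
const m = /end\b/.exec(text);
console.log(m[0], m[0].length, m.index); // end 3 0

// On its own, \b matches an empty string at each boundary position.
const boundaries = [...text.matchAll(/\b/g)].map((b) => b.index);
console.log(boundaries); // [ 0, 3, 4, 7 ]

// After "end" and the boundary, the engine is still looking at ",".
const commaNext = /end\b,/.test(text); // next token "," matches ","
const eNext = /end\be/.test(text);     // next token "e" does not match ","
console.log(commaNext, eNext); // true false
```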
With (most) regular expression engines, you can match and capture characters, and also assert positions within a string.
For the purpose of this example, let's assume the string
Rogue One: A Star Wars Story
where you want to match the character `o` (which occurs twice, after `R` and after `t`). Now suppose you want to constrain the position and match an `o` only when it comes right before a lowercase `r`.
You write (with a positive lookahead):
o(?=r)
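As a small illustration in JavaScript (the language of the question; the variable names are assumptions of this sketch), the lookahead filters the matches without consuming the `r`:

```javascript
const title = 'Rogue One: A Star Wars Story';

// Every lowercase "o", with no constraint on position.
const allO = [...title.matchAll(/o/g)].map((m) => m.index);

// Only an "o" immediately followed by "r"; the "r" is checked, not consumed.
const oBeforeR = [...title.matchAll(/o(?=r)/g)].map((m) => [m.index, m[0]]);

console.log(allO);     // [ 1, 25 ]
console.log(oBeforeR); // [ [ 25, 'o' ] ]
```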
Now generalize the idea of zero-width assertions: you want to look for a word character ahead while making sure there's no word character immediately behind. For this you could write:
(?=\w)(?<!\w)
That is a positive lookahead and a negative lookbehind, combined. We're almost there :) You only need the same thing the other way around (a word character behind and no word character ahead), which is:
(?<=\w)(?!\w)
If you combine these two, you'll eventually get (see the `|` in the middle):
(?:(?=\w)(?<!\w)|(?<=\w)(?!\w))
Which is equivalent to `\b` (and a lot longer). Coming back to our string, this is true for:
Rogue One: A Star Wars Story
# right before R
# right after e in Rogue
# right before O of One
# right after e of One (: is not a word character)
# and so on...
See a demo on regex101.
To conclude, you can think of `\b` as a zero-width assertion which only ensures a position within the string.
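The claimed equivalence can be spot-checked on the example string. This sketch assumes a JavaScript engine with lookbehind support (for example a recent Node.js), and the `positions` helper is hypothetical, written only for this check:

```javascript
const sample = 'Rogue One: A Star Wars Story';

// The long lookaround form built up above, and the short form.
const lookaround = /(?:(?=\w)(?<!\w)|(?<=\w)(?!\w))/g;
const wordBoundary = /\b/g;

// Hypothetical helper: indices of all (zero-width) matches of re in s.
function positions(re, s) {
  return [...s.matchAll(re)].map((m) => m.index);
}

const viaLookaround = positions(lookaround, sample);
const viaBoundary = positions(wordBoundary, sample);

console.log(viaBoundary);
// [ 0, 5, 6, 9, 11, 12, 13, 17, 18, 22, 23, 28 ]
console.log(JSON.stringify(viaLookaround) === JSON.stringify(viaBoundary)); // true
```

Agreement on one string is only a sanity check, not a proof, but it shows both patterns picking out the same positions: before `R`, after the `e` in `Rogue`, before `O`, and so on.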
Try this expression:
/(end)\b|\b(end)/g
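The answer does not say what this expression is meant to achieve, so the sketch below only shows what it does on the asker's string: it finds every `end` that has a word boundary on at least one side, and records which alternative matched. The variable names are illustrative:

```javascript
const input = 'wenkend,end,end,endend';
const re = /(end)\b|\b(end)/g;

// For each match, record its index and which alternative matched:
// 1 = "end" followed by a boundary, 2 = "end" preceded by a boundary.
const hits = [...input.matchAll(re)].map((m) => [
  m.index,
  m[1] !== undefined ? 1 : 2,
]);

console.log(hits);
// [ [ 4, 1 ], [ 8, 1 ], [ 12, 1 ], [ 16, 2 ], [ 19, 1 ] ]
```

Note that this matches single occurrences of `end`, not the `end,end` pair the question was about.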