I'm trying to lazy match a group in JavaScript, but what I have is not working quite as I'd expect.
"/1000/2000/".match("(?:/)(.*?)(?:/)$")
This is what I have and what I believe this regexp will do:
- Group match (and ignore) the / character
- Group match anything between two / characters, but the shortest match
- Group match (and ignore) the / character
- Match the end of the string
That should then return me 2000, but it's returning 1000/2000. Why is that?
- this doesn't look like a meaningful matching, what are you actually trying to do? Or are you just curious about why the pattern matches the way it does? – Mike 'Pomax' Kamermans Commented Apr 20, 2015 at 23:51
- @Mike'Pomax'Kamermans I'm trying to get the last something between /, right before the end of the string. – alexandernst Commented Apr 20, 2015 at 23:52
- right, but that strongly sounds like an XY problem, where you're asking about "something, between slashes", but only because you have a problem and you thought of a solution and now you're trying to get help with that solution, instead of with the original problem – Mike 'Pomax' Kamermans Commented Apr 20, 2015 at 23:54
- While I'm open to suggestions to my initial problem, I'd really like to know why the lazy group matching in my regexp isn't working. @Mike'Pomax'Kamermans – alexandernst Commented Apr 20, 2015 at 23:55
4 Answers
When matching a string against a regex, the engine will try every position from left to right until a match is found.
Since the string is scanned from left to right, (?:/)(.*?)(?:/)$ can find a match at index 0 of the input string /1000/2000/.
The lazy quantifier only affects the order in which the repetitions are tried. It will try the empty string, then repeat once, twice, 3 times, etc. Since . matches anything except line terminators, and the string is tried from left to right, the whole /1000/2000/ is matched.
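A minimal sketch of that behaviour, using the pattern from the question written as a regex literal. The variable names and the slice(5) experiment are illustrative assumptions, not part of the original question:

```javascript
// The question's pattern as a regex literal (slashes must be escaped here).
const pattern = /(?:\/)(.*?)(?:\/)$/;
const input = "/1000/2000/";

const result = input.match(pattern);

// The overall match succeeds at the first start position tried (index 0),
// so the engine never tries starting at the second slash.
console.log(result.index); // 0
console.log(result[0]);    // "/1000/2000/"
console.log(result[1]);    // "1000/2000"

// If the input starts at the second slash instead, the same lazy pattern
// captures only the last segment.
const fromSecondSlash = input.slice(5).match(pattern);
console.log(fromSecondSlash[1]); // "2000"
```

The lazy quantifier still expands one character at a time in both cases; what changes the result is where the match is allowed to start.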
By the way, while it's usually said that .*? matches the fewest characters possible, the correct definition is that a lazy quantifier will try expanding the atom (in this case .) the fewest number of times possible such that the sequel (in this case (?:/)$) can be matched.
The solution, as mentioned in other answers, is to limit the set of allowed characters in between / by replacing . with [^/]. After the character class is changed, you can use either a greedy or a lazy quantifier: since the grammar has become unambiguous, the search order doesn't affect the final result.
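To illustrate the point about greedy versus lazy no longer mattering, here is a small sketch with both variants. The regex literals are my own spelling of the suggested fix (the non-capturing groups around the slashes are dropped for brevity):

```javascript
const input = "/1000/2000/";

// Greedy: [^/]* grabs as many non-slash characters as it can.
const greedy = input.match(/\/([^\/]*)\/$/);

// Lazy: [^/]*? grabs as few as it can and expands only when needed.
const lazy = input.match(/\/([^\/]*?)\/$/);

// Because the captured part can no longer cross a slash, only one
// start position can succeed, and both variants agree.
console.log(greedy[1], greedy.index); // "2000" 5
console.log(lazy[1], lazy.index);     // "2000" 5
```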
(?:) is a non-capturing group - it still includes the contents in the match but does not create a group match for the () brackets.
Breaking down your regular expression:
- (?:/) will match the first slash in the string (but the brackets do not create a group).
- (.*?) will match zero-or-more of any character until the first match of the subsequent part of the pattern (and the brackets create a separate capturing group).
- (?:/)$ will match a slash followed immediately by the end-of-the-string (and the brackets do not create a group).
So the first part will match the first character and the last part will match the last character and the middle bit will match as much as it needs to fulfil the other matches (i.e. everything in between).
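The difference between the two kinds of brackets can be seen by comparing the match arrays. This is only a sketch; the second pattern, with plain capturing groups around the slashes, is a hypothetical variant added for comparison:

```javascript
const input = "/1000/2000/";

// Pattern from the question: only the middle brackets capture.
const withNonCapturing = input.match(/(?:\/)(.*?)(?:\/)$/);

// Hypothetical variant: all three pairs of brackets capture.
const withCapturing = input.match(/(\/)(.*?)(\/)$/);

// Element 0 is always the whole match, slashes included.
console.log(withNonCapturing[0]); // "/1000/2000/"
console.log(withCapturing[0]);    // "/1000/2000/"

// The array has one extra element per capturing group.
console.log(withNonCapturing.length); // 2
console.log(withCapturing.length);    // 4

// The middle group holds the same text, but its number shifts.
console.log(withNonCapturing[1]); // "1000/2000"
console.log(withCapturing[2]);    // "1000/2000"
```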
As an alternative, this will match the last character string between two slashes where the last slash is at the end of the word:
"/1000/2000/".match("[^/]*(?=/$)")
The ?:X pattern is a "match but do not capture" instruction in JavaScript, so we see the following pattern:
(?:/)(.*?)(?:/)$
translating to:
- (?:/) match / (somewhere), but do not capture
- (.*?) match as many characters as the rest of the pattern allows
- (?:/)$ match / followed by end of string, but do not capture
So, the first / is matched and promptly forgotten, then we match set (2), which tries a non-greedy match for "any character" that is followed by (?:/)$. Since the final part matches only the slash at the end of your input string, we find and ignore the first and last /, which leaves us with 1000/2000.
If you wanted 1000 instead, then there isn't really a reason to bother with regexp at all:
// get some input
var s = "/1000/2000/";
// split on slashes
var t = s.split('/');
// filter out empties
t = t.filter(function(a) { return !!a ; });
// convert to ints, because why not. Note that even regexp will
// yield strings, so you still have to do this if you do use regexp.
t = t.map(function(a) { return parseInt(a,10); });
// results are....
console.log(t.join(", ")); // => "1000, 2000"
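Since the comments mention wanting the last value right before the trailing slash, the split-based steps above could be condensed roughly as follows. This is a sketch under that assumption; the variable names are illustrative, and filter(Boolean) is a common shorthand for dropping empty strings:

```javascript
const input = "/1000/2000/";

// Split on slashes and drop the empty strings produced by the
// leading and trailing slash.
const parts = input.split("/").filter(Boolean);

// Take the last remaining segment (undefined if there is none).
const last = parts[parts.length - 1];

console.log(parts); // [ "1000", "2000" ]
console.log(last);  // "2000"

// Convert to a number only if that is what the caller needs.
console.log(parseInt(last, 10)); // 2000
```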
If you're looking for "things between slashes", just look for things that are not slashes:
"/1000/2000/".match(/([^\/]+)/g) // => Array [ "1000", "2000" ]
Try this but it is not really the most elegant solution out there:
'/1000/2000/'.match(/(?!\/)\d+(?=\/$)/);