/\b(keyword|whatever)\b/gi
How can I modify the above JavaScript regex to match only the first occurrence of each word (I believe this is called non-greedy)?
That is, the first occurrence of "keyword" and the first occurrence of "whatever", and I may put more words in there.
Asked Apr 13, 2012 at 15:43 by userBG; edited Apr 13, 2012 at 16:18.
- If you want to find the first occurrence of "keyword" and the first occurrence of "whatever" you are probably best off with 2 regexes. "Greedy" and "non-greedy" refer to matching wildcards like ".". – David Gorsline Commented Apr 13, 2012 at 15:53
- @DavidGorsline That's what I want to do, but there may be an indefinite number of words, not just two. – userBG Commented Apr 13, 2012 at 16:16
4 Answers
Remove the g flag from your regex:
/\b(keyword|whatever)\b/i
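To make the effect of the g flag concrete, here is a minimal sketch using String.prototype.match. The sample string and variable names are made up for illustration and do not come from the question.

```javascript
// Hypothetical sample text (an assumption, not taken from the question).
const sample = "Whatever keyword, whatever keyword";

// With the g flag, match() returns every occurrence of either word.
const allMatches = sample.match(/\b(keyword|whatever)\b/gi);

// Without g, match() stops at the first match of the pattern as a whole
// and also reports where it was found.
const firstMatch = sample.match(/\b(keyword|whatever)\b/i);

console.log(allMatches.length); // number of matches found with g
console.log(firstMatch[0], firstMatch.index);
```

Note that without g the result is the first match of the whole alternation, so on its own it does not give the first occurrence of each individual word.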
What you're trying to do can't be achieved with a single regular expression. Instead, store every word you wish to find in an array, loop through the words searching the text for each one, and store any words that match in a result array.
Example:
var words = ["keyword", "whatever"];
var text = "Whatever, keywords are like so, whatever... Unrelated, I now know " +
           "what it's like to be a tweenage girl. Go Edward.";
var matches = []; // Words that occur at least once in the text.

/* Lower-case the text so the search is case-insensitive (the entries in
 * `words` must be lower case too).
 * String.prototype.indexOf is used instead of a regex because it avoids the
 * need to escape regular expression metacharacters in the input. Note that
 * it matches substrings, so "keyword" is also found inside "keywords". */
var lowerCaseText = text.toLowerCase();

for (var i = 0; i < words.length; i++) { // Loop through the `words` array
    // indexOf returns -1 if no match is found
    if (lowerCaseText.indexOf(words[i]) !== -1) {
        matches.push(words[i]); // Add the found word to the `matches` array
    }
}
Remove the g modifier from your regex. Then it will find only one match.
What you're talking about can't be done with a JavaScript regex. It might be possible with advanced regex features like .NET's unrestricted lookbehind, but JavaScript's feature set is extremely limited. And even in .NET, it would probably be simplest to create a separate regex for each word and apply them one by one; in JavaScript it's your only option.
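The "separate regex for each word" idea can be sketched roughly as follows. The helper names `escapeRegExp` and `firstOccurrences`, the shape of the result objects, and the sample text are all assumptions made for this illustration, not part of any answer above.

```javascript
// Escape regex metacharacters so each word is matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: for each word, find its first whole-word,
// case-insensitive occurrence in `text`.
function firstOccurrences(text, words) {
  const results = [];
  for (const word of words) {
    // No g flag, so exec() reports only the first match of this word.
    const re = new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
    const m = re.exec(text);
    if (m !== null) {
      results.push({ word: word, match: m[0], index: m.index });
    }
  }
  return results;
}

// Hypothetical sample text.
const found = firstOccurrences(
  "Whatever, keywords are like so, whatever... keyword!",
  ["keyword", "whatever"]
);
console.log(found);
```

Because of the \b anchors, "keywords" is skipped here and only the standalone "keyword" near the end counts. Drop the anchors if substring matches are acceptable.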
Greediness only applies to regexes that employ quantifiers, like /START.*END/. The . means "any character" (in JavaScript, any character except a line terminator) and the * means "zero or more". After the START is located, the .* greedily consumes the rest of the text. Then it starts backtracking, "giving back" one character at a time until the next part of the regex, END, succeeds in matching.
We call this regex "greedy" because it matches everything from the first occurrence of START to the last occurrence of END.
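A small sketch of the greedy behavior described above; the input string is invented for illustration.

```javascript
// Hypothetical input containing two START-to-END sequences.
const greedyInput = "START a END b START c END";

// Greedy .* runs from the first START to the last END.
const greedyMatch = greedyInput.match(/START.*END/);

console.log(greedyMatch[0]); // the entire input string
```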
If there may be more than one "START"-to-"END" sequence, and you want to match just the first one, you can append a ? to the * to make it non-greedy: /START.*?END/. Now, each time the . tries to consume the next character, it first checks to see if it could match END at that spot instead. Thus it matches from the first START to the first END after that. And if you want to match all the "START"-to-"END" sequences individually, you add the 'g' modifier: /START.*?END/g.
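The non-greedy variants can be tried the same way. Again, the input string and variable names are assumptions for the sake of the example.

```javascript
// Hypothetical input containing two START-to-END sequences.
const lazyInput = "START a END b START c END";

// Non-greedy .*? stops at the first END after the first START.
const firstSequence = lazyInput.match(/START.*?END/);

// Adding g returns each START-to-END sequence separately.
const everySequence = lazyInput.match(/START.*?END/g);

console.log(firstSequence[0]);
console.log(everySequence);
```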
It's a bit more complicated than that, of course. For example, what if these sequences can be nested, as in START…START…END…END? If I've gotten a little carried away with this answer, it's because understanding greediness is the first important step to mastering regexes. :-/
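As a rough illustration of why nesting complicates things, the sketch below runs both variants on an invented nested input. Neither pattern isolates the inner sequence by itself.

```javascript
// Hypothetical nested input: an inner START...END inside an outer one.
const nestedInput = "START a START b END c END";

// Non-greedy: from the outer START to the first END, cutting across the nesting.
const nestedLazy = nestedInput.match(/START.*?END/)[0];

// Greedy: from the outer START to the last END, i.e. the whole outer sequence.
const nestedGreedy = nestedInput.match(/START.*END/)[0];

console.log(nestedLazy);
console.log(nestedGreedy);
```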