
javascript - How to make regex match only first occurrence of each match? - Stack Overflow

/\b(keyword|whatever)\b/gi

How can I modify the above JavaScript regex to match only the first occurrence of each word (I believe this is called non-greedy)?

That is, the first occurrence of "keyword" and the first occurrence of "whatever", and I may put more words in there.

Asked Apr 13, 2012 at 15:43 by userBG, edited Apr 13, 2012 at 16:18
  • If you want to find the first occurrence of "keyword" and the first occurrence of "whatever", you are probably best off with 2 regexes. "Greedy" and "non-greedy" refer to matching wildcards like ".". – David Gorsline, Apr 13, 2012 at 15:53
  • @DavidGorsline That's what I want to do, but there may be an indefinite number of words, not just two. – userBG, Apr 13, 2012 at 16:16

4 Answers

Remove the g flag from your regex:

/\b(keyword|whatever)\b/i
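
A small check of what dropping the g flag does may help here. The sample string below is made up for illustration and is not from the question. Without g, match() stops at the first match of any alternative, so this finds whichever listed word comes first, not the first occurrence of each word.

```javascript
// Sample text is an assumption, made up for illustration.
const text = "Whatever you type, the keyword wins; whatever, keyword.";

// Without the g flag, match() returns only the first match overall.
const first = text.match(/\b(keyword|whatever)\b/i);
console.log(first[0]);    // the matched text
console.log(first.index); // where the match starts in `text`

// With the g flag, match() returns every occurrence instead.
const all = text.match(/\b(keyword|whatever)\b/gi);
console.log(all.length);
```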

What you're trying to do is simply unachievable with a single regular expression. Instead, you will have to store every word you wish to find in an array, loop through them all searching for a match, and store each word that matches in a results array.

Example:

var words = ["keyword","whatever"];
var text = "Whatever, keywords are like so, whatever... Unrelated, I now know " +
           "what it's like to be a tweenage girl. Go Edward.";
var matches = []; // Collects each word that occurs at least once in `text`.

/* The text is lower-cased once so the search is case-insensitive; the entries
 * in `words` must therefore already be lower case.
 * The built-in method String.prototype.indexOf(needle) is used instead of a
 * regex, which avoids having to escape regular expression metacharacters in
 * the input. Note that indexOf matches substrings, so "keyword" is also found
 * inside "keywords". */
var lowerCaseText = text.toLowerCase();
for (var i = 0; i < words.length; i++) { // Loop through the `words` array
    // indexOf returns -1 if no match is found
    if (lowerCaseText.indexOf(words[i]) !== -1) {
        matches.push(words[i]); // Add to the `matches` array
    }
}
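
As a possible alternative, and not part of the answer above: the asker's global regex can be kept as-is if the repeats are filtered out in code. The sketch below assumes an environment with String.prototype.matchAll (ES2020); `firstOccurrences` and the sample text are hypothetical names and data chosen for illustration.

```javascript
// Hypothetical helper: one pass over all matches, keeping only the first
// occurrence of each distinct word (compared case-insensitively).
function firstOccurrences(text, regex) {
  const seen = new Set();
  const results = [];
  // matchAll() requires a regex that has the g flag.
  for (const match of text.matchAll(regex)) {
    const word = match[0].toLowerCase(); // treat "Whatever" and "whatever" alike
    if (!seen.has(word)) {
      seen.add(word);
      results.push({ word: word, index: match.index });
    }
  }
  return results;
}

const text = "Whatever, keywords are like so, whatever... a keyword, whatever.";
const found = firstOccurrences(text, /\b(keyword|whatever)\b/gi);
console.log(found);
```

Because the regex keeps its \b anchors, "keywords" is skipped here, unlike with the indexOf approach.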

Remove the g modifier from your regex. Then it will find only one match.

What you're talking about can't be done with a JavaScript regex. It might be possible with advanced regex features like .NET's unrestricted lookbehind, but JavaScript's feature set is extremely limited. And even in .NET, it would probably be simplest to create a separate regex for each word and apply them one by one; in JavaScript it's your only option.
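
The one-regex-per-word approach described above could be sketched roughly as follows. The `escapeRegExp` helper, the variable names, and the sample text are assumptions for illustration, not something given in the question.

```javascript
// Escape regex metacharacters so each word is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const words = ["keyword", "whatever"];
const text = "Whatever, keywords are like so, whatever... a keyword at last.";

const firstMatches = [];
for (const word of words) {
  // No g flag: exec() returns only the first occurrence of this word.
  const re = new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
  const m = re.exec(text);
  if (m) {
    firstMatches.push({ word: word, match: m[0], index: m.index });
  }
}
console.log(firstMatches);
```

Building each regex from the array means the word list can grow without touching the matching logic.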

Greediness only applies to regexes that employ quantifiers, like /START.*END/. The . means "any character" and the * means "zero or more". After the START is located, the .* greedily consumes the rest of the text. Then it starts backtracking, "giving back" one character at a time until the next part of the regex, END, succeeds in matching. We call this regex "greedy" because it matches everything from the first occurrence of START to the last occurrence of END.
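
To see the greedy behaviour on a concrete input (the sample string is made up for illustration):

```javascript
// Two START...END sequences in one string.
const text = "START a END START b END";

// Greedy .* runs to the end of the text, then backtracks to the LAST "END".
const greedy = text.match(/START.*END/);
console.log(greedy[0]);
```

The match spans the whole string here, from the first START to the last END.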

If there may be more than one "START"-to-"END" sequence, and you want to match just the first one, you can append a ? to the * to make it non-greedy: /START.*?END/. Now, each time the . tries to consume the next character, it first checks to see if it could match END at that spot instead. Thus it matches from the first START to the first END after that. And if you want to match all the "START"-to-"END" sequences individually, you add the 'g' modifier: /START.*?END/g.
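
A short comparison of the non-greedy form with and without the g modifier, again on a made-up sample string:

```javascript
const text = "START a END START b END";

// Non-greedy .*? stops at the first "END" after the first "START".
const firstOnly = text.match(/START.*?END/);
console.log(firstOnly[0]);

// Adding g returns each START...END sequence as a separate match.
const each = text.match(/START.*?END/g);
console.log(each);
```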

It's a bit more complicated than that, of course. For example, what if these sequences can be nested, as in START…START…END…END? If I've gotten a little carried away with this answer, it's because understanding greediness is the first important step to mastering regexes. :-/
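
For the nested case, one option is to stop relying on a single pattern and count depth in code instead. This is only a sketch under assumptions: `outermostSpans` is a hypothetical helper, it assumes String.prototype.matchAll is available, and it simply ignores an END that has no open START.

```javascript
// Hypothetical helper: returns the outermost START...END spans, allowing nesting.
function outermostSpans(text) {
  const spans = [];
  let depth = 0;       // how many STARTs are currently open
  let startIndex = -1; // where the current outermost START begins
  for (const token of text.matchAll(/START|END/g)) {
    if (token[0] === "START") {
      if (depth === 0) startIndex = token.index;
      depth++;
    } else if (depth > 0) {
      depth--;
      if (depth === 0) {
        // Closed the outermost START: record the whole span, END included.
        spans.push(text.slice(startIndex, token.index + token[0].length));
      }
    }
  }
  return spans;
}

const spans = outermostSpans("x START a START b END c END y START d END");
console.log(spans);
```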
