regex - Javascript: find all occurrences of word in text document - Stack Overflow

I'm trying to write a Javascript function to find indices of all occurrences of a word in a text document. Currently this is what I have--

//function that finds all occurrences of string 'needle' in string 'haystack'
function getMatches(haystack, needle) {
  if(needle && haystack){
    var matches=[], ind=0, l=needle.length;
    var t = haystack.toLowerCase();
    var n = needle.toLowerCase();
    while (true) {
      ind = t.indexOf(n, ind);
      if (ind == -1) break;
      matches.push(ind);
      ind += l;
    }
    return matches;
  }
}

However, this gives me a problem, since it matches occurrences of the word even when it is part of a longer word. For example, if the needle is "book" and the haystack is "Tom wrote a book. The book's name is Facebook for dummies", the result contains the indices of 'book', 'book's' and 'Facebook', when I want only the index of 'book'. How can I accomplish this? Any help is appreciated.

asked Sep 7, 2013 at 20:55 by radhika
  • I'd like to point out that regex has an "i" flag which causes the regular expression to match your string in a case-insensitive manner, so there's no need for the .toLowerCase() calls above. I also saw it in some of the answers below. – Nasser Al-Shawwa, Sep 7, 2013 at 22:16

4 Answers

Here's the regex I propose:

/\bbook\b((?!\W(?=\w))|(?=\s))/gi

This should fix your problem. Try it with the exec() method. The regexp also takes words like "booklet" into account, so they are not reported as matches:

function getMatches(needle, haystack) {
    var myRe = new RegExp("\\b" + needle + "\\b((?!\\W(?=\\w))|(?=\\s))", "gi"),
        myArray, myResult = [];
    while ((myArray = myRe.exec(haystack)) !== null) {
        myResult.push(myArray.index);
    }
    return myResult;
}

Edit

I've edited the regexp to account for words like "booklet" as well. I've also reformatted my answer to be similar to your function.

You can do some testing here
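One caveat with building the pattern from the needle: any regex metacharacter in the needle (for example the dot in "a.c") is interpreted as regex syntax. Here is a minimal sketch of the same function with the needle escaped first. The names escapeRegExp and getMatchesEscaped are made up for this illustration and are not part of the answer above.

```javascript
// Hypothetical helper: escapes characters that have a special meaning in a regex.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same pattern as in the answer above, but with the needle escaped first.
function getMatchesEscaped(needle, haystack) {
    var re = new RegExp("\\b" + escapeRegExp(needle) + "\\b((?!\\W(?=\\w))|(?=\\s))", "gi"),
        m, result = [];
    while ((m = re.exec(haystack)) !== null) {
        result.push(m.index);
    }
    return result;
}

var text = "Tom wrote a book. The book's name is Facebook for dummies";
console.log(getMatchesEscaped("book", text));      // [12]
console.log(getMatchesEscaped("a.c", "abc a.c"));  // [4], "abc" is no longer matched by the dot
```

Note that escaping only prevents misinterpretation. Needles that start or end with a non-word character (such as "c++") still interact with \b in ways you may not expect.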

Try this:

function getMatches(searchStr, str) {
    var ind = 0, searchStrL = searchStr.length;
    var index, matches = [];

    str = str.toLowerCase();
    searchStr = searchStr.toLowerCase();

    while ((index = str.indexOf(searchStr, ind)) > -1) {
         matches.push(index);
         ind = index + searchStrL;
    }
    return matches;
}

indexOf returns the position of the first occurrence of book.

var str = "Tom wrote a book. The book's name is Facebook for dummies";
var n = str.indexOf("book");
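The indexOf loop on its own still reports "book's" and "Facebook". If you want to stay with indexOf, one option is to look at the characters on either side of each hit. This is only a sketch: isWordChar and getWholeWordMatches are hypothetical names, and treating the apostrophe as part of a word is an assumption made so that "book's" is rejected, as the question asks.

```javascript
// Hypothetical helper: true when ch can be part of a word.
// Counting the apostrophe as a word character is an assumption.
function isWordChar(ch) {
    return /[A-Za-z0-9_']/.test(ch);
}

function getWholeWordMatches(searchStr, str) {
    var matches = [], from = 0, index;
    if (!searchStr) return matches; // avoid an endless loop on an empty needle
    var s = str.toLowerCase(), n = searchStr.toLowerCase();
    while ((index = s.indexOf(n, from)) > -1) {
        var before = s.charAt(index - 1);         // "" at the start of the string
        var after = s.charAt(index + n.length);   // "" at the end of the string
        if (!isWordChar(before) && !isWordChar(after)) {
            matches.push(index);
        }
        from = index + n.length;
    }
    return matches;
}

var sentence = "Tom wrote a book. The book's name is Facebook for dummies";
console.log(getWholeWordMatches("book", sentence)); // [12]
```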

I don't know what is going on there but I can offer a better solution using a regex.

function getMatches(haystack, needle) {
    var regex = new RegExp(needle.toLowerCase(), 'g'),
        result = [],
        match; // declared locally so it does not leak as an implicit global

    haystack = haystack.toLowerCase();

    while ((match = regex.exec(haystack)) !== null) {
        result.push(match.index);
    }
    return result;
}

Usage:

getMatches('hello hi hello hi hi hi hello hi hello john hi hi', 'hi');

Result => [6, 15, 18, 21, 30, 44, 47]

Concerning your "book" vs. "books" problem, you just need to provide "book " with a space.

Or, inside the function, you could do:

needle = ' ' + needle + ' ';
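Before relying on the padding idea, it may be worth checking it against the sentence from the question. The sketch below copies the regex-based function from this answer (with match declared locally) and runs it with a padded needle. The variable names sentence, padded and other are only for illustration.

```javascript
// Copy of the regex-based search from this answer, with `match` declared locally.
function getMatches(haystack, needle) {
    var regex = new RegExp(needle.toLowerCase(), 'g'),
        result = [], match;
    haystack = haystack.toLowerCase();
    while ((match = regex.exec(haystack)) !== null) {
        result.push(match.index);
    }
    return result;
}

var sentence = "Tom wrote a book. The book's name is Facebook for dummies";

// No occurrence in this sentence has a space on both sides ("book." ends with a period).
var padded = getMatches(sentence, ' book ');
console.log(padded); // []

// When the padded needle does match, the index points at the leading space, not at the word.
var other = getMatches('my book is here', ' book ');
console.log(other); // [2], while "book" itself starts at 3
```

So padding with spaces misses words next to punctuation or at the start or end of the text, and the reported index is off by one.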

The easiest way might be to use the text.match(regex) method. For example, you can write something like this for a case-insensitive search:

"This is a test. This is a Test.".match(/test/gi)

Result:

(2) ['test', 'Test']

Or this one for case sensitive scenarios:

"This is a test. This is a Test.".match(/test/g)

Result:

['test']

let myControlValue=document.getElementById('myControl').innerText;
document.getElementById('searchResult').innerText=myControlValue.match(/test/gi)
<p id='myControl'>This is a test. Just a Test
  </p>
  <span><b>Search Result:</b></span>
  <div id='searchResult'></div>
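Keep in mind that match() with the g flag returns only the matched text, not the positions, while the question asks for indices. As a sketch, String.prototype.matchAll (available in ES2020 environments) gives you one match object per hit, each with an index property. The \b word boundaries are my addition here, to keep the match to whole words.

```javascript
var text = "This is a test. This is a Test.";

// matchAll requires the g flag; each match object carries an index property.
var indices = Array.from(text.matchAll(/\btest\b/gi), function (m) {
    return m.index;
});

console.log(indices); // [10, 26]
```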
