javascript - Regex to match words in a sentence by its prefix - Stack Overflow

I have this regex on mongodb query to match words by prefix:

{sentence: new RegExp('^'+key,'gi')}

What would be the right regex pattern if I want it to match a sentence that has at least a word starting with key prefix? For example:

If I have a sentence

"This is a dog"

when key is 'do', then it should match that sentence since prefix 'do' is a substring of 'dog'.

My current solution only works for the first word of the sentence. So far it only matches that sentence if I type in 't', 'th' or 'this'. It doesn't match when I type in 'i' (prefix for 'is') or 'do' (prefix for 'dog').

asked Jan 29, 2012 at 9:20 by Benny Tjia; edited Jan 29, 2012 at 9:22 by gdoron
4 Answers

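You can use the expression /\bprefix\w+/. This matches any word that starts with "prefix" and has at least one more character after it (use \w* instead of \w+ to also match the bare word "prefix"). Here \b represents a word boundary and \w is any word character.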

If you don't want to get the whole word, you can just do /\bprefix/. If you want to put this in a string, you also have to escape the \: '\\bprefix'.
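To illustrate the escaping point, here is a small sketch comparing a regex literal with the string form. The variable names are only for illustration; 'do' stands in for "prefix".

```javascript
// The same pattern written as a literal and as a string.
const fromLiteral = /\bdo/;
const fromString = new RegExp('\\bdo'); // '\\b' in the string becomes \b in the pattern
const unescaped = new RegExp('\bdo');   // '\b' in a string is a backspace character (U+0008)

console.log(fromString.source === fromLiteral.source); // true
console.log(fromString.test('a dog'));                 // true
console.log(unescaped.test('a dog'));                  // false
```

The unescaped version still compiles, which is why the mistake is easy to miss: it silently looks for a backspace character followed by "do".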

Use the \b anchor to match word boundaries:

\bdo

finds 'do' in 'nice dog', but doesn't match 'much ado about nothing'.
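A quick way to check this behaviour in plain JavaScript is sketched below. The helper name hasWordWithPrefix is made up for this example, and it assumes the key contains only word characters (letters, digits, underscore).

```javascript
// Hypothetical helper: true if any word in `sentence` starts with `key`.
// Assumes `key` contains only letters, digits, or underscores.
function hasWordWithPrefix(sentence, key) {
  // 'i' makes the match case-insensitive. 'g' is left out on purpose:
  // with 'g', test() keeps state in lastIndex between calls.
  return new RegExp('\\b' + key, 'i').test(sentence);
}

console.log(hasWordWithPrefix('nice dog', 'do'));               // true
console.log(hasWordWithPrefix('much ado about nothing', 'do')); // false
console.log(hasWordWithPrefix('This is a dog', 'i'));           // true
```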

The other answers suggesting the word boundary matching are neat, but will mean that an index isn't used efficiently. If you need fast lookups, you might want to consider adding a field "words" with each of your words broken up, i.e.

{sentence: "This is a dog",
  words: ["This", "is", "a", "dog"]}

After putting an index on the words field, you can go back to using:

{words: new RegExp('^'+key,'gi')}

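and a key of "do" will now match this object and can use the index. Note that MongoDB only uses an index efficiently for case-sensitive prefix expressions, so for the fastest lookups store the words lowercased, lowercase the key, and drop the i flag.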
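A rough sketch of how the words field and the query filter could be built is shown below. The helper names toDocument and prefixQuery are hypothetical, and lowercasing both the stored words and the key is an assumption made here so that the regex can stay case-sensitive. The last lines only emulate locally how a regex is applied to each array element; they do not talk to a database. The key is assumed to contain only word characters.

```javascript
// Hypothetical helper: build the document shape with a lowercased words array.
function toDocument(sentence) {
  const words = sentence.toLowerCase().match(/\w+/g) || [];
  return { sentence, words };
}

// Hypothetical helper: build a case-sensitive prefix filter for the words field.
// Assumes `key` contains only letters, digits, or underscores.
function prefixQuery(key) {
  return { words: new RegExp('^' + key.toLowerCase()) };
}

const doc = toDocument('This is a dog');
const query = prefixQuery('Do');

// Local emulation: the filter matches if any array element matches the regex.
const matches = doc.words.some((word) => query.words.test(word));

console.log(doc.words); // [ 'this', 'is', 'a', 'dog' ]
console.log(matches);   // true
```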

^ matches beginning of the string (or beginning of a line if the multiline flag is set).

\b matches a word boundary.

\bdo matches words beginning with "do".

So for your example:

{sentence: new RegExp('\\b'+key,'gi')}

(Noting that in a JavaScript string you have to escape backslashes.)

If you need to capture the matches to find out which words matched the pattern, wrap the expression in parentheses and add a bit to match the rest of the word:

new RegExp('(\\b' + key + '\\w*)','gi')

Where \w is any word character and the * is zero or more. If you want words that have at least one character more than the key then use + instead of *.
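One thing the snippets above leave open is what happens when key contains regex metacharacters such as + or (. The sketch below escapes the key before building the pattern and returns the matched words. Both helper names, escapeRegExp and wordsWithPrefix, are assumptions for illustration and not part of any library.

```javascript
// Escape regex metacharacters so the key is treated as literal text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: return every word in `sentence` that starts with `key`.
function wordsWithPrefix(sentence, key) {
  const pattern = new RegExp('\\b' + escapeRegExp(key) + '\\w*', 'gi');
  // With the 'g' flag, match() returns all full matches, or null if none.
  return sentence.match(pattern) || [];
}

console.log(wordsWithPrefix('This is a dog', 'do'));     // [ 'dog' ]
console.log(wordsWithPrefix('Dogs do dig', 'd'));        // [ 'Dogs', 'do', 'dig' ]
console.log(wordsWithPrefix('I like c++ a lot', 'c++')); // [ 'c++' ]
```

Without the escaping step, a key like 'c++' would be read as regex syntax instead of literal text.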

See the many regex guides on the web for more details, e.g., https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
