How do I handle contractions with regex word boundaries in javascript - Stack Overflow

I have a Node.js script that reads in a file and counts word frequencies. I currently feed each line into this function:

function getWords(line) {
    return line.match(/\b\w+\b/g);
}

This matches almost everything, except that it splits contractions:

getWords("I'm") -> {"I", "m"}

However, I cannot simply add the apostrophe to the character class, because I want paired apostrophes (used as quotes) to act as word boundaries:

getWords("hey'there'") -> {"hey", "there"}

Is there a way to capture contractions while still treating other apostrophes as word boundaries?

asked Dec 31, 2014 at 2:59 by Ehryk
  • How can you tell that I'm should be kept whole but hey'there' should be split? Sounds like this might require a dictionary? – Aaron Dufour Commented Dec 31, 2014 at 3:31
  • will "hey'there'" really appear like that, or will it have a space like "hey 'there'"? – Wesley Smith Commented Dec 31, 2014 at 3:38
  • What if the input is "I'm Steve O'Conner's 'friend'"? How would you know that O'Conner's is actually one word, not three? Or what if the matched apostrophes you mention contain a contraction with another apostrophe? – nnnnnn Commented Dec 31, 2014 at 3:39
  • @nnnnnn my answer below seems to cover that case but it could use more testing – Wesley Smith Commented Dec 31, 2014 at 3:46
  • My question is, for the record, neither a joke nor rhetorical. You're going to have a hard time getting the answer you want unless you provide actual criteria for making the determination. @DelightedD0D's answer is good, but it drops the apostrophe from words like "'twas" and "'ow", which are also contractions, and it's not clear whether that's important to you. – Aaron Dufour Commented Dec 31, 2014 at 4:15

2 Answers

The closest I believe you can get with a regex is line.match(/(?!'.*')\b[\w']+\b/g), but be aware that if there is no space between a word and a ', the apostrophe is treated as part of a contraction.

As Aaron Dufour mentioned, there is no way for the regex by itself to know that I'm is a contraction but hey'there isn't.

See below:
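A minimal sketch of how that pattern behaves on the inputs raised in the question and comments. The helper name getWordsLookahead is made up for illustration, and the `|| []` fallback is an addition so that lines with no words return an empty array instead of null:

```javascript
// Hypothetical helper; the regex is the one quoted in this answer.
function getWordsLookahead(line) {
    // String.prototype.match returns null when nothing matches,
    // so fall back to an empty array (an addition for convenience).
    return line.match(/(?!'.*')\b[\w']+\b/g) || [];
}

console.log(getWordsLookahead("I'm"));
// [ "I'm" ]
console.log(getWordsLookahead("I'm Steve O'Conner's 'friend'"));
// [ "I'm", 'Steve', "O'Conner's", 'friend' ]
console.log(getWordsLookahead("hey 'there'"));
// [ 'hey', 'there' ]
console.log(getWordsLookahead("hey'there'"));
// [ "hey'there" ]  (no space, so it looks like a contraction)
console.log(getWordsLookahead("'twas"));
// [ 'twas' ]  (the leading apostrophe is dropped)
```

The last two calls show the limits already noted in the thread: an apostrophe directly between two word characters is always kept, and a leading apostrophe is never kept.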

You can match a run of letters, optionally followed by an apostrophe and more letters.

line.match(/[A-Za-z]+('[A-Za-z]+)?/g)
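A minimal sketch of this letters-only pattern in use. The helper names are hypothetical. The second function is a variation that is not part of the answer: it assumes you want words with more than one internal apostrophe, such as O'Conner's, kept whole.

```javascript
// Hypothetical helper wrapping the pattern from this answer.
function getWordsLetters(line) {
    // With the g flag, match() returns only the full matches,
    // so the capturing group does not appear in the result.
    return line.match(/[A-Za-z]+('[A-Za-z]+)?/g) || [];
}

// Variation (an assumption, not from the answer): allow any number
// of apostrophe-plus-letters groups instead of at most one.
function getWordsLettersRepeated(line) {
    return line.match(/[A-Za-z]+(?:'[A-Za-z]+)*/g) || [];
}

console.log(getWordsLetters("I'm"));
// [ "I'm" ]
console.log(getWordsLetters("hey 'there'"));
// [ 'hey', 'there' ]
console.log(getWordsLetters("O'Conner's"));
// [ "O'Conner", 's' ]
console.log(getWordsLettersRepeated("O'Conner's"));
// [ "O'Conner's" ]
```

Unlike \w, the [A-Za-z] class ignores digits and underscores, so "route66" yields only "route". Like the first answer, neither version can tell hey'there' apart from a real contraction.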