javascript - RegEx: Understanding Syllable Counter Code - Stack Overflow

I have used Dylan's question on here regarding JavaScript syllable counting, and more specifically artfulhacker's answer, in my own code, and regardless of which single- or multi-word string I feed it, the function is always able to count the number of syllables correctly.

I have limited experience with RegEx and not enough prior knowledge to decipher exactly what is happening in the following code without some help. I'm never happy having code I pulled from somewhere just work without knowing how it works. Could someone please explain what is happening in the new_count(word) function below, and help me decipher the use of RegEx and how the function is able to count syllables correctly? Many thanks.

function new_count(word) {
  word = word.toLowerCase();                                     //word.downcase!
  if(word.length <= 3) { return 1; }                             //return 1 if word.length <= 3
  word = word.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '');   //word.sub!(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '')
  word = word.replace(/^y/, '');                                 //word.sub!(/^y/, '')
  return word.match(/[aeiouy]{1,2}/g).length;                    //word.scan(/[aeiouy]{1,2}/).size
}

Asked Feb 7, 2015 at 16:46 by J Bloom; edited May 23, 2017 at 11:52 by Community.
  • I'm here for the exact same reason lol. RegEx is super confusing to me right now. – Thomas Commented Mar 22, 2016 at 5:33

2 Answers

As far as I can see, we basically want to count the vowels, or vowel pairs, with some special cases. Let's start with the last line, which does exactly that, i.e. counts vowels and vowel pairs:

return word.match(/[aeiouy]{1,2}/g).length;

This matches any single vowel or pair of vowels. [...] denotes a character class: if we go through the string character by character, we have a match if the current character is one of those listed. {1,2} is the number of repetitions, i.e. it means that we should match one or two such characters. The quantifier is greedy, so it takes two vowels whenever two are adjacent.
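To see the character class and the {1,2} quantifier in action, here is a small sketch that lists the vowel groups found in a few sample words. The helper name vowelGroups and the sample words are illustrative choices, not part of the original code. Note how the greedy quantifier splits the three-vowel run "eau" in "beautiful" into "ea" and "u", which is one reason this count is only an approximation.

```javascript
// Same pattern as the last line of new_count: one or two vowels (y included).
const VOWEL_GROUP = /[aeiouy]{1,2}/g;

// Hypothetical helper: return every vowel group found in a word.
function vowelGroups(word) {
  // With the g flag, match() returns all non-overlapping matches,
  // or null when there are none, hence the fallback to [].
  return word.match(VOWEL_GROUP) || [];
}

console.log(vowelGroups("banana"));    // [ 'a', 'a', 'a' ]
console.log(vowelGroups("boat"));      // [ 'oa' ]
console.log(vowelGroups("beautiful")); // [ 'ea', 'u', 'i', 'u' ]
```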

The other two lines are for special cases.

word = word.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '');

This line will remove 'syllables' from the end of the word, which are either:

  • Xes (where X is anything but any of 'laeiouy', e.g. 'zes')
  • ed
  • Xe (where X is anything but any of 'laeiouy', e.g. 'xe')

(I'm not really sure what the grammatical meaning behind this is, but I guess that 'syllables' at the end of a word, like '-ed', '-ded' or '-xed', don't really count as such.) As for the regexp part: (?:...) is a non-capturing group. It probably doesn't matter in this case that the group is non-capturing; it just means that we want to group the whole expression but never need to refer back to it. A capturing group (i.e. (...)) would have worked too.

The [^...] is a negated character class: it matches any character that is not one of those listed. (Compare this with the non-negated character class mentioned above.) The pipe symbol, |, is the alternation operator, meaning that any one of the alternatives can match. Finally, the $ anchor matches the end of the string (or the end of a line, if the m flag is set).
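A quick way to check what this step does is to run only the suffix-stripping replace() on some sample words. The helper stripSuffix and the word list are hypothetical, chosen for illustration. Two things stand out: the consonant in front of -es/-e is removed along with it (harmless, since only vowels are counted afterwards), and "table" is left alone because the character before the final 'e' is an 'l'. My reading, not stated in the post, is that excluding 'l' is meant to keep syllabic "-le" endings.

```javascript
// The suffix pattern from new_count, applied on its own.
const SUFFIX = /(?:[^laeiouy]es|ed|[^laeiouy]e)$/;

// Hypothetical helper: run only the suffix-stripping step.
function stripSuffix(word) {
  // Without the g flag, replace() removes at most one match, and the $ anchor
  // forces that match to end at the last character of the string.
  return word.toLowerCase().replace(SUFFIX, "");
}

for (const w of ["makes", "cake", "table", "jumped", "wanted"]) {
  console.log(`${w} -> ${stripSuffix(w)}`);
}
```

"wanted" becomes "want" even though its "-ted" is pronounced as a syllable, so this rule is a heuristic rather than a grammatical one.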

word = word.replace(/^y/, '');

This line removes a 'y' from the beginning of the word (probably because a leading 'y' does not count as a syllable, which makes sense in my opinion). ^ is the anchor for matching the beginning of the string, or of a line with the m flag (cf. $ mentioned above).
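The leading-'y' step only changes the result in some cases. The sketch below (helper names are hypothetical) counts vowel groups with and without it. When 'y' is followed by two vowels, as in "youth", keeping it splits the vowels into "yo" + "u" (2 groups), while dropping it leaves a single "ou" group. For "yellow" both variants give the same count. The suffix step is skipped here because none of these sample words ends in -e, -es or -ed, so it would not change them.

```javascript
// Count vowel groups the same way as the last line of new_count,
// falling back to [] when match() finds nothing and returns null.
const countGroups = (s) => (s.match(/[aeiouy]{1,2}/g) || []).length;

// Hypothetical helper: compare the count with and without the /^y/ step.
function compareLeadingY(word) {
  const w = word.toLowerCase();
  return {
    word: w,
    keepY: countGroups(w),
    dropY: countGroups(w.replace(/^y/, "")),
  };
}

for (const w of ["youth", "yeah", "yellow"]) {
  console.log(compareLeadingY(w));
}
```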

Note: the algorithm only works if word contains a single word.
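The single-word restriction matters because the suffix and leading-'y' rules are anchored to the ends of the whole string. A minimal sketch, assuming it is acceptable to split text on runs of ASCII letters: the hypothetical countText helper below calls new_count once per word. The copy of new_count here has one change from the question's version: a `|| []` fallback, because match() returns null for a word with no vowels (e.g. "tsktsk"), and calling .length on null would throw a TypeError.

```javascript
// new_count from the question, plus a fallback for words with no vowel groups.
function new_count(word) {
  word = word.toLowerCase();
  if (word.length <= 3) { return 1; }
  word = word.replace(/(?:[^laeiouy]es|ed|[^laeiouy]e)$/, '');
  word = word.replace(/^y/, '');
  return (word.match(/[aeiouy]{1,2}/g) || []).length;
}

// Hypothetical helper: split text into runs of ASCII letters and sum the
// per-word counts. Apostrophes and hyphens split words, so this is rough.
function countText(text) {
  const words = text.match(/[a-z]+/gi) || [];
  return words.reduce((total, w) => total + new_count(w), 0);
}

console.log(new_count("make cake")); // 3: only the final "ke" is stripped
console.log(countText("make cake")); // 2: each "-ke" is stripped separately
```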

/(?:[^laeiouy]es|ed|[^laeiouy]e)$/

That matches three possible substrings: a character other than 'l' or a vowel, followed by 'es' (like "res" or "tes"); 'ed'; or a character other than 'l' or a vowel, followed by just an 'e'. These must appear at the end of the word to match, because of the $ at the end of the pattern. The (?: ) construct is a plain grouping; the leading ?: is what makes it non-capturing. The pattern could have been a little shorter:

/(?:[^laeiouy]es?|ed)$/

would do the same thing. In either case, if the pattern matches, the matched characters are removed from the word.
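Since `s?` makes the trailing 's' optional, `[^laeiouy]es?` covers both the "Xes" and "Xe" alternatives of the original. The sketch below is a spot check on a hand-picked word list (my own choice, not from the answer), not a proof; it simply runs both patterns through replace() and compares the results.

```javascript
// The original suffix pattern and the shorter variant suggested above.
const original = /(?:[^laeiouy]es|ed|[^laeiouy]e)$/;
const shorter = /(?:[^laeiouy]es?|ed)$/;

// Arbitrary sample words covering -es, -e, -ed, -le and -les endings.
const samples = ["makes", "cake", "table", "tables", "jumped", "horse", "horses", "sees", "need"];

for (const w of samples) {
  const a = w.replace(original, "");
  const b = w.replace(shorter, "");
  console.log(`${w}: ${a} / ${b} ${a === b ? "(same)" : "(DIFFERENT)"}`);
}
```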

Then,

/^y/

matches a 'y' at the beginning of a word. If a 'y' is found, it's removed.

Finally,

/[aeiouy]{1,2}/g

matches any one- or two-character stretch of vowels (including 'y'). The g suffix makes it a global match, so that the return value is an array consisting of all such spans of vowels. The length of that returned array is the number of syllables (according to this technique).
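The g flag is what makes `.length` meaningful here. Without it, match() returns only the first match (plus index and input properties), so `.length` would be 1 for any word containing a vowel, since the pattern has no capture groups. A small comparison, using "banana" as an arbitrary example:

```javascript
const word = "banana";

const all = word.match(/[aeiouy]{1,2}/g); // every match
const first = word.match(/[aeiouy]{1,2}/); // first match only

console.log(all.length);   // 3
console.log(first.length); // 1 (the match itself; no capture groups)
console.log(first.index);  // 1 (position of the first 'a')
```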

Note that the words "poem" and "lion" would be reported as one-syllable words, which may be correct for some English variants but not all.
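This can be confirmed by looking at the vowel groups directly. For "poem" and "lion" the suffix and leading-'y' steps change nothing (neither ends in -e, -es or -ed, nor starts with 'y'), so new_count's result comes down to this match alone, and the two adjacent vowels land in one group. The helper name is hypothetical.

```javascript
// Hypothetical helper: the vowel groups new_count would count for a word.
const vowelGroupsOf = (w) => w.match(/[aeiouy]{1,2}/g) || [];

for (const w of ["poem", "lion"]) {
  const groups = vowelGroupsOf(w);
  console.log(w, groups, groups.length);
}
```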

Here is a pretty good reference for JavaScript regular expression operators.
