Here is a piece of code to compare two sentences word by word and return the number of word matches, subject to some conditions:
Hint: each case is written as the word in the first sentence :::: the word in the second sentence.
1) protecting :::: i should not match
2) protecting :::: protect should match
3) protect :::: protecting should match
4) him :::: i should not match
5) i :::: i should match, but only once, not twice (let me explain):
We have this string as the first sentence:
let speechResult = "they're were protecting him i knew that i was aware";
It contains two occurrences of i, as you can see, but there is only one i in the second sentence:
let expectSt = ['i was sent to earth to protect you'];
So we should count this match as one occurrence, not two. If the second sentence also had two occurrences of i, then we would count the i matches as two occurrences.
6) was :::: was should match
Here is my code so far:
// Sentences we should compare word by word
let speechResult = "they're were protecting him i knew that i was aware";
let expectSt = ['i was sent to earth to protect you'];
// Create arrays of words from the above sentences
let speechResultWords = speechResult.split(/\s+/);
let expectStWords = expectSt[0].split(/\s+/);
// Here you are..
//console.log(speechResultWords)
//console.log(expectStWords)
// Count matches between the two sentences
function includeWords(){
  // Declare a variable to hold the number of matches
  let countMatches = 0;
  for(let a = 0; a < speechResultWords.length; a++){
    for(let b = 0; b < expectStWords.length; b++){
      if(speechResultWords[a].includes(expectStWords[b])){
        console.log(speechResultWords[a] + ' includes in ' + expectStWords[b]);
        countMatches++
      }
    } // End of inner for loop
  } // End of outer for loop
  return countMatches;
};
// Finally, call the function to count the matches
let matches = includeWords();
console.log('Matched words: ' + matches);
Asked Jan 20, 2020 at 16:02 by Sara Ree; edited Jan 20, 2020 at 16:35.
- What is your problem? Please be more specific. – Kamil Naja Commented Jan 20, 2020 at 16:09
- I mentioned the problems in the conditions; please run the code snippet. The code I wrote assumes that protecting and i match and ... – Sara Ree Commented Jan 20, 2020 at 16:11
- Why do protect and protecting match but i and him do not? Is it based on stemming? Or is it more simplistically based on a common starting sequence (e.g. super and show should match because of the s)? – grodzi Commented Jan 20, 2020 at 16:35
- I mean matching between words: as i is a single word and him is another word, they should not be considered matched. super and show are different words, so there is no match. – Sara Ree Commented Jan 20, 2020 at 16:39
- My question was almost rhetorical: why would protect and protecting be the same word? (You should emphasize that in your original post, since no answer so far has taken that constraint into account.) – grodzi Commented Jan 20, 2020 at 16:41
6 Answers
You could count the wanted words with a Map and iterate over the given words, checking the remaining count for each word.
function includeWords(wanted, seen) {
  var wantedMap = wanted.split(/\s+/).reduce((m, s) => m.set(s, (m.get(s) || 0) + 1), new Map),
      wantedArray = Array.from(wantedMap.keys()),
      count = 0;
  seen.split(/\s+/)
    .forEach(s => {
      var key = wantedArray.find(t => s === t || s.length > 3 && t.length > 3 && (s.startsWith(t) || t.startsWith(s)));
      if (!wantedMap.get(key)) return;
      console.log(s, key)
      ++count;
      wantedMap.set(key, wantedMap.get(key) - 1);
    });
  return count;
}
let matches = includeWords('i was sent to earth to protect you', 'they\'re were protecting him i knew that i was aware');
console.log('Matched words: ' + matches);
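The counting part of this idea can be looked at on its own. The following is a minimal sketch that handles exact word matches only and leaves out the prefix rule; the helper names `countWordOccurrences` and `countExactMatches` are hypothetical and not part of the answer above.

```javascript
// Build a Map from each word to the number of times it occurs in the sentence.
function countWordOccurrences(sentence) {
  const counts = new Map();
  for (const word of sentence.split(/\s+/)) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// Count exact matches; each wanted word can be used up only as often as it occurs.
function countExactMatches(wanted, seen) {
  const remaining = countWordOccurrences(wanted);
  let count = 0;
  for (const word of seen.split(/\s+/)) {
    const left = remaining.get(word) || 0;
    if (left > 0) {
      remaining.set(word, left - 1);
      count++;
    }
  }
  return count;
}

console.log(countExactMatches('i was sent', 'i knew that i was aware'));    // 2
console.log(countExactMatches('i think i was', 'i knew that i was aware')); // 3
```

In the first call the second i in the seen sentence is ignored because the wanted sentence contains only one i; in the second call both are counted, which is the behaviour condition 5 of the question asks for.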
I think this should work:
let speechResult = "they're were protecting him i knew that i was aware";
let expectSt = ['i was sent to earth to protect you'];
function includeWords(){
  let countMatches = 0;
  // Split the spoken sentence into words and drop duplicates
  let ArrayFromStr = speechResult.split(" ");
  let Uniq = new Set(ArrayFromStr);
  // Spread the Set back into an array of unique words
  let NewArray = [...Uniq];
  let str2 = expectSt[0];
  // for...of iterates over the words themselves
  for (const word of NewArray){
    if (str2.includes(word)){
      countMatches += 1;
    }
  }
  return countMatches;
};
let matches = includeWords();
I take speechResult, turn it into an array, remove duplicates, convert it to an array again, and then check whether the expectSt string contains each word in the NewArray array.
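One thing to keep in mind with this approach is that calling includes on a string is a substring test, not a word test. The sketch below, which uses a hypothetical helper named `containsWord`, shows the difference on the expected sentence from the question.

```javascript
const expected = 'i was sent to earth to protect you';

// Hypothetical helper: true only if `word` is one of the whitespace-separated words.
function containsWord(sentence, word) {
  return sentence.split(/\s+/).includes(word);
}

// String.prototype.includes finds 'ear' inside 'earth'.
console.log(expected.includes('ear'));        // true

// Array.prototype.includes compares whole words.
console.log(containsWord(expected, 'ear'));   // false
console.log(containsWord(expected, 'i'));     // true

// Neither test treats 'protecting' and 'protect' as the same word.
console.log(expected.includes('protecting')); // false
```

So a plain substring check can report matches that are not whole words, and it still does not cover conditions 2 and 3 of the question.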
Iterate over the words, replace each matched word at its index with an empty string, and store the matches in an array.
let speechResult = "they're were protecting him i knew that i was aware";
let expectSt = ['i was sent to earth to protect you'];
// Create arrays of words from above sentences
let speechResultWords = speechResult.split(/\s+/);
let expectStWords = expectSt[0].split(/\s+/);
const matches = [];
speechResultWords.forEach(str => {
  for(let i=0; i<expectStWords.length; i++) {
    const innerStr = expectStWords[i];
    if(innerStr && (str.startsWith(innerStr) || innerStr.startsWith(str)) && (str.includes(innerStr) || innerStr.includes(str))) {
      if(str.length >= innerStr.length) {
        matches.push(innerStr);
        expectStWords[i] = '';
      } else {
        matches.push(str);
      }
      break;
    }
  }
});
console.log(matches.length);
By using stemming, you treat words that have the same stem as the same word.
e.g.
- for verbs: protect, protected, protecting, ...
- but also plurals: ball, balls
What you may want to do is:
- stem the words: use some stemmer (each has its pros and cons), e.g. PorterStemmer, which seems to have a JS implementation
- count the occurrences in that "stemmed space", which is trivial
NB: splitting on '\s' may not be enough; think about commas and, more generally, punctuation. Should you need more, the keyword for this is tokenization.
Below is an example using PorterStemmer with some poor homemade tokenization:
const examples = [
  ['protecting','i'],
  ['protecting','protect'],
  ['protect','protecting'],
  ['him','i'],
  ['i','i'],
  ['they\'re were protecting him i knew that i was aware','i was sent to earth to protect you'],
  ['i i', 'i i i i i']
]
function tokenize(s) {
  // this is not good, get yourself a good tokenizer
  return s.split(/\s+/).filter(x=>x.replace(/[^a-zA-Z0-9']/g,''))
}
function countWords(a, b){
  const sa = tokenize(a).map(t => stemmer(t))
  const sb = tokenize(b).map(t => stemmer(t))
  const m = sa.reduce((m, w) => (m[w] = (m[w] || 0) + 1, m), {})
  return sb.reduce((count, w) => {
    if (m[w]) {
      m[w]--
      return count + 1
    }
    return count
  }, 0)
}
examples.forEach(([a,b], i) => console.log(`ex ${i+1}: ${countWords(a,b)}`))
<script src="https://cdn.jsdelivr.net/gh/kristopolous/Porter-Stemmer/PorterStemmer1980.js"></script>
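To illustrate the tokenization caveat, here is a minimal sketch of a slightly more careful tokenizer. It assumes plain ASCII English text, and the function name `tokenizeWords` is hypothetical; a real tokenizer would need to handle much more than this.

```javascript
// Hypothetical helper: lowercase the text, turn anything that is not a letter,
// digit, apostrophe, or whitespace into a space, then split on whitespace.
function tokenizeWords(sentence) {
  return sentence
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean); // drop empty strings left by leading or trailing separators
}

console.log(tokenizeWords("They're here, i knew: that was it!"));
// [ "they're", 'here', 'i', 'knew', 'that', 'was', 'it' ]
```

With this, a word followed by a comma or an exclamation mark yields the same token as the bare word, so it can be stemmed and counted like any other occurrence.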
I think it will provide a primitive solution by comparing the sentences' tokens. But here are two pitfalls that I can see:
- You should compare both sentences' tokens in your main IF clause with an OR operator.
- You can add both occurrences to a SET collection to avoid any repetitions.
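The two points above could be sketched roughly as follows. This is only one possible reading: it uses startsWith rather than includes so that him and i do not match, the minimum length of 4 for prefix matching is an assumed threshold, and the names `wordsMatch` and `countDistinctMatches` are hypothetical.

```javascript
const MIN_PREFIX_LENGTH = 4; // assumed threshold so that short words need an exact match

// Compare in both directions, joined with OR.
function wordsMatch(a, b) {
  if (a === b) return true;
  if (a.length < MIN_PREFIX_LENGTH || b.length < MIN_PREFIX_LENGTH) return false;
  return a.startsWith(b) || b.startsWith(a);
}

// Collect matched expected words in a Set so that repetitions are not counted again.
function countDistinctMatches(spoken, expected) {
  const spokenWords = spoken.split(/\s+/);
  const matched = new Set();
  for (const expectedWord of expected.split(/\s+/)) {
    if (spokenWords.some(word => wordsMatch(word, expectedWord))) {
      matched.add(expectedWord);
    }
  }
  return matched.size;
}

console.log(countDistinctMatches(
  "they're were protecting him i knew that i was aware",
  'i was sent to earth to protect you'
)); // 3 (i, was, protect)
```

Note that a Set counts distinct words, so if i appeared twice in both sentences it would still be counted once here, which differs from what condition 5 of the question asks for.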
You can use the function below to get the count of all matched words between two sentences / sets of strings.
function matchWords(str1, str2){
  let countMatches = 0;
  let strArray = str1.split(" ");
  // Remove duplicate words before checking
  let uniqueArray = [...new Set(strArray)];
  uniqueArray.forEach( word => {
    if (str2.includes(word)){
      countMatches += 1
    }
  })
  return countMatches;
};
console.log("Count:", matchWords("Test Match Words".toLowerCase(),"Result Match Words".toLowerCase()));
Above code is tested and working.