Is there any fast way in JavaScript to find out if 2 Strings contain the same substring? e.g. I have these 2 Strings: "audi is a car" and "audiA8".
As you see the word "audi" is in both strings but we cannot find it out with a simple indexOf
or RegExp, because of other characters in both strings.
5 Answers
The standard tool for doing this sort of thing in Bioinformatics is the BLAST program. It is used to compare two fragments of molecules (like DNA or proteins) to find where they align with each other - basically where the two strings (sometimes multi GB in size) share common substrings.
The basic algorithm is simple, just systematically break up one of the strings into pieces and compare the pieces with the other string. A simple implementation would be something like:
// Note: not fully tested, there may be bugs:
function subCompare (needle, haystack, min_substring_length) {
    // Min substring length is optional; if not given or 0, default to 1:
    min_substring_length = min_substring_length || 1;

    // Search possible substrings from largest to smallest:
    for (var i = needle.length; i >= min_substring_length; i--) {
        // Slide a window of length i across the needle:
        for (var j = 0; j <= (needle.length - i); j++) {
            // Take the i-character window starting at index j:
            var substring = needle.slice(j, j + i);
            var k = haystack.indexOf(substring);
            if (k != -1) {
                return {
                    found : 1,
                    substring : substring,
                    needleIndex : j,
                    haystackIndex : k
                };
            }
        }
    }
    return {
        found : 0
    };
}
You can modify this algorithm to do fancier searches, like ignoring case, fuzzy matching the substring, looking for multiple substrings etc. This is just the basic idea.
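As one illustration of the "ignoring case" modification, here is a minimal sketch. The function name longestCommonSubstringIgnoreCase is made up for this example, and lowercasing both inputs up front is just one assumed way to ignore case. It returns the lowercased match, or null if nothing of at least minLength characters is shared.

```javascript
// Hypothetical helper (not from the answer above): longest common substring, ignoring case.
function longestCommonSubstringIgnoreCase(a, b, minLength) {
  minLength = minLength || 1;
  // Assumption: lowercasing both strings is an acceptable way to ignore case.
  var needle = a.toLowerCase();
  var haystack = b.toLowerCase();
  // Try the longest candidate windows first, so the first hit is a longest match.
  for (var len = needle.length; len >= minLength; len--) {
    for (var start = 0; start + len <= needle.length; start++) {
      var candidate = needle.slice(start, start + len);
      if (haystack.indexOf(candidate) !== -1) {
        return candidate;
      }
    }
  }
  return null;
}

console.log(longestCommonSubstringIgnoreCase("Audi is a car", "AUDIa8")); // "audi"
```

Like the original, this is brute force, so it is probably only reasonable for short strings.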
Take a look at the similar text function implementation here. It returns the number of matching chars in both strings.
For your example it would be:
similar_text("audi is a car", "audiA8") // -> 4
which means that the two strings share 4 matching characters, here the common substring "audi".
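In case the linked implementation is unavailable, here is a rough sketch of how such a function can work, assuming it follows the same idea as PHP's similar_text: find the longest common substring, then recurse on the pieces to its left and right and add up the lengths. The name similarText and the details below are my own, not taken from the linked code.

```javascript
// Sketch only: counts matching characters in the style of PHP's similar_text (assumed behaviour).
function similarText(first, second) {
  if (first.length === 0 || second.length === 0) return 0;

  // Find the longest run of characters common to both strings.
  var bestLen = 0, bestI = 0, bestJ = 0;
  for (var i = 0; i < first.length; i++) {
    for (var j = 0; j < second.length; j++) {
      var k = 0;
      while (i + k < first.length && j + k < second.length &&
             first[i + k] === second[j + k]) {
        k++;
      }
      if (k > bestLen) { bestLen = k; bestI = i; bestJ = j; }
    }
  }
  if (bestLen === 0) return 0;

  // Count the run, then repeat on what is left on each side of it.
  return bestLen +
    similarText(first.slice(0, bestI), second.slice(0, bestJ)) +
    similarText(first.slice(bestI + bestLen), second.slice(bestJ + bestLen));
}

console.log(similarText("audi is a car", "audiA8")); // 4
```

Note that the result is a total over all matching runs, so a value of 4 does not by itself guarantee a single 4-character substring.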
Don't know about any simpler method, but this should work:
if(a.indexOf(substring) != -1 && b.indexOf(substring) != -1) { ... }
where a and b are your strings.
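Wrapped up as a small function, that check might look like this. The name bothContain is just for illustration, and it assumes you already know which substring you are looking for.

```javascript
// Hypothetical wrapper: true only if both strings contain the given substring.
function bothContain(a, b, substring) {
  return a.indexOf(substring) !== -1 && b.indexOf(substring) !== -1;
}

console.log(bothContain("audi is a car", "audiA8", "audi")); // true
console.log(bothContain("audi is a car", "audiA8", "car"));  // false
```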
You can use the powerful algorithm of this library: https://github.com/kpdecker/jsdiff/blob/master/src/diff/base.js
like this
const wordDiff = new Diff();
wordDiff.diff('audi is a car', 'audiA8', {});
and receive the result
[
  {
    "count": 4,
    "added": false,
    "removed": false,
    "value": "audi"
  },
  {
    "count": 9,
    "added": false,
    "removed": true,
    "value": " is a car"
  },
  {
    "count": 2,
    "added": true,
    "removed": false,
    "value": "A8"
  }
]
Entries where both "added" and "removed" are false are the common substrings.
You can do much more with this amazing library.
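To pull just the common substrings out of a result shaped like the one above, something along these lines should do. It does not call the library at all: the changes array is copied from the output shown above, and commonParts is a name I made up.

```javascript
// Result copied from the output above; in real use it would come from the diff call.
var changes = [
  { count: 4, added: false, removed: false, value: "audi" },
  { count: 9, added: false, removed: true,  value: " is a car" },
  { count: 2, added: true,  removed: false, value: "A8" }
];

// Hypothetical helper: keep only the parts that were neither added nor removed.
function commonParts(changeList) {
  return changeList
    .filter(function (part) { return !part.added && !part.removed; })
    .map(function (part) { return part.value; });
}

console.log(commonParts(changes)); // [ 'audi' ]
```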
var a = "audi is a car";
var b = "audiA8";
var chunks = a.split(" ");
var commonsFound = 0;
for (var i = 0; i < chunks.length; i++) {
    if (b.indexOf(chunks[i]) != -1) commonsFound++;
}
alert(commonsFound + " common substrings found.");
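If you want to know which words matched rather than only how many, a variant along these lines might help. commonWords is a made-up name, and it assumes words are separated by single spaces, as in the snippet above. It also skips empty chunks, which would otherwise always count as a match.

```javascript
// Hypothetical variant: return the space-separated words of a that also occur somewhere in b.
function commonWords(a, b) {
  return a.split(" ").filter(function (word) {
    // Ignore empty chunks produced by repeated spaces; indexOf("") is always 0.
    return word.length > 0 && b.indexOf(word) !== -1;
  });
}

console.log(commonWords("audi is a car", "audiA8")); // [ 'audi', 'a' ]
```

Note that "a" counts as a match here because "audiA8" contains the letter a, which may or may not be what you want.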
if (string1 === string2) { /*identical*/ }
- What are you really trying to ask, how to test whether a particular substring is in two different strings, or whether there exists some substring that appears in two different strings, or...? Could you please show an example input and the desired output? – nnnnnn Commented Oct 22, 2012 at 7:14
- abc and cde, should they be considered "identical" because of c? – duri Commented Oct 22, 2012 at 7:16