How to get all documents in MongoDB within a Levenshtein distance of one.
I have a collection of football teams.
{
name: 'Real Madrir',
nicknames: ['Real', 'Madrid', 'Real Madrir' ... ]
}
A user searches for Real Madid, or Maddrid, or something similar.
I want to return all documents that contain a nickname with a Levenshtein distance of 0 or 1 from the given search string.
I think there are two ways: MongoDB full-text search or regex.
So can I write such a regex or query?
Thank you.
Asked Nov 26, 2016 at 11:55 by Gor
Comment: Read this for more clues: stackoverflow./questions/27977575/… (Mohammad Yusuf, Nov 26, 2016 at 12:25)
2 Answers
For full-text search, you must first create a text index on your nicknames field. Creating the index also indexes the documents that are already in the collection, so existing documents become searchable once the index has been built. Then, when you perform a search using MongoDB's $text and $search operators, MongoDB will return the documents whose nicknames field matches the search text. For regex matching, MongoDB has a $regex operator you can use.
Here are a couple of short examples:
Full Text Search
- Save this script as football.js. It will create a teams collection with a text index and two documents for us to search.
// create football database
var db = connect("localhost:27017/football");
/*
note:
You may also create indexes from your console
using the MongoDb shell. Actually each of these
statements may be run from the shell. I'm using
a script file for convenience.
*/
// create Text Index on the 'nicknames' field
// so full-text search works
db.teams.createIndex({"nicknames":"text"});
// insert two teams to search for
db.teams.insert({
name: 'Real Madrir',
nicknames: ['Real', 'Madrid', 'Real Madrir' ]
})
db.teams.insert({
name: 'Fake Madrir',
nicknames: ['Fake']
})
- Open your terminal and navigate to the directory where you saved football.js, then run this script against your local MongoDB instance by typing mongo football.js.
- Type mongo from your terminal to open the MongoDB shell and switch to the football database by typing use football.
- Once you're in the football database, search for one of your documents using db.teams.find({"$text":{"$search":"<search-text>"}})
> use football
// find Real Madrir
> db.teams.find({"$text":{"$search":"Real"}})
// find Fake Madrir
> db.teams.find({"$text":{"$search":"Fake"}})
Regex
If you want to search using a regex, you will not need to create an index. Just search using MongoDB's $regex operator:
//find Real Madrir
db.teams.find({"nicknames": {"$regex": /Real/}})
db.teams.find({"nicknames": {"$regex": /Real Madrir/}})
//find Fake Madrir
db.teams.find({"nicknames": {"$regex": /Fa/}})
db.teams.find({"nicknames": {"$regex": /ke/}})
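The regex queries above match substrings; they do not measure edit distance. As a rough sketch of how a regex could cover Levenshtein distance 0 or 1, the pattern can be generated from the search text by listing every single insertion, deletion and substitution. The helper names escapeRegex and levenshtein1Pattern are hypothetical and not part of MongoDB, and the pattern is still evaluated per nickname, so this likely suits small collections only.

```javascript
// Escape regex metacharacters so the search text is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build an anchored regex source that matches every
// string within Levenshtein distance 0 or 1 of `term`.
function levenshtein1Pattern(term) {
  const chars = Array.from(term);
  const esc = chars.map(escapeRegex);
  const alternatives = new Set([esc.join("")]); // distance 0: exact match
  for (let i = 0; i <= chars.length; i++) {
    const head = esc.slice(0, i).join("");
    // the candidate has one extra character at position i
    alternatives.add(head + "." + esc.slice(i).join(""));
    if (i < chars.length) {
      const tail = esc.slice(i + 1).join("");
      alternatives.add(head + tail);       // the candidate lacks character i
      alternatives.add(head + "." + tail); // the candidate differs at character i
    }
  }
  return "^(?:" + Array.from(alternatives).join("|") + ")$";
}

// Possible usage in the shell (case-insensitive), assuming the teams collection above:
// db.teams.find({ nicknames: { $regex: levenshtein1Pattern("Madid"), $options: "i" } })
```

The number of alternatives grows linearly with the length of the search text, so the pattern stays manageable for short team names.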
Mongoose
This is how each of these searches would work in NodeJS using mongoose:
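var mongoose = require("mongoose");
// Assumed minimal model for the teams collection; adjust it to your own schema
var Team = mongoose.model("Team", new mongoose.Schema({
  name: String,
  nicknames: [String]
}));
var searchText = "Madrir"; // or some value from request.body
// Escape regex metacharacters first if searchText comes from user input
var searchRegex = new RegExp(searchText);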
var fullTextSearchOptions = {
"$text":{
"$search": searchText
}
};
var regexSearchOptions = {
"nicknames": {
"$regex": searchRegex
}
};
// full-text search
// (promise form; Mongoose 7 and later no longer accept a callback in find)
Team.find(fullTextSearchOptions)
  .then(function(teams){
    // ...
  })
  .catch(function(err){
    // ...
  });
// regex search
Team.find(regexSearchOptions)
  .then(function(teams){
    // ...
  })
  .catch(function(err){
    // ...
  });
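Neither $text nor a plain $regex computes an edit distance, so one option is to fetch candidate teams with a looser query and then filter them in application code. The following is a minimal sketch of that idea, not something the searches above do by themselves; levenshtein and filterByDistance are hypothetical helper names, and the comparison is assumed to be case-insensitive.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping two rows in memory.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // delete a character from a
        curr[j - 1] + 1,   // insert a character into a
        prev[j - 1] + cost // substitute, or keep a matching character
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Keep only the teams that have at least one nickname within
// `maxDistance` edits of the search text.
function filterByDistance(teams, searchText, maxDistance) {
  const needle = searchText.toLowerCase();
  return teams.filter(team =>
    team.nicknames.some(
      nick => levenshtein(nick.toLowerCase(), needle) <= maxDistance
    )
  );
}
```

For example, filterByDistance(teams, "Madid", 1) would be called on the array returned by Team.find. The cost grows with the number of candidates, so the initial query should narrow them down as much as possible.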
This is coming late, but hopefully it will aid someone else searching for this.
Regex (an unanchored regex cannot use an index efficiently, so it will be very slow on large datasets) and the normal $text search (fast, indexed searching, but no partial matches) are not the only options. There is a third option that uses a bit more index memory, but it both supports partial matches and uses indexes (so it's fast).
You create your own "index" by generating an array of strings from a string field (call it name, for example) and storing the resulting array in an indexed array field (let's call it _nameSearch), like so:
const getSearchArray = (_str: string): string[] => {
  const str = _str.toLowerCase();
  const output: string[] = [];
  let acc = "";      // prefix of the current word
  let accTotal = ""; // prefix of the whole string
  str.split("").forEach(char => {
    // Always record the next prefix of the whole string
    accTotal += char;
    output.push(accTotal);
    if (char === " ") {
      // Reset the word accumulator when a space is encountered
      acc = "";
    } else {
      // Otherwise, record the next prefix of the current word
      acc += char;
      output.push(acc);
    }
  });
  // Remove duplicates while keeping insertion order
  return Array.from(new Set(output));
};
So if the name value is "option", _nameSearch would be ["o", "op", "opt", "opti", "optio", "option"].
You can then index _nameSearch, so your schema would look like this:
const schema = new Schema(
{
name: String,
_nameSearch: { type: [String], index: true },
...
}
);
Querying the name field is then as easy as db.collection.find({ _nameSearch: SEARCH_STRING }), where SEARCH_STRING is lowercased to match the stored values. You can find partial (prefix) matches and still make use of the index, so searching is very fast. The index for the name field will be somewhat larger, so it's a trade-off, but a viable option to consider.
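To make the prefix-array idea concrete, here is a small plain-JavaScript sketch of how the stored document and the query filter could be built. The helpers withSearchArray and prefixFilter are hypothetical names used only for illustration, and getSearchArray is rewritten without type annotations. Note that this matches prefixes only, so a misspelled search text would still miss.

```javascript
// Plain-JavaScript version of the prefix generator: collects the prefixes
// of the whole lowercased string and the prefixes of each word.
function getSearchArray(name) {
  const output = new Set();
  let word = "";  // prefix of the current word
  let whole = ""; // prefix of the whole string
  for (const char of name.toLowerCase()) {
    whole += char;
    output.add(whole);
    if (char === " ") {
      word = ""; // a new word starts after the space
    } else {
      word += char;
      output.add(word);
    }
  }
  return Array.from(output);
}

// Hypothetical helper: copy the team and attach the array to store in _nameSearch.
function withSearchArray(team) {
  return { ...team, _nameSearch: getSearchArray(team.name) };
}

// Hypothetical helper: build the query filter, lowercased to match the stored prefixes.
function prefixFilter(searchText) {
  return { _nameSearch: searchText.toLowerCase() };
}
```

A document produced by withSearchArray({ name: "Real Madrid" }) would be matched by prefixFilter("Mad") or prefixFilter("real m"), but not by a search for "adrid", because only prefixes are stored.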