It looks like I'm missing something obvious when trying to fuzzy match a multi-term query.
What I'd like to achieve is to get only the "Goleniow Helenow" result when providing the query "Goleniow Heleniow" (city + district name with a typo). Instead I get all the docs.
I think I've tried every combination of the minimum_should_match, operator and even fuzziness parameters, with no satisfying result.
Could anyone point out what I am missing?
Index setup
curl -X PUT "localhost:9200/test-index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "index": true
      }
    }
  }
}'
Docs to index
curl -X POST "localhost:9200/test-index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Goleniow Helenow"
}'
curl -X POST "localhost:9200/test-index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Goleniow"
}'
curl -X POST "localhost:9200/test-index/_doc?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Goleniow Jaworow"
}'
Query and result
curl -X POST "localhost:9200/test-index/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "name": {
        "minimum_should_match": "100%",
        "operator": "and",
        "fuzziness": "2",
        "query": "Goleniow Heleniow"
      }
    }
  }
}'
{
  "took" : 104,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.32180583,
    "hits" : [
      {
        "_index" : "test-index",
        "_id" : "Waqnj5UBnvH7uZURvQTX",
        "_score" : 0.32180583,
        "_source" : {
          "name" : "Goleniow Helenow"
        }
      },
      {
        "_index" : "test-index",
        "_id" : "Wqqoj5UBnvH7uZUR2QSO",
        "_score" : 0.2793999,
        "_source" : {
          "name" : "Goleniow"
        }
      },
      {
        "_index" : "test-index",
        "_id" : "W6qoj5UBnvH7uZUR8AT4",
        "_score" : 0.21600665,
        "_source" : {
          "name" : "Goleniow Jaworow"
        }
      }
    ]
  }
}
asked Mar 14 at 8:38
John Tamed
1 Answer
You can use a span_near query. Here is a similar discussion.
Example query:
GET test-index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "span_near": {
            "clauses": [
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "name": {
                        "value": "Goleniow",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              },
              {
                "span_multi": {
                  "match": {
                    "fuzzy": {
                      "name": {
                        "value": "Heleniow",
                        "fuzziness": 2
                      }
                    }
                  }
                }
              }
            ],
            "slop": 0,
            "in_order": true
          }
        }
      ]
    }
  }
}
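As for why the original match query returns every document: a likely explanation is that "Heleniow" is itself only two edits away from "Goleniow" (H→G, e→o), so with fuzziness 2 both query terms can be satisfied by the single token "goleniow", and "operator": "and" no longer filters anything. The Python sketch below checks this with a plain Levenshtein distance and a deliberately simplified model of the match query. The helper names (levenshtein, matching_docs) are made up for illustration, and the model ignores scoring, analyzers beyond lowercasing and whitespace splitting, and transpositions (which Elasticsearch counts as one edit by default but which play no role for these words).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# The three documents from the question.
DOCS = ["Goleniow Helenow", "Goleniow", "Goleniow Jaworow"]


def matching_docs(query: str, max_edits: int) -> list[str]:
    """Simplified model of a fuzzy match query with operator "and":
    every query term must be within max_edits of at least one token in the doc."""
    terms = query.lower().split()
    hits = []
    for doc in DOCS:
        tokens = doc.lower().split()
        if all(any(levenshtein(term, tok) <= max_edits for tok in tokens)
               for term in terms):
            hits.append(doc)
    return hits


if __name__ == "__main__":
    print(levenshtein("heleniow", "goleniow"))    # 2
    print(levenshtein("heleniow", "helenow"))     # 1
    print(matching_docs("Goleniow Heleniow", 2))  # all three docs
    print(matching_docs("Goleniow Heleniow", 1))  # ['Goleniow Helenow']
```

Under this model, lowering the allowed edits to 1 leaves only "Goleniow Helenow", which suggests that "fuzziness": 1 in the original match query might also be enough for this particular data set; I have not verified that against a live cluster. The span_near query above avoids the problem differently: it requires two adjacent matching positions, so a single "goleniow" token cannot satisfy both clauses.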