How can I create a mapping that will tokenize the string on whitespace and also change it to lowercase for indexing?
This is my current mapping, which tokenizes on whitespace, but I can't work out how to lowercase the tokens and also search (query) the same way...
{
    "mappings": {
        "my_type" : {
            "properties" : {
                "title" : { "type" : "string", "analyzer" : "whitespace", "tokenizer": "whitespace", "search_analyzer": "whitespace" }
            }
        }
    }
}
Please help...
2 Answers
I managed to write a custom analyzer, and this works:
"settings":{
"analysis": {
"analyzer": {
"lowercasespaceanalyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"my_type" : {
"properties" : {
"title" : { "type" : "string", "analyzer" : "lowercasespaceanalyzer", "tokenizer": "whitespace", "search_analyzer":"whitespace", "filter": [
"lowercase"
] }
}
}
}
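For reference, a quick way to check that the analyzer behaves as expected (the index name myindex and the file name mapping.json are just placeholders here) is to create the index with the body above and run some text through the _analyze API:
curl -XPUT 'localhost:9200/myindex' -d @mapping.json
curl -XGET 'localhost:9200/myindex/_analyze?analyzer=lowercasespaceanalyzer&pretty' -d 'Some DATA'
This should return the tokens "some" and "data", i.e. split on whitespace and lowercased.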
You have two options -
Simple Analyser
The simple analyser will probably meet your needs:
curl -XGET 'localhost:9200/myindex/_analyze?analyzer=simple&pretty' -d 'Some DATA'
{
  "tokens" : [ {
    "token" : "some",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "data",
    "start_offset" : 5,
    "end_offset" : 9,
    "type" : "word",
    "position" : 2
  } ]
}
To use the simple analyser in your mapping:
{
    "mappings": {
        "my_type" : {
            "properties" : {
                "title" : { "type" : "string", "analyzer" : "simple" }
            }
        }
    }
}
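To see the effect end to end, a rough sketch (the index name myindex and the document below are purely illustrative) is to index a mixed-case document and then query it with a lowercase term; because the same analyser is applied at index and search time, the match succeeds:
curl -XPUT 'localhost:9200/myindex/my_type/1?refresh=true' -d '{"title": "Some DATA"}'
curl -XGET 'localhost:9200/myindex/my_type/_search?pretty' -d '{
  "query": { "match": { "title": "data" } }
}'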
Custom Analyser
The second option is to define your own custom analyser, specifying how to tokenise and filter the data, and then refer to this new analyser in your mapping (see the sketch below).
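A minimal sketch of that approach, essentially what the accepted answer above does (the analyser name lowercase_whitespace is just illustrative):
{
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_whitespace": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": [ "lowercase" ]
                }
            }
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                "title": { "type": "string", "analyzer": "lowercase_whitespace" }
            }
        }
    }
}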