javascript - elasticsearch analyzer - lowercase and whitespace tokenizer

How can I create a mapping that will tokenize the string on whitespace and also change it to lowercase for indexing?

This is my current mapping, which tokenizes on whitespace, but I can't work out how to lowercase the text as well and have searches (queries) analyzed the same way.

{
  "mappings": {
    "my_type" : {
      "properties" : {
        "title" : { "type" : "string", "analyzer" : "whitespace", "tokenizer": "whitespace", "search_analyzer":"whitespace" }
      }
    }
  }
}

Please help...

Asked Dec 13, 2014 at 2:36 by user3658423

2 Answers

I managed to write a custom analyzer, and this works:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercasespaceanalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "string",
          "analyzer": "lowercasespaceanalyzer",
          "search_analyzer": "lowercasespaceanalyzer"
        }
      }
    }
  }
}
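
Whichever analyzer runs at search time has to lowercase as well, otherwise a mixed-case query will not match the lowercased tokens stored in the index. The following is only a rough Python emulation of that behaviour, not Elasticsearch itself: the function names are hypothetical, and `str.split` / `str.lower` are assumed to be close enough stand-ins for the whitespace tokenizer and the lowercase filter.

```python
# Rough emulation of two analyzers (assumption: plain Python string
# methods approximate the real tokenizer and token filter).

def whitespace_analyze(text):
    """Built-in `whitespace` analyzer: split on whitespace, keep case."""
    return text.split()

def lowercase_space_analyze(text):
    """Custom analyzer: whitespace tokenizer followed by a lowercase filter."""
    return [token.lower() for token in text.split()]

def matches(indexed_tokens, query_tokens):
    """Simplified match: a hit if any query token equals an indexed token."""
    return any(token in indexed_tokens for token in query_tokens)

# Tokens that would be stored in the index for the title "Some DATA".
indexed = lowercase_space_analyze("Some DATA")

# Query analyzed with plain `whitespace`: "DATA" stays uppercase, so no hit.
hit_with_whitespace = matches(indexed, whitespace_analyze("DATA"))

# Query analyzed with the same custom analyzer: "data" matches.
hit_with_custom = matches(indexed, lowercase_space_analyze("DATA"))

print(indexed, hit_with_whitespace, hit_with_custom)
# ['some', 'data'] False True
```

In this sketch the query only matches when it goes through the same lowercasing step as the indexed text.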

You have two options:

Simple Analyser

The simple analyser will probably meet your needs. It lowercases the text and breaks it at any non-letter character, which includes whitespace:

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=simple&pretty' -d 'Some DATA' 
{
  "tokens" : [ {
    "token" : "some",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "data",
    "start_offset" : 5,
    "end_offset" : 9,
    "type" : "word",
    "position" : 2
  } ]
}

To use the simple analyser in your mapping:

{
  "mappings": {
    "my_type" : {
      "properties" : {
        "title" : { "type" : "string", "analyzer" : "simple" }
      }
    }
  }
}

Custom Analyser

The second option is to define your own custom analyser, specifying how to tokenise and filter the data, and then refer to this new analyser in your mapping.
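
To get a feel for how the two options differ, here is a rough Python emulation of both. It is a sketch under assumptions rather than the real analysis chain: the function names and the sample title are hypothetical, `str.isalpha` is assumed to approximate the letter test used by the simple analyser, and `str.split` stands in for the whitespace tokeniser.

```python
from itertools import groupby

def simple_analyze(text):
    """Approximates the `simple` analyser: break at non-letters, lowercase."""
    return ["".join(group).lower()
            for is_letter, group in groupby(text, key=str.isalpha)
            if is_letter]

def lowercase_space_analyze(text):
    """Approximates a custom analyser: whitespace tokeniser + lowercase filter."""
    return [token.lower() for token in text.split()]

sample = "Wi-Fi Router v2"  # hypothetical title value

simple_tokens = simple_analyze(sample)
custom_tokens = lowercase_space_analyze(sample)

print(simple_tokens)  # ['wi', 'fi', 'router', 'v']
print(custom_tokens)  # ['wi-fi', 'router', 'v2']
```

For plain words such as "Some DATA" both approaches give the same tokens; they diverge once the text contains hyphens, digits or other punctuation, which is the case where a custom analyser is worth the extra configuration.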
