javascript - elasticsearch analyzer - lowercase and whitespace tokenizer - Stack Overflow

How can I create a mapping that will tokenize the string on whitespace and also change it to lowercase for indexing?

This is my current mapping. It tokenizes on whitespace, but I can't work out how to lowercase the tokens as well and apply the same analysis at search (query) time:

{
  "mappings": {
    "my_type" : {
      "properties" : {
        "title" : { "type" : "string", "analyzer" : "whitespace", "tokenizer": "whitespace", "search_analyzer":"whitespace" }
      }
    }
  }
}

Please help...

asked Dec 13, 2014 at 2:36 by user3658423

2 Answers

I managed to write a custom analyzer, and this works:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercasespaceanalyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "title": { "type": "string", "analyzer": "lowercasespaceanalyzer", "search_analyzer": "lowercasespaceanalyzer" }
      }
    }
  }
}
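
One point worth checking in a mapping like this is the search side: if the indexed tokens are lowercased but the query text is only split on whitespace, a mixed-case query term has no identical token to match. The sketch below is a rough local emulation in plain JavaScript, not Elasticsearch itself. The helper names (`whitespaceTokenize`, `lowercaseSpaceAnalyze`, `matches`) are hypothetical, and the exact-token matching is a simplifying assumption.

```javascript
// Local emulation only: these helpers mimic the analysis chain,
// they do not call Elasticsearch.

// Whitespace tokenizer: split on runs of whitespace, drop empty strings.
function whitespaceTokenize(text) {
  return text.split(/\s+/).filter((token) => token.length > 0);
}

// Emulates the custom analyzer: whitespace tokenizer + lowercase filter.
function lowercaseSpaceAnalyze(text) {
  return whitespaceTokenize(text).map((token) => token.toLowerCase());
}

// Assumption: a query matches only if every query token exists verbatim
// among the indexed tokens.
function matches(indexedTokens, queryTokens) {
  return queryTokens.every((token) => indexedTokens.includes(token));
}

const indexed = lowercaseSpaceAnalyze("Some DATA"); // ["some", "data"]

// Search side analyzed with plain whitespace: "Some" keeps its capital.
const mixedCaseMiss = matches(indexed, whitespaceTokenize("Some")); // false

// Search side analyzed with the same chain: "Some" becomes "some".
const mixedCaseHit = matches(indexed, lowercaseSpaceAnalyze("Some")); // true

console.log(indexed, mixedCaseMiss, mixedCaseHit);
```

Under these assumptions, using the lowercasing analyzer at both index and search time is what makes "Some", "some" and "SOME" all find the same document.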

You have two options:

Simple Analyser

The simple analyser splits text on any non-letter character and lowercases each token, so it will probably meet your needs:

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=simple&pretty' -d 'Some DATA' 
{
  "tokens" : [ {
    "token" : "some",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "data",
    "start_offset" : 5,
    "end_offset" : 9,
    "type" : "word",
    "position" : 2
  } ]
}

To use the simple analyser in your mapping:

{
  "mappings": {
    "my_type" : {
      "properties" : {
        "title" : { "type" : "string", "analyzer" : "simple" }
      }
    }
  }
}
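
Before settling on the simple analyser, it may help to see where it differs from a whitespace tokenizer plus lowercase filter: the simple analyser breaks on every non-letter, so digits and hyphens are dropped. The following is a small local approximation in JavaScript, not a call to Elasticsearch. The function names and the sample string are made up for illustration.

```javascript
// Local approximation only; Elasticsearch is not called.

// Approximates the simple analyser: split on any non-letter, then lowercase.
function simpleAnalyze(text) {
  return text
    .split(/[^\p{L}]+/u)
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase());
}

// Approximates a whitespace tokenizer followed by a lowercase filter.
function whitespaceLowercaseAnalyze(text) {
  return text
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase());
}

const sample = "Wi-Fi Router X200"; // hypothetical title value

console.log(simpleAnalyze(sample));              // [ 'wi', 'fi', 'router', 'x' ]
console.log(whitespaceLowercaseAnalyze(sample)); // [ 'wi-fi', 'router', 'x200' ]
```

If titles contain model numbers, hyphenated words or other non-letter characters that should stay searchable as written, the whitespace-based chain is likely the better fit.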

Custom Analyser

The second option is to define your own custom analyser, specifying how to tokenise and filter the data, and then refer to that analyser in your mapping.
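
As a rough sketch of that second option, the index-creation body can be assembled as a plain JavaScript object and serialised to JSON. `buildIndexBody` and its parameters are hypothetical helpers, nothing is sent to Elasticsearch here, and the mapping follows the syntax used in this thread (`"type": "string"` under a `my_type` mapping type), which newer Elasticsearch versions have changed.

```javascript
// Sketch: builds an index-creation body as a plain object.
// buildIndexBody and its parameter names are hypothetical, for illustration only.
function buildIndexBody(analyzerName, typeName, fieldName) {
  return {
    settings: {
      analysis: {
        analyzer: {
          // Custom analyser: whitespace tokenizer, then lowercase filter.
          [analyzerName]: {
            type: "custom",
            tokenizer: "whitespace",
            filter: ["lowercase"],
          },
        },
      },
    },
    mappings: {
      [typeName]: {
        properties: {
          // The field refers to the custom analyser by name.
          [fieldName]: { type: "string", analyzer: analyzerName },
        },
      },
    },
  };
}

const body = buildIndexBody("lowercasespaceanalyzer", "my_type", "title");
console.log(JSON.stringify(body, null, 2));
```

The printed JSON can then be used as the request body when creating the index.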
