
apache spark - Elasticsearch v7 - How to debug different behaviour between two indexes - Stack Overflow


I am on Elasticsearch 7.17.22. I have two indexes: "my_legacy_index" and "my_new_index".

I have a Spark Scala job that inserts the same DataFrame into both indexes.

Here are the libraries used:

"org.elasticsearch" % "elasticsearch-spark-30_2.12" % "7.16.3",
"org.elasticsearch.client" % "elasticsearch-rest-client" % "7.16.3",
"org.elasticsearch.client" % "elasticsearch-rest-high-level-client" % "7.16.3",

When I search with the "routing" parameter, the two indexes behave differently. On the first index, the "routing" parameter is taken into account. On the second, it seems to be ignored.

Here is an example:

# 0 results
# Does not return any results (this is the correct behavior).
# The correct routing is 50_15
GET my_legacy_index/_search?routing=50_0
{
  "query": {
    "ids": {
      "values": ["50-15-15-20250123152311-xxx"]
    }
  }
}

# 1 result
# Returns results when it shouldn't.
# The correct routing is 50_15
GET my_new_index/_search?routing=50_0
{
  "query": {
    "ids": {
      "values": ["50-15-15-20250123152311-xxx"]
    }
  }
}

Additional context:

  • The 2 indexes are created in the same way.
  • In both indexes I have exactly the same data.
  • It is the same job that inserts the documents into both indexes.
  • The same index template (pattern "my_*") applies to both indexes.
  • Both indexes have "_routing": { "required": true } in their mappings.
  • The same ingestion pipeline is used when inserting into both indexes.

Question: Do you have any idea how to debug this problem? I tried to compare the two indexes and they appear to be identical.
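One way to make the comparison less error-prone is to diff the index settings mechanically rather than by eye. A search-time "routing" value only selects which shard(s) are queried; it does not filter documents by their stored routing value. So if the two indexes differ in a setting such as "index.number_of_shards", the values 50_0 and 50_15 may resolve to the same shard in one index and to different shards in the other. The sketch below is in Python and assumes the JSON bodies of `GET my_legacy_index/_settings` and `GET my_new_index/_settings` have already been loaded into dictionaries. The helper names (`flatten`, `diff_settings`) and the sample values are hypothetical and are not taken from the real cluster.

```python
from typing import Any, Dict, Tuple

# Settings that always differ between two indexes and carry no signal.
IGNORED = {"index.uuid", "index.creation_date", "index.provided_name"}


def flatten(settings: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested settings into dotted keys, e.g. 'index.number_of_shards'."""
    flat: Dict[str, Any] = {}
    for key, value in settings.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_settings(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every setting that differs."""
    fa, fb = flatten(a), flatten(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(set(fa) | set(fb))
        if key not in IGNORED and fa.get(key) != fb.get(key)
    }


# Hypothetical sample data standing in for the "settings" object of each index.
legacy = {"index": {"number_of_shards": "5", "number_of_replicas": "1", "uuid": "aaa"}}
new = {"index": {"number_of_shards": "1", "number_of_replicas": "1", "uuid": "bbb"}}

print(diff_settings(legacy, new))
```

If a difference in shard count shows up, `GET my_new_index/_search_shards?routing=50_0` and the same request with `routing=50_15` would show whether both values resolve to the same shard, which would explain the document being returned.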
