elasticsearch - How to Aggregate Average of Related Items in Two Arrays - Stack Overflow

I'm wondering how to compute the average of list elements across the documents I have in Elasticsearch, and then build a bar-graph dashboard on top of it. Let me try to explain with a simplified version:

Say I have three documents in ES, each with two array fields: 'runners' (an array of strings) and 'runners_times' (an array of numbers). The two arrays are aligned by position, so the first element of 'runners' corresponds to the first element of 'runners_times'. In document 1, for example, person_a = 100 and person_b = 120. My three documents look like this:

  1. runners: [person_a, person_b], runners_times: [100, 120]
  2. runners: [person_a, person_c], runners_times: [90, 110]
  3. runners: [person_b, person_c], runners_times: [100, 130]

What I want is a bar graph that lists every unique runner across the three documents (here 'person_a', 'person_b', and 'person_c') together with that runner's average time. In my case, that would be:

person_a: 95
person_b: 110
person_c: 120
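Worked by hand, each runner's average is the sum of the times paired with that runner divided by the number of such pairs:

```latex
\bar{t}_{\text{person\_a}} = \frac{100 + 90}{2} = 95, \qquad
\bar{t}_{\text{person\_b}} = \frac{120 + 100}{2} = 110, \qquad
\bar{t}_{\text{person\_c}} = \frac{110 + 130}{2} = 120
```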

Any tip would be great. Thanks a lot :-)

I can get a list of all unique values in 'runners', but I'm not sure how to average each person's times, since the times are stored in a separate list.

Should I use dictionaries instead, e.g. {'person_a': 100, 'person_b': 120}? I tried that too, but each dictionary key gets stored as a separate, unfolded field.

Edited Apr 6 at 2:22 by G0l0s; asked Feb 18 at 11:43 by IgorStan.

1 Answer


You should reorganize your data. Each runner and their time must be stored together as one object in a nested field. Create the target index with the following mapping:

PUT /runners_reindexed
{
    "mappings": {
        "properties": {
            "runner_data": {
                "type": "nested",
                "properties": {
                    "runner": {
                        "type": "keyword"
                    },
                    "time": {
                        "type": "integer"
                    }
                }
            }
        }
    }
}
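The reason for "nested" can be modelled in a few lines of Python. According to the Elasticsearch nested field type documentation, an ordinary (non-nested) array of objects is flattened so that each inner field becomes one multi-valued field, and the pairing between runner and time is lost. The sketch below is a local illustration under that assumption. flatten_object_array, matches_flattened and matches_nested are hypothetical helpers, not Elasticsearch APIs.

```python
# Hypothetical local model (not Elasticsearch code): a flattened object array
# versus a nested field, using the three example documents.

docs = [
    [{"runner": "person_a", "time": 100}, {"runner": "person_b", "time": 120}],
    [{"runner": "person_a", "time": 90}, {"runner": "person_c", "time": 110}],
    [{"runner": "person_b", "time": 100}, {"runner": "person_c", "time": 130}],
]


def flatten_object_array(objs):
    """Mimic the 'object' type: every inner field becomes one multi-valued field."""
    flat = {}
    for obj in objs:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat


def matches_flattened(objs, runner, time):
    # After flattening, each condition is checked against an independent array.
    flat = flatten_object_array(objs)
    return runner in flat["runner"] and time in flat["time"]


def matches_nested(objs, runner, time):
    # With 'nested', both conditions must hold inside the same object.
    return any(o["runner"] == runner and o["time"] == time for o in objs)


# Did person_a ever run in 120? (In the data: no.)
print([matches_flattened(d, "person_a", 120) for d in docs])  # [True, False, False]
print([matches_nested(d, "person_a", 120) for d in docs])     # [False, False, False]
```

The false match on the first document is the same loss of association that prevents a per-runner average without the nested mapping.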

Index your documents into the source index runners:

POST /runners/_bulk
{"create":{}}
{"runners": ["person_a", "person_b"], "runners_times": [100, 120]}
{"create":{}}
{"runners": ["person_a", "person_c"], "runners_times": [90, 110]}
{"create":{}}
{"runners": ["person_b", "person_c"], "runners_times": [100, 130]}
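If you send the documents from a client rather than from the Kibana console, the _bulk body is newline-delimited JSON: an action line, then a source line for each document, and a trailing newline at the end. The following minimal sketch builds that body for the three documents. to_bulk_body is a hypothetical helper, and sending the request (URL, client library) is left out on purpose.

```python
import json

# The three example documents from the question (source index "runners").
docs = [
    {"runners": ["person_a", "person_b"], "runners_times": [100, 120]},
    {"runners": ["person_a", "person_c"], "runners_times": [90, 110]},
    {"runners": ["person_b", "person_c"], "runners_times": [100, 130]},
]


def to_bulk_body(documents):
    """Build an NDJSON _bulk body: one action line, then one source line per document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"create": {}}))  # no _id given, so one is generated
        lines.append(json.dumps(doc))             # json.dumps emits a single line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = to_bulk_body(docs)
print(body, end="")
```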

Then reindex the source index into the new index runners_reindexed, using a Painless script that zips the two arrays into a list of {runner, time} objects:

POST _reindex
{
    "source": {
        "index": "runners"
    },
    "dest": {
        "index": "runners_reindexed"
    },
    "script": {
        "source": """
                List runners = ctx['_source']['runners'];
                List runnerTimes = ctx['_source']['runners_times'];
                
                List runnersWithTimes = new LinkedList();
                for (int i = 0; i < runners.size(); i++) {
                    Map runnerData = new HashMap();
                    runnerData['runner'] = runners[i];
                    runnerData['time'] = runnerTimes[i];
                    runnersWithTimes.add(runnerData);
                }
                ctx._source[params['runner_with_time_field_name']] = runnersWithTimes;
        """,
        "params": {
            "runner_with_time_field_name": "runner_data"
        }
    }
}
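For readers who want to check the transformation outside Elasticsearch, here is the same zip step in Python. zip_runner_data is a hypothetical helper that mirrors the reindex script above. One difference: it rejects arrays of different lengths up front. The Painless loop would throw on a shorter 'runners_times' array and silently drop any extra times.

```python
def zip_runner_data(source, field_name="runner_data"):
    """Return a copy of `source` with the parallel arrays zipped into objects.

    Mirrors the reindex script; mismatched array lengths raise ValueError.
    """
    runners = source["runners"]
    times = source["runners_times"]
    if len(runners) != len(times):
        raise ValueError("runners and runners_times must have the same length")
    result = dict(source)  # shallow copy; the original fields are kept, as in the script
    result[field_name] = [{"runner": r, "time": t} for r, t in zip(runners, times)]
    return result


doc = {"runners": ["person_a", "person_b"], "runners_times": [100, 120]}
print(zip_runner_data(doc)["runner_data"])
# [{'runner': 'person_a', 'time': 100}, {'runner': 'person_b', 'time': 120}]
```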

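It's time to aggregate. The nested aggregation steps into runner_data, the terms aggregation buckets the nested objects by runner (returning at most "size" runners, 10 here), and the avg sub-aggregation computes each runner's mean time: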

GET /runners_reindexed/_search?filter_path=aggregations.inside_runner_data.by_runner.buckets
{
    "aggs": {
        "inside_runner_data": {
            "nested": {
                "path": "runner_data"
            },
            "aggs": {
                "by_runner": {
                    "terms": {
                        "field": "runner_data.runner",
                        "size": 10
                    },
                    "aggs": {
                        "mean": {
                            "avg": {
                                "field": "runner_data.time"
                            }
                        }
                    }
                }
            }
        }
    }
}
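As a rough mental model of what this query computes, the sketch below replays nested, then terms, then avg over the reindexed documents in plain Python. nested_terms_avg is hypothetical and only approximates Elasticsearch. It shows that doc_count in each bucket counts nested objects rather than top-level documents, and that "size" truncates the bucket list. The sort approximates the default terms order: doc_count descending, with ties broken by key ascending.

```python
from collections import defaultdict

# Documents as they look after the reindex step (only the nested field is shown).
reindexed = [
    {"runner_data": [{"runner": "person_a", "time": 100}, {"runner": "person_b", "time": 120}]},
    {"runner_data": [{"runner": "person_a", "time": 90}, {"runner": "person_c", "time": 110}]},
    {"runner_data": [{"runner": "person_b", "time": 100}, {"runner": "person_c", "time": 130}]},
]


def nested_terms_avg(docs, size=10):
    """Local approximation of nested -> terms -> avg (not an Elasticsearch API)."""
    times = defaultdict(list)
    for doc in docs:
        for obj in doc["runner_data"]:                # 'nested': visit each inner object
            times[obj["runner"]].append(obj["time"])  # 'terms': bucket by runner
    buckets = [
        {"key": k, "doc_count": len(v), "mean": {"value": sum(v) / len(v)}}  # 'avg'
        for k, v in times.items()
    ]
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets[:size]


for b in nested_terms_avg(reindexed):
    print(b["key"], b["doc_count"], b["mean"]["value"])
# person_a 2 95.0
# person_b 2 110.0
# person_c 2 120.0
```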

Response

{
    "aggregations" : {
        "inside_runner_data" : {
            "by_runner" : {
                "buckets" : [
                    {
                        "key" : "person_a",
                        "doc_count" : 2,
                        "mean" : {
                            "value" : 95.0
                        }
                    },
                    {
                        "key" : "person_b",
                        "doc_count" : 2,
                        "mean" : {
                            "value" : 110.0
                        }
                    },
                    {
                        "key" : "person_c",
                        "doc_count" : 2,
                        "mean" : {
                            "value" : 120.0
                        }
                    }
                ]
            }
        }
    }
}
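If the bar graph is drawn outside Kibana, the buckets map directly onto chart labels and values. The following sketch assumes the response was fetched with the filter_path shown above. Its body is the response above with whitespace compacted. bar_series is a hypothetical helper, and the chart library itself is left open.

```python
import json

# The filtered response from above, compacted.
response_text = """
{"aggregations": {"inside_runner_data": {"by_runner": {"buckets": [
  {"key": "person_a", "doc_count": 2, "mean": {"value": 95.0}},
  {"key": "person_b", "doc_count": 2, "mean": {"value": 110.0}},
  {"key": "person_c", "doc_count": 2, "mean": {"value": 120.0}}
]}}}}
"""


def bar_series(response):
    """Turn the terms/avg buckets into (labels, values) for a bar chart."""
    buckets = response["aggregations"]["inside_runner_data"]["by_runner"]["buckets"]
    labels = [b["key"] for b in buckets]
    values = [b["mean"]["value"] for b in buckets]
    return labels, values


labels, values = bar_series(json.loads(response_text))
print(labels)  # ['person_a', 'person_b', 'person_c']
print(values)  # [95.0, 110.0, 120.0]
```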