python - Remove Duplicate Dictionaries from a list based on certain numeric value - Stack Overflow

I have a list of dictionaries in which some name values are duplicated. For each name, I would like to keep only the dictionary with the highest value and remove the others. For example, given this input:

[{'name': 'x', 'value': 0.6479667110413355},
 {'name': 'x', 'value': 1.0},
 {'name': 'y', 'value': 0.9413355},
 {'name': 'y', 'value': 0.9}]

That is, compare all the values for x and keep only the dictionary with the highest value; the same goes for y. The output I would like is:

[{'name': 'x', 'value': 1.0},
 {'name': 'y', 'value': 0.9413355}]

I tried something along these lines:

from collections import defaultdict

input_list = [{'name': 'x', 'value': 0.6479667110413355}, {'name': 'x', 'value': 1.0},{'name': 'y', 'value': 0.9413355}, {'name': 'y', 'value': 0.9}]

results = defaultdict(list)

for element in input_list:
    for key, value in element.items():
        results[key].append(value)

for item_key, item_value in results.items():
    results[item_key] = max(item_value)

print(results)

Output

defaultdict(<class 'list'>, {'name': 'y', 'value': 1.0})

I am missing something here: I only get y and not x. Can someone help with this?

asked Mar 27 at 9:09 by anshuk_pal, edited Mar 28 at 11:51 by Robert
  • @trincot Yes, I was trying the code shown in the question, but it prints defaultdict(<class 'list'>, {'name': 'y', 'value': 1.0}). – anshuk_pal, commented Mar 27 at 9:23

2 Answers

A simple approach is to sort the list by value and then turn it into a dictionary keyed by name. Because later entries overwrite earlier ones with the same key, only the entry with the greatest value survives for each name:

from operator import itemgetter 

# sample input
lst = [{'name': 'x', 'value': 0.6479667110413355}, {'name': 'x', 'value': 1.0},{'name': 'y', 'value': 0.9413355}, {'name': 'y', 'value': 0.9}]

# create key/value pairs in sorted order so that dicts with greater values overwrite lesser ones
lst = list({ item['name'] : item for item in sorted(lst, key=itemgetter('value')) }.values())

print(lst)
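
As for why the attempt in the question collapses to a single dict: its inner loop groups by the dictionary keys 'name' and 'value' rather than by each name, so all names land in one list and all values in another. max then picks 'y' (the largest string) and 1.0 (the largest number) independently of each other. A minimal sketch of that intermediate state, reusing the question's input; the names grouped and collapsed are only illustrative:

```python
from collections import defaultdict

input_list = [{'name': 'x', 'value': 0.6479667110413355}, {'name': 'x', 'value': 1.0},
              {'name': 'y', 'value': 0.9413355}, {'name': 'y', 'value': 0.9}]

results = defaultdict(list)
for element in input_list:
    for key, value in element.items():
        # key is always 'name' or 'value', never 'x' or 'y'
        results[key].append(value)

grouped = dict(results)
print(grouped)
# {'name': ['x', 'x', 'y', 'y'], 'value': [0.6479667110413355, 1.0, 0.9413355, 0.9]}

# max() is applied to each list separately, so the name/value pairing is lost
collapsed = {key: max(values) for key, values in grouped.items()}
print(collapsed)
# {'name': 'y', 'value': 1.0}
```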

While @trincot's answer works, calling sorted gives it a time complexity of O(n log n), which can become an issue as the input grows.

A more efficient approach that keeps the time complexity linear is to track the largest value seen for each name in a dict, then convert that dict to the desired list of dicts at the end:

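from operator import itemgetter

# sample input from the question
input_list = [{'name': 'x', 'value': 0.6479667110413355}, {'name': 'x', 'value': 1.0},{'name': 'y', 'value': 0.9413355}, {'name': 'y', 'value': 0.9}]
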
name_to_value = {}
for name, value in map(itemgetter('name', 'value'), input_list):
    name_to_value[name] = max(value, name_to_value.get(name, value))
print([dict(zip(('name', 'value'), pair)) for pair in name_to_value.items()])

This outputs:

[{'name': 'x', 'value': 1.0}, {'name': 'y', 'value': 0.9413355}]

Demo: https://ideone/Gzwaw6
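
If the dictionaries carry more keys than name and value, a variation on the same linear idea is to store the whole dictionary per name instead of just the number. This is a sketch under that assumption and is not part of either answer; the best mapping and the extra unit key are hypothetical, used only for illustration:

```python
# hypothetical input: same shape as the question, plus an extra 'unit' key
input_list = [{'name': 'x', 'value': 0.6479667110413355, 'unit': 'a'},
              {'name': 'x', 'value': 1.0, 'unit': 'b'},
              {'name': 'y', 'value': 0.9413355, 'unit': 'c'},
              {'name': 'y', 'value': 0.9, 'unit': 'd'}]

best = {}  # name -> dict with the largest value seen so far
for item in input_list:
    current = best.get(item['name'])
    # replace only on a strictly greater value, so the first of any ties is kept
    if current is None or item['value'] > current['value']:
        best[item['name']] = item

result = list(best.values())
print(result)
# [{'name': 'x', 'value': 1.0, 'unit': 'b'}, {'name': 'y', 'value': 0.9413355, 'unit': 'c'}]
```

Each input dictionary is visited once, so this stays linear, and the output order follows the first appearance of each name.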
