python - Writing back to a panda groupby group - Stack Overflow

Good morning all

I am trying to process a lot of data. I need to group the data, look at each group, and then set a value based on the other entries in that group, but the value has to be set in a column of the full dataset. What I can't figure out is how to use the group to write back to the main dataframe.

So as an example, I created this data frame

import pandas as pd
data = [{
    "class": "cat",
    "name": "Fluffy",
    "age": 3,
    "child": "Whiskers",
    "parents_in_group": ""
}, {
    "class": "dog",
    "name": "Spot",
    "age": 5
}, {
    "class": "cat",
    "name": "Whiskers",
    "age": 7
}, {
    "class": "dog",
    "name": "Rover",
    "age": 2,
    "child": "Spot"
}]
df = pd.DataFrame(data)
df

So as an example, let's say that I want to set parents_in_group to the name of the parent in the group (the code below takes the first one it finds). That is easy to do:

group_by_class = df.groupby("class")

for name, group in group_by_class:
  # Rows that have a child are the parents in this group
  mask = group["child"].notna()
  print("This is the parent in group")
  print(group[mask])
  parent_name = group[mask]["name"].values[0]
  print(f"This is the parent name: {parent_name}")
  # This only changes the group, not df
  group["parents_in_group"] = parent_name
  print("And now we have the name set in group")
  print(group)

That updates the group, but not the actual data frame. So how would I go about writing this information back to the main data frame?

Using the name and search

This works, but seems a bit untidy

for name, group in group_by_class:
    mask = group["child"].notna()
    parent_name = group[mask]["name"].values[0]
    df.loc[df['class'] == name, 'parents_in_group'] = parent_name
    
df

Using group

How would I go about using the group itself to set the values, rather than searching for the name that the group was created by? Or are there better ways of going about it?

The real challenge I'm having is that I need to get the group, find some specific values in the group, then set some fields based on the data found.

Any help of course welcome.

asked Feb 3 at 10:40 by vrghost

2 Answers

A loop-less approach would be to compute a groupby.first after dropna, then to map the output:

df['parents_in_group'] = df['class'].map(
    df.dropna(subset='child').groupby('class')['name'].first()
)

# variant
df['parents_in_group'] = df['class'].map(
    df['name'].where(df['child'].notna()).groupby(df['class']).first()
)

Or, with drop_duplicates in place of groupby (for efficiency):

df['parents_in_group'] = df['class'].map(
    df.dropna(subset='child')
      .drop_duplicates(subset='class')
      .set_index('class')['name']
)

Output:

  class      name  age     child parents_in_group
0   cat    Fluffy    3  Whiskers           Fluffy
1   dog      Spot    5       NaN            Rover
2   cat  Whiskers    7       NaN           Fluffy
3   dog     Rover    2      Spot            Rover
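
To make the map-based approach above easier to follow, here is a minimal sketch that splits it into its intermediate steps. The data frame is a compact rebuild of the one in the question, and the names `parents` and `lookup` are only illustrative; they do not appear in the original answer.

```python
import pandas as pd

# Compact rebuild of the question's data frame
df = pd.DataFrame({
    "class": ["cat", "dog", "cat", "dog"],
    "name": ["Fluffy", "Spot", "Whiskers", "Rover"],
    "age": [3, 5, 7, 2],
    "child": ["Whiskers", None, None, "Spot"],
})

# Step 1: keep only the rows that have a child, i.e. the parents
parents = df.dropna(subset=["child"])

# Step 2: one parent name per class; the result is a Series indexed by class
lookup = parents.groupby("class")["name"].first()

# Step 3: look up each row's class in that Series; the result keeps df's index
df["parents_in_group"] = df["class"].map(lookup)

print(lookup)
print(df)
```

Because `map` returns a Series with the same index as `df["class"]`, the assignment lines up with the full data frame without any loop.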

Or, if efficiency doesn't really matter, with a groupby.apply:

out = (df.groupby('class', sort=False, group_keys=False)
         .apply(lambda x: x.assign(parents_in_group=x.loc[x['child'].notna(),
                                                          'name']
                                                     .iloc[:1].squeeze()),
               include_groups=False)
      )
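
Another option, not part of the original answer, is `groupby.transform`, which broadcasts a per-group value back to every row of the group. The following is a sketch under the assumption that only one parent name per class is wanted; `parent_names` is an illustrative name.

```python
import pandas as pd

# Compact rebuild of the question's data frame
df = pd.DataFrame({
    "class": ["cat", "dog", "cat", "dog"],
    "name": ["Fluffy", "Spot", "Whiskers", "Rover"],
    "age": [3, 5, 7, 2],
    "child": ["Whiskers", None, None, "Spot"],
})

# Blank out the names of rows without a child, so only parents' names remain
parent_names = df["name"].where(df["child"].notna())

# "first" picks the first non-missing name per class;
# transform returns one value per original row, aligned on df's index
df["parents_in_group"] = parent_names.groupby(df["class"]).transform("first")

print(df)
```

Since the result is already aligned with the original index, it can be assigned directly as a new column.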

I figured it out while trying to write this up, but in case anyone has the same issue: the key is group.index, used with .loc rather than .iloc.

for name, group in group_by_class:
    mask = group["child"].notna()
    parent_name = group[mask]["name"].values[0]
    print(group.index)
    df.loc[group.index, 'parents_in_group'] = parent_name

df
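
The reason this works is that each group keeps the row labels of the original data frame, so `group.index` can be passed straight to `df.loc`. Below is a minimal sketch of the same loop with one addition that is an assumption on my part: a guard for groups that contain no parent, where `.values[0]` would otherwise raise an IndexError. The extra "fish" row is made up to exercise that case, and `index_by_class` exists only to show the labels.

```python
import pandas as pd

# The question's data plus a hypothetical group ("fish") without any parent
df = pd.DataFrame({
    "class": ["cat", "dog", "cat", "dog", "fish"],
    "name": ["Fluffy", "Spot", "Whiskers", "Rover", "Bubbles"],
    "child": ["Whiskers", None, None, "Spot", None],
})
df["parents_in_group"] = None

index_by_class = {}
for name, group in df.groupby("class"):
    # The group carries the original row labels of df
    index_by_class[name] = list(group.index)

    parents = group.loc[group["child"].notna(), "name"]
    if parents.empty:
        # No parent in this group: leave the column untouched
        continue

    # Label-based write back to the main data frame
    df.loc[group.index, "parents_in_group"] = parents.iloc[0]

print(index_by_class)
print(df)
```

`.loc` is needed here because `group.index` holds labels; `.iloc` expects integer positions and would only give the same result when the labels happen to match the positions.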