pandas - Trying to create a new dataframe based on group by calculation on an existing dataframe in Python - Stack Overflow

I have an 18,000 record dataset in the below format:

Date Tm Site Opp Player Dist Made Blocked GameID Season
2024-01-07 ARI H SEA Matt Prater 51 N N SEA @ ARI 2023
2024-01-07 DAL A WAS Brandon Aubrey 50 Y N DAL @ WAS 2023
2024-01-07 TAM A CAR Chase McLaughlin 57 Y N TAM @ CAR 2023
2024-01-07 CAR H TAM Matthew Wright 52 N N TAM @ CAR 2023
2024-01-07 CHI A GNB Cairo Santos 50 Y N CHI @ GNB 2023

There is data for 50 seasons. My goal for this part of my project is to calculate the number of attempts (each line is one attempt) per game (unique GameID) by season. My thought was the best route is to create a dataframe that has columns for season, attempts, games, and average per game.

I've run a calculation for attempts by using:

df.groupby(['Season']).size()

And unique games by using:

df.groupby('Season')['GameID'].nunique()

Each of these brings back a table by year, so I was thinking that I could create a dictionary with the three fields to build a new dataframe.

data = {"Year":df.groupby(['Season']), "FG":df.groupby(['Season']).size(), "Games":df.groupby('Season')['GameID'].nunique()}
dfgrp = pd.DataFrame(data)

But I get a very long error when I try to view dfgrp, where it stops iteration but doesn't identify what the issue is.

I've tried looking through multiple searches but there doesn't seem to be a matching question that addresses this issue. Am I going about this the wrong way?

asked Nov 15, 2024 at 21:39 by Abartel
  • Not sure exactly what you are trying to do. Something like this? out = df.groupby(['Season'], as_index=False).agg(FG=('Season', 'size'), Games=('GameID', 'nunique')) If you add what you expect the output to be, it would be easier to help. – iBeMeltin, Nov 15, 2024 at 21:55
  • I am also not sure what you want to do. Why don't you use groupby(by=['Season', 'GameID']).nunique()? From your post I assume that you need information about each GameID in the season, which is lost when using .agg(). – yellow_dot, Nov 16, 2024 at 15:33

1 Answer

You could skip a few steps by using named aggregation with df.groupby(...).agg():

df.groupby('Season').agg(size=('Season', 'size'),
                         nunique=('GameID', 'nunique'))

        size    nunique
Season      
  2023     5          4
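
As for the error in the question: the "Year" entry in the dictionary is the GroupBy object itself (df.groupby(['Season'])), not a column of values, and pd.DataFrame cannot build a column from it. The two Series already share Season as their index, so the "Year" entry can simply be dropped. Below is a minimal sketch of both routes, with the average per game added. The five-row df is rebuilt from the sample in the question, and the column names FG, Games and AvgPerGame are illustrative choices, not something the question fixes.

```python
import pandas as pd

# Sample rows copied from the question (only the columns used here).
df = pd.DataFrame({
    "Season": [2023, 2023, 2023, 2023, 2023],
    "GameID": ["SEA @ ARI", "DAL @ WAS", "TAM @ CAR", "TAM @ CAR", "CHI @ GNB"],
    "Dist": [51, 50, 57, 52, 50],
})

# Route 1: named aggregation, one row per season.
summary = (
    df.groupby("Season", as_index=False)
      .agg(FG=("GameID", "size"), Games=("GameID", "nunique"))
)
summary["AvgPerGame"] = summary["FG"] / summary["Games"]

# Route 2: the dictionary approach from the question, without the "Year" entry.
# Both Series are indexed by Season, so they align on it automatically;
# reset_index() then turns that index into an ordinary column.
by_season = df.groupby("Season")
dfgrp = pd.DataFrame({
    "FG": by_season.size(),
    "Games": by_season["GameID"].nunique(),
}).reset_index()
dfgrp["AvgPerGame"] = dfgrp["FG"] / dfgrp["Games"]

print(summary)
print(dfgrp)
```

Both frames should hold the same numbers; the first route does it in a single pass and avoids building the intermediate Series by hand.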
