pandas - Trying to create a new dataframe based on group by calculation on an existing dataframe in Python

I have an 18,000-record dataset in the format below:

Date	Tm	Site	Opp	Player	Dist	Made	Blocked	GameID	Season
2024-01-07	ARI	H	SEA	Matt Prater	51	N	N	SEA @ ARI	2023
2024-01-07	DAL	A	WAS	Brandon Aubrey	50	Y	N	DAL @ WAS	2023
2024-01-07	TAM	A	CAR	Chase McLaughlin	57	Y	N	TAM @ CAR	2023
2024-01-07	CAR	H	TAM	Matthew Wright	52	N	N	TAM @ CAR	2023
2024-01-07	CHI	A	GNB	Cairo Santos	50	Y	N	CHI @ GNB	2023

There is data for 50 seasons. My goal for this part of my project is to calculate the number of attempts (each line is one attempt) per game (unique GameID) by season. My thought was the best route is to create a dataframe that has columns for season, attempts, games, and average per game.

I've run a calculation for attempts by using:

df.groupby(['Season']).size()

And unique games by using:

df.groupby('Season')['GameID'].nunique()

Each of these brings back a table by year, so I was thinking that I could create a dictionary with the three fields to build a new dataframe.

data = {"Year":df.groupby(['Season']), "FG":df.groupby(['Season']).size(), "Games":df.groupby('Season')['GameID'].nunique()}
dfgrp = pd.DataFrame(data)

But I get a very long error when I try to view dfgrp, where it stops iteration but doesn't identify what the issue is.

I've tried looking through multiple searches but there doesn't seem to be a matching question that addresses this issue. Am I going about this the wrong way?

asked Nov 15, 2024 at 21:39 by Abartel

Not sure exactly what you are trying to do. Something like this? out = df.groupby(['Season'], as_index=False).agg(FG=('Season', 'size'), Games=('GameID', 'nunique')) If you add what you expect the output to be, it would be easier to help. – iBeMeltin Commented Nov 15, 2024 at 21:55
I am also not sure what you want to do. Why not use groupby(by=['Season', 'GameID']).nunique()? From your post I assume that you need information about each GameID in the season, which is lost when using .agg(). – yellow_dot Commented Nov 16, 2024 at 15:33
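
One possible reading of the second comment, as a rough sketch: group by both Season and GameID to keep a per-game attempt count, then average those counts within each season. The use of size() instead of nunique(), and the names per_game and avg_per_game, are assumptions made for illustration; the data is the five sample rows from the question.

```python
import pandas as pd

# The five sample rows from the question (only the columns used here).
df = pd.DataFrame({
    "Season": [2023, 2023, 2023, 2023, 2023],
    "GameID": ["SEA @ ARI", "DAL @ WAS", "TAM @ CAR", "TAM @ CAR", "CHI @ GNB"],
})

# Attempts in each individual game: one entry per (Season, GameID) pair.
per_game = df.groupby(["Season", "GameID"]).size().rename("Attempts")

# Average attempts per game within each season.
avg_per_game = per_game.groupby(level="Season").mean()

print(per_game)
print(avg_per_game)
```

This keeps the per-game detail available in per_game, at the cost of an extra grouping step.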

1 Answer

You could skip a few steps by using named aggregation with DataFrame.groupby().agg():

df.groupby('Season').agg(size=('Season', 'size'),
                         nunique=('GameID', 'nunique'))

        size  nunique
Season
2023       5        4
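
To also get the average per game that the question asks for, the same aggregation can be extended with one derived column. This is a minimal sketch, assuming the column names from the sample rows; the output names Attempts, Games, and AvgPerGame are illustrative choices, not from the original post.

```python
import pandas as pd

# The five sample rows from the question (only the columns used here).
df = pd.DataFrame({
    "Season": [2023, 2023, 2023, 2023, 2023],
    "GameID": ["SEA @ ARI", "DAL @ WAS", "TAM @ CAR", "TAM @ CAR", "CHI @ GNB"],
})

summary = (
    df.groupby("Season", as_index=False)
      .agg(Attempts=("GameID", "size"),    # every row is one attempt
           Games=("GameID", "nunique"))    # distinct games in the season
)

# Derived column: attempts divided by distinct games.
summary["AvgPerGame"] = summary["Attempts"] / summary["Games"]
print(summary)
```

The dictionary approach in the question most likely fails because the "Year" entry is the GroupBy object itself rather than a column of values. Dropping that entry and letting Season come from the index of the two grouped results should also work.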
