Dataframe
Here's a dataframe which has U.S. Treasury General Account deposits from taxes (month to date).
>>> df
record_date transaction_mtd_amt
0 2005-10-03 18777
1 2005-10-04 21586
2 2005-10-05 29910
3 2005-10-06 32291
4 2005-10-07 37696
... ... ...
4892 2025-03-26 373897
4893 2025-03-27 381036
4894 2025-03-28 395097
4895 2025-03-31 429273
4896 2025-04-01 28706
[4897 rows x 2 columns]
Chart
Here's a simple Python program that puts up the chart in streamlit.
import pandas as pd
import streamlit as st
import plotly.express as px
url = '.csv'
@st.cache_data
def load_data():
return pd.read_csv(url)
df = load_data()
fig = px.line(df, x='record_date', y='transaction_mtd_amt')
st.plotly_chart(fig)
Year comparison
I'd like to compare years and show them side-by-side on the chart.
Here's one approach:
import pandas as pd
import streamlit as st
import plotly.express as px
url = '.csv'
@st.cache_data
def load_data():
return pd.read_csv(url)
df = load_data()
df['record_date'] = pd.to_datetime(df['record_date'])
tbl = df.copy()
# ----------------------------------------------------------------------
tbl['year'] = tbl['record_date'].dt.year
tbl['record_date_'] = tbl['record_date'].apply(lambda x: x.replace(year=2000))
# ----------------------------------------------------------------------
fig = px.line(tbl, x='record_date_', y='transaction_mtd_amt', color='year')
st.plotly_chart(fig)
Here's what the dataframe looks like:
>>> tbl
record_date transaction_mtd_amt year record_date_
0 2005-10-03 18777 2005 2000-10-03
1 2005-10-04 21586 2005 2000-10-04
2 2005-10-05 29910 2005 2000-10-05
3 2005-10-06 32291 2005 2000-10-06
4 2005-10-07 37696 2005 2000-10-07
... ... ... ... ...
4892 2025-03-26 373897 2025 2000-03-26
4893 2025-03-27 381036 2025 2000-03-27
4894 2025-03-28 395097 2025 2000-03-28
4895 2025-03-31 429273 2025 2000-03-31
4896 2025-04-01 28706 2025 2000-04-01
[4897 rows x 4 columns]
Here's what the chart looks like:
Isolated to just 2024 and 2025:
Approach
The two key lines in implementing this approach are these:
tbl['year'] = tbl['record_date'].dt.year
tbl['record_date_'] = tbl['record_date'].apply(lambda x: x.replace(year=2000))
Questions
While this seems to work, I'm wondering if this is the idiomatic and recommended way to go about this in pandas.
Note that in the year comparison chart, I arbitrarily picked the year 2000 as the base year. This is a little bit awkward as this is just an arbitrary date. Is there a way to implement this without showing the year in the chart?