最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

python - Year comparison - Stack Overflow

programmeradmin1浏览0评论

Dataframe

Here's a dataframe which has U.S. Treasury General Account deposits from taxes (month to date).

>>> df
     record_date  transaction_mtd_amt
0     2005-10-03                18777
1     2005-10-04                21586
2     2005-10-05                29910
3     2005-10-06                32291
4     2005-10-07                37696
...          ...                  ...
4892  2025-03-26               373897
4893  2025-03-27               381036
4894  2025-03-28               395097
4895  2025-03-31               429273
4896  2025-04-01                28706

[4897 rows x 2 columns]

Chart

Here's a simple Python program that puts up the chart in streamlit.

import pandas as pd
import streamlit as st
import plotly.express as px

url = '.csv'

@st.cache_data
def load_data():
    return pd.read_csv(url)

df = load_data()

fig = px.line(df, x='record_date', y='transaction_mtd_amt')

st.plotly_chart(fig)

Year comparison

I'd like to compare years and show them side-by-side on the chart.

Here's one approach:

import pandas as pd
import streamlit as st
import plotly.express as px

url = '.csv'

@st.cache_data
def load_data():
    return pd.read_csv(url)

df = load_data()

df['record_date'] = pd.to_datetime(df['record_date'])

tbl = df.copy()
# ----------------------------------------------------------------------
tbl['year'] = tbl['record_date'].dt.year

tbl['record_date_'] = tbl['record_date'].apply(lambda x: x.replace(year=2000))
# ----------------------------------------------------------------------
fig = px.line(tbl, x='record_date_', y='transaction_mtd_amt', color='year')

st.plotly_chart(fig)

Here's what the dataframe looks like:

>>> tbl
     record_date  transaction_mtd_amt  year record_date_
0     2005-10-03                18777  2005   2000-10-03
1     2005-10-04                21586  2005   2000-10-04
2     2005-10-05                29910  2005   2000-10-05
3     2005-10-06                32291  2005   2000-10-06
4     2005-10-07                37696  2005   2000-10-07
...          ...                  ...   ...          ...
4892  2025-03-26               373897  2025   2000-03-26
4893  2025-03-27               381036  2025   2000-03-27
4894  2025-03-28               395097  2025   2000-03-28
4895  2025-03-31               429273  2025   2000-03-31
4896  2025-04-01                28706  2025   2000-04-01

[4897 rows x 4 columns]

Here's what the chart looks like:

Isolated to just 2024 and 2025:

Approach

The two key lines in implementing this approach are these:

tbl['year'] = tbl['record_date'].dt.year

tbl['record_date_'] = tbl['record_date'].apply(lambda x: x.replace(year=2000))

Questions

While this seems to work, I'm wondering if this is the idiomatic and recommended way to go about this in pandas.

Note that in the year comparison chart, I arbitrarily picked the year 2000 as the base year. This is a little bit awkward as this is just an arbitrary date. Is there a way to implement this without showing the year in the chart?

发布评论

评论列表(0)

  1. 暂无评论