A disclaimer at the start: I am not really a programmer but do have some ability to hack some things together and get a reasonable result. But my knowledge is very much a case of— get a solution to the problem at hand, find the tree but not really take a look at the forest.
I'm trying to put together a database for playing with ML and AI models in python. The data has heterogeneous frequencies:
- Stock prices and interest rates with daily data
- Interest rates and some other macro economic data at monthly frequencies
- Things like GDP and GNI and some other production related data at quarterly frequency
Given that some of the work I need to do is related to trying to forecast estimates for things like quarterly earnings or other financial data elements from balance sheets or income statements, and macro data like GNP, that means classifying daily, weekly or monthly data into a quarterly bucket is needed.
I was starting to use the Pandas Periodindex function and that did what I wanted. But if I want to do something like filter a dataframe for only Q1 data for all the years turns out it is not as simple as "where 'Q1' in df.index
, do something". Looks like I can do what I want by jumping though some additional steps, but I'm thinking of simply making the index a string value rather than a Period value (just using the .astype(str)
when using the Periodindex()
call).
Would I be giving up some functionality I'm not seeing at the moment but might really like having access to by changing the index?
Below is a snippet of what I'm talking about:
...
qBalSh.index = pd.PeriodIndex(qBalSh.fiscalDateEnding, freq='Q').astype(str)
qBalSh.index.name = 'Quarter'
qIncSt.index = pd.PeriodIndex(qIncSt.fiscalDateEnding, freq='Q')
qIncSt.index.name = 'Quarter'
...
A disclaimer at the start: I am not really a programmer but do have some ability to hack some things together and get a reasonable result. But my knowledge is very much a case of— get a solution to the problem at hand, find the tree but not really take a look at the forest.
I'm trying to put together a database for playing with ML and AI models in python. The data has heterogeneous frequencies:
- Stock prices and interest rates with daily data
- Interest rates and some other macro economic data at monthly frequencies
- Things like GDP and GNI and some other production related data at quarterly frequency
Given that some of the work I need to do is related to trying to forecast estimates for things like quarterly earnings or other financial data elements from balance sheets or income statements, and macro data like GNP, that means classifying daily, weekly or monthly data into a quarterly bucket is needed.
I was starting to use the Pandas Periodindex function and that did what I wanted. But if I want to do something like filter a dataframe for only Q1 data for all the years turns out it is not as simple as "where 'Q1' in df.index
, do something". Looks like I can do what I want by jumping though some additional steps, but I'm thinking of simply making the index a string value rather than a Period value (just using the .astype(str)
when using the Periodindex()
call).
Would I be giving up some functionality I'm not seeing at the moment but might really like having access to by changing the index?
Below is a snippet of what I'm talking about:
...
qBalSh.index = pd.PeriodIndex(qBalSh.fiscalDateEnding, freq='Q').astype(str)
qBalSh.index.name = 'Quarter'
qIncSt.index = pd.PeriodIndex(qIncSt.fiscalDateEnding, freq='Q')
qIncSt.index.name = 'Quarter'
...
Share
Improve this question
edited yesterday
wjandrea
33k9 gold badges69 silver badges97 bronze badges
asked yesterday
J HJ H
113 bronze badges
1
- Related to the base problem: Is there a function to determine which quarter of the year a date is in? chishaku's answer covers Pandas. – wjandrea Commented yesterday
1 Answer
Reset to default 0PeriodIndex
has time-specific features. See the user guide: Time series / date functionality
In any case, you don't need to set the index as string, since:
A) PeriodIndex
provides a .quarter
attribute
qIncSt[qIncSt.index.quarter == 1]
B) Even if it didn't, you can always do .astype(str)
on-demand and use that for filtering.
qIncSt[qIncSt.index.astype(str).str.contains('Q1')]
Example
months = ["2023-01", "2023-02", "2023-03", "2023-04", "2023-05"]
qIncSt = pd.DataFrame({
'fiscalDateEnding': pd.to_datetime(months),
'foobar': range(len(months))})
qIncSt.index = pd.PeriodIndex(
qIncSt['fiscalDateEnding'],
freq='Q',
name='Quarter')
>>> qIncSt
fiscalDateEnding foobar
Quarter
2023Q1 2023-01-01 0
2023Q1 2023-02-01 1
2023Q1 2023-03-01 2
2023Q2 2023-04-01 3
2023Q2 2023-05-01 4
>>> qIncSt[qIncSt.index.quarter == 1]
fiscalDateEnding foobar
Quarter
2023Q1 2023-01-01 0
2023Q1 2023-02-01 1
2023Q1 2023-03-01 2
>>> qIncSt[qIncSt.index.astype(str).str.contains('Q1')]
fiscalDateEnding foobar
Quarter
2023Q1 2023-01-01 0
2023Q1 2023-02-01 1
2023Q1 2023-03-01 2