最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

python - If I change a Pandas Periodindex into a String valued index, what am I giving up? - Stack Overflow

programmeradmin3浏览0评论

A disclaimer at the start: I am not really a programmer but do have some ability to hack some things together and get a reasonable result. But my knowledge is very much a case of— get a solution to the problem at hand, find the tree but not really take a look at the forest.

I'm trying to put together a database for playing with ML and AI models in python. The data has heterogeneous frequencies:

  • Stock prices and interest rates with daily data
  • Interest rates and some other macro economic data at monthly frequencies
  • Things like GDP and GNI and some other production related data at quarterly frequency

Given that some of the work I need to do is related to trying to forecast estimates for things like quarterly earnings or other financial data elements from balance sheets or income statements, and macro data like GNP, that means classifying daily, weekly or monthly data into a quarterly bucket is needed.

I was starting to use the Pandas Periodindex function and that did what I wanted. But if I want to do something like filter a dataframe for only Q1 data for all the years turns out it is not as simple as "where 'Q1' in df.index, do something". Looks like I can do what I want by jumping though some additional steps, but I'm thinking of simply making the index a string value rather than a Period value (just using the .astype(str) when using the Periodindex() call).

Would I be giving up some functionality I'm not seeing at the moment but might really like having access to by changing the index?

Below is a snippet of what I'm talking about:

...
qBalSh.index = pd.PeriodIndex(qBalSh.fiscalDateEnding, freq='Q').astype(str)
qBalSh.index.name = 'Quarter'
qIncSt.index = pd.PeriodIndex(qIncSt.fiscalDateEnding, freq='Q')
qIncSt.index.name = 'Quarter'
...

A disclaimer at the start: I am not really a programmer but do have some ability to hack some things together and get a reasonable result. But my knowledge is very much a case of— get a solution to the problem at hand, find the tree but not really take a look at the forest.

I'm trying to put together a database for playing with ML and AI models in python. The data has heterogeneous frequencies:

  • Stock prices and interest rates with daily data
  • Interest rates and some other macro economic data at monthly frequencies
  • Things like GDP and GNI and some other production related data at quarterly frequency

Given that some of the work I need to do is related to trying to forecast estimates for things like quarterly earnings or other financial data elements from balance sheets or income statements, and macro data like GNP, that means classifying daily, weekly or monthly data into a quarterly bucket is needed.

I was starting to use the Pandas Periodindex function and that did what I wanted. But if I want to do something like filter a dataframe for only Q1 data for all the years turns out it is not as simple as "where 'Q1' in df.index, do something". Looks like I can do what I want by jumping though some additional steps, but I'm thinking of simply making the index a string value rather than a Period value (just using the .astype(str) when using the Periodindex() call).

Would I be giving up some functionality I'm not seeing at the moment but might really like having access to by changing the index?

Below is a snippet of what I'm talking about:

...
qBalSh.index = pd.PeriodIndex(qBalSh.fiscalDateEnding, freq='Q').astype(str)
qBalSh.index.name = 'Quarter'
qIncSt.index = pd.PeriodIndex(qIncSt.fiscalDateEnding, freq='Q')
qIncSt.index.name = 'Quarter'
...
Share Improve this question edited yesterday wjandrea 33k9 gold badges69 silver badges97 bronze badges asked yesterday J HJ H 113 bronze badges 1
  • Related to the base problem: Is there a function to determine which quarter of the year a date is in? chishaku's answer covers Pandas. – wjandrea Commented yesterday
Add a comment  | 

1 Answer 1

Reset to default 0

PeriodIndex has time-specific features. See the user guide: Time series / date functionality

In any case, you don't need to set the index as string, since:

A) PeriodIndex provides a .quarter attribute

qIncSt[qIncSt.index.quarter == 1]

B) Even if it didn't, you can always do .astype(str) on-demand and use that for filtering.

qIncSt[qIncSt.index.astype(str).str.contains('Q1')]

Example

months = ["2023-01", "2023-02", "2023-03", "2023-04", "2023-05"]
qIncSt = pd.DataFrame({
    'fiscalDateEnding': pd.to_datetime(months),
    'foobar': range(len(months))})
qIncSt.index = pd.PeriodIndex(
    qIncSt['fiscalDateEnding'],
    freq='Q',
    name='Quarter')
>>> qIncSt
        fiscalDateEnding  foobar
Quarter                         
2023Q1        2023-01-01       0
2023Q1        2023-02-01       1
2023Q1        2023-03-01       2
2023Q2        2023-04-01       3
2023Q2        2023-05-01       4

>>> qIncSt[qIncSt.index.quarter == 1]
        fiscalDateEnding  foobar
Quarter                         
2023Q1        2023-01-01       0
2023Q1        2023-02-01       1
2023Q1        2023-03-01       2

>>> qIncSt[qIncSt.index.astype(str).str.contains('Q1')]
        fiscalDateEnding  foobar
Quarter                         
2023Q1        2023-01-01       0
2023Q1        2023-02-01       1
2023Q1        2023-03-01       2

与本文相关的文章

发布评论

评论列表(0)

  1. 暂无评论