python - Pandas datetime index empty dataframe - Stack Overflow

I have the following code:

import pandas

data = pandas.read_csv('data.csv')
data['when'] = pandas.to_datetime(data['when'])
data.set_index('when', inplace=True)
print(data)
print(data.index.dtype)

which prints:

                 price
when                  
2025-01-04  98259.4300
2025-01-03  98126.6400
2025-01-02  96949.1800
2025-01-01  94610.1400
2024-12-31  93647.0100
...                ...
2010-07-21      0.0792
2010-07-20      0.0747
2010-07-19      0.0808
2010-07-18      0.0858
2010-07-17      0.0500

[5286 rows x 1 columns]
datetime64[ns]

Then, I am trying to select a range like this:

from datetime import datetime

start_date = datetime(year=2010, month=1, day=1)
end_date = datetime(year=2025, month=1, day=1)
print(data.loc[start_date:end_date])
print(data.loc[start_date:])
print(data.loc[:end_date])

and this prints:

Empty DataFrame
Columns: [price]
Index: []
Empty DataFrame
Columns: [price]
Index: []
               price
when                
2025-01-04  98259.43
2025-01-03  98126.64
2025-01-02  96949.18
2025-01-01  94610.14

Why?

I am using pandas 2.2.3.

asked Feb 2 at 12:49 by user171780; edited Feb 2 at 13:12
  • It'd help to provide a minimal reproducible example, meaning add some example data and your expected output. For specifics, see How to make good reproducible pandas examples. You could probably just copy in ouroboros's MRE. – wjandrea Commented Feb 3 at 4:58
  • Why what? I can totally see how this is confusing, but I'm not sure which part you're confused about specifically. Maybe you want to ask something like "Why is the result empty even though both datetimes [both bounds] occur in the index?" Check out How to Ask for tips on how to write a good title. Working back from your answer, I guess a good title would be "Why can't I select from a datetime index when it's in reverse sorted order?" or "Why can't I select from a datetime index when it's unordered?" depending on whether data.index.is_monotonic_decreasing or not. – wjandrea Commented Feb 3 at 5:18

3 Answers

df.loc slices by index order, not by chronological date order. Since the index is sorted newest first, reverse start_date and end_date:

Minimal, Reproducible Example

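import pandas as pd
import numpy as np
from datetime import datetime

# Descending daily index, newest date first (like the question's data)
idx = pd.date_range('2025-01-04', '2010-01-01', freq='-1D', name='when')
data = pd.DataFrame({'price': np.random.default_rng(0).random(len(idx))},
                    index=idx)

start_date = datetime(year=2010, month=1, day=1)
end_date = datetime(year=2025, month=1, day=1)

data.loc[end_date:start_date]  # reversed: bounds follow index order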

Result:

               price
when                
2025-01-01  0.016528
2024-12-31  0.813270
2024-12-30  0.912756
2024-12-29  0.606636
2024-12-28  0.729497
             ...
2010-01-05  0.288949
2010-01-04  0.608021
2010-01-03  0.111751
2010-01-02  0.385724
2010-01-01  0.172269

Similarly, you need:

  • data.loc[:start_date] instead of data.loc[start_date:]
  • data.loc[end_date:] instead of data.loc[:end_date]
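A minimal runnable sketch of all three reversed selections, assuming the same descending daily index as the MRE (the random prices are placeholders, not the real data). It also reproduces the empty result from the question when the bounds are given in chronological order:

```python
import numpy as np
import pandas as pd
from datetime import datetime

# Descending daily index as in the MRE; prices are random placeholders.
idx = pd.date_range('2025-01-04', '2010-01-01', freq='-1D', name='when')
data = pd.DataFrame({'price': np.random.default_rng(0).random(len(idx))}, index=idx)

start_date = datetime(year=2010, month=1, day=1)
end_date = datetime(year=2025, month=1, day=1)

# Chronological bounds run against the index order, so nothing is selected.
wrong_order = data.loc[start_date:end_date]   # empty, as in the question

# Bounds listed in index order (newest first) select the intended rows.
both = data.loc[end_date:start_date]          # 2025-01-01 down to 2010-01-01
since_start = data.loc[:start_date]           # newest row down to 2010-01-01
up_to_end = data.loc[end_date:]               # 2025-01-01 down to the oldest row

print(len(wrong_order), both.index[0].date(), both.index[-1].date())
```

The variable names are only for illustration; the point is that with a descending index the first slice bound must be the later date.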

However, note that partial-string slicing with df.loc is designed for a monotonically increasing DatetimeIndex. For example, this works on the sorted index:

data.sort_index().loc['2025-01-03 23:59':]

               price
when                
2025-01-04  0.636962

Yet the following, on the original descending index, raises an error:

data.loc[:'2025-01-03 23:59']

KeyError: 'Value based partial slicing on non-monotonic DatetimeIndexes with non-existing keys is not allowed.'

That's a poorly phrased message, seeing that:

data.index.is_monotonic_decreasing
# True

Our index is monotonic, just not monotonically increasing. As a result, it fails this condition in DatetimeIndex.slice_indexer:

        if (
            check_str_or_none(start)
            or check_str_or_none(end)
            or self.is_monotonic_increasing     # False!
        )

The code then falls through to a branch that raises the KeyError a few lines later, because the key is not in the index. Instead, you would need a boolean mask:

data.loc[data.index >= '2025-01-03 23:59']

# or: data.truncate(before='2025-01-03 23:59')

Best practice is to work with a monotonically increasing timeseries.

I discovered that for this to work, the index has to be sorted in ascending order, though I'm not sure whether this is the expected behavior. Adding this before the selection fixes the issue:

data.sort_index(inplace=True)
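A sketch of that fix end to end. Since data.csv is not available, this assumes a hypothetical stand-in: random daily prices over the same date span as the question's output (2010-07-17 to 2025-01-04), stored newest first. After sorting, the question's original chronological selections behave as intended:

```python
import numpy as np
import pandas as pd
from datetime import datetime

# Hypothetical stand-in for data.csv: descending daily index, placeholder prices.
idx = pd.date_range('2025-01-04', '2010-07-17', freq='-1D', name='when')
data = pd.DataFrame({'price': np.random.default_rng(0).random(len(idx))}, index=idx)

start_date = datetime(year=2010, month=1, day=1)
end_date = datetime(year=2025, month=1, day=1)

data.sort_index(inplace=True)  # ascending: oldest date first

# The question's original selections, unchanged.
both = data.loc[start_date:end_date]   # 2010-07-17 .. 2025-01-01
since_start = data.loc[start_date:]    # whole frame
up_to_end = data.loc[:end_date]        # up to and including 2025-01-01

print(len(data), both.index[0].date(), both.index[-1].date())
```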

Convert the start and end dates to date objects with the date() method:

from datetime import datetime

start_date = datetime(year=2010, month=1, day=1).date()
end_date = datetime(year=2025, month=1, day=1).date()
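In pandas 2.x, date bounds in a DatetimeIndex slice are treated as datetimes at midnight, and the slice still follows index order. A minimal sketch, assuming a hypothetical ascending daily index over the question's date span with placeholder prices, comparing date bounds with the equivalent datetime bounds:

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Assumed ascending daily index (oldest first); prices are placeholders.
idx = pd.date_range('2010-07-17', '2025-01-04', freq='D', name='when')
data = pd.DataFrame({'price': np.random.default_rng(0).random(len(idx))}, index=idx)

start_date = datetime(year=2010, month=1, day=1).date()
end_date = datetime(year=2025, month=1, day=1).date()

# date bounds vs. midnight datetime bounds on the same ascending index
by_date = data.loc[start_date:end_date]
by_datetime = data.loc[datetime(2010, 1, 1):datetime(2025, 1, 1)]

print(by_date.equals(by_datetime))
```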