I have the following code:
import pandas
data = pandas.read_csv('data.csv')
data['when'] = pandas.to_datetime(data['when'])
data.set_index('when', inplace=True)
print(data)
print(data.index.dtype)
which prints:
price
when
2025-01-04 98259.4300
2025-01-03 98126.6400
2025-01-02 96949.1800
2025-01-01 94610.1400
2024-12-31 93647.0100
... ...
2010-07-21 0.0792
2010-07-20 0.0747
2010-07-19 0.0808
2010-07-18 0.0858
2010-07-17 0.0500
[5286 rows x 1 columns]
datetime64[ns]
Then, I am trying to select a range like this:
from datetime import datetime
start_date = datetime(year=2010, month=1, day=1)
end_date = datetime(year=2025, month=1, day=1)
print(data.loc[start_date:end_date])
print(data.loc[start_date:])
print(data.loc[:end_date])
and this prints
Empty DataFrame
Columns: [price]
Index: []
Empty DataFrame
Columns: [price]
Index: []
price
when
2025-01-04 98259.43
2025-01-03 98126.64
2025-01-02 96949.18
2025-01-01 94610.14
Why?
I am using pandas 2.2.3.
Asked Feb 2 at 12:49 by user171780, edited Feb 2 at 13:12.
3 Answers
df.loc slices by index order, not by chronological date order. Your index is sorted in descending order, so reverse start_date and end_date:
Minimal, Reproducible Example
import pandas as pd
import numpy as np
from datetime import datetime
idx = pd.date_range('2025-01-04', '2010-01-01', freq='-1d', name='when')
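data = pd.DataFrame({'price': np.random.default_rng(0).random(len(idx))},
                    index=idx)
start_date = datetime(year=2010, month=1, day=1)
end_date = datetime(year=2025, month=1, day=1)
data.loc[end_date:start_date]  # reversed: later date first, matching the descending index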
Result:
price
when
2025-01-01 0.016528
2024-12-31 0.813270
2024-12-30 0.912756
2024-12-29 0.606636
2024-12-28 0.729497
...
2010-01-05 0.288949
2010-01-04 0.608021
2010-01-03 0.111751
2010-01-02 0.385724
2010-01-01 0.172269
Similarly, you need:
data.loc[:start_date] instead of data.loc[start_date:]
data.loc[end_date:] instead of data.loc[:end_date]
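To see the direction rule in isolation, here is a minimal sketch on a small stand-in frame. The eight-day index and the integer prices are made up for illustration and are not the question's data.

```python
from datetime import datetime

import pandas as pd

# Hypothetical stand-in: eight daily rows, newest first (descending index).
idx = pd.date_range("2025-01-04", "2024-12-28", freq="-1D", name="when")
data = pd.DataFrame({"price": range(len(idx))}, index=idx)

start_date = datetime(2024, 12, 30)
end_date = datetime(2025, 1, 1)

# Chronological bounds run against the descending index, so nothing is selected.
forward = data.loc[start_date:end_date]

# Bounds given in index order (later date first) select the inclusive range.
backward = data.loc[end_date:start_date]

print(len(forward))
print(backward)
```

On this frame, forward should come back empty while backward holds the three rows from 2025-01-01 down to 2024-12-30.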
However, note that df.loc is optimized for a monotonically increasing DatetimeIndex.
I.e., this works:
data.sort_index().loc['2025-01-03 23:59':]
price
when
2025-01-04 0.636962
Yet the following throws an error:
data.loc[:'2025-01-03 23:59']
KeyError: 'Value based partial slicing on non-monotonic DatetimeIndexes with non-existing keys is not allowed.'
That's a poorly phrased message, seeing that:
data.index.is_monotonic_decreasing
# True
Our index is monotonic, just not monotonically increasing. As a result, it fails this condition:
if (
check_str_or_none(start)
or check_str_or_none(end)
or self.is_monotonic_increasing # False!
)
Subsequently, the operation errors out a few lines later. You would need:
data.loc[data.index >= '2025-01-03 23:59']
# or: data.truncate(before='2025-01-03 23:59')
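The boolean-mask approach can be wrapped in a small helper if you need it in several places. This is only a sketch: select_range is a hypothetical name, and the frame is a made-up stand-in rather than the question's data.

```python
import pandas as pd

# Hypothetical stand-in: eight daily rows, newest first (descending index).
idx = pd.date_range("2025-01-04", "2024-12-28", freq="-1D", name="when")
data = pd.DataFrame({"price": range(len(idx))}, index=idx)


def select_range(frame, start, end):
    """Hypothetical helper: rows whose index lies in [start, end], in any index order."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Element-wise comparison does not depend on how the index is sorted.
    mask = (frame.index >= start) & (frame.index <= end)
    return frame.loc[mask]


# Neither bound has to exist in the index, and the row order is preserved.
subset = select_range(data, "2024-12-30", "2025-01-03 23:59")
print(subset)
```

Because the mask compares every index value, this scans the whole index instead of using a sorted lookup, which is the trade-off for not caring about the sort order.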
Best practice is to work with a monotonically increasing timeseries.
I discovered that for this to work, the index has to be ordered, though I'm not sure whether this is the expected behavior. So adding this before doing the selection fixes the issue:
data.sort_index(inplace=True)
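A quick way to check this fix is sketched below on a made-up stand-in frame (not the question's data). It uses the non-mutating sort_index() so the original frame is left alone.

```python
import pandas as pd

# Hypothetical stand-in: eight daily rows, newest first (descending index).
idx = pd.date_range("2025-01-04", "2024-12-28", freq="-1D", name="when")
data = pd.DataFrame({"price": range(len(idx))}, index=idx)

# Sort into ascending (chronological) order before slicing.
ascending = data.sort_index()
print(ascending.index.is_monotonic_increasing)

# Chronological bounds now select the inclusive range.
window = ascending.loc["2024-12-30":"2025-01-01"]

# A string bound that is not in the index is also accepted on an increasing index.
tail = ascending.loc["2025-01-03 23:59":]

print(window)
print(tail)
```

Here window should contain 2024-12-30 through 2025-01-01 and tail only 2025-01-04.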
Convert the start and end dates to date objects with the date() method:
start_date = datetime(year=2010, month=1, day=1).date()
end_date = datetime(year=2025, month=1, day=1).date()