I'd like some help understanding this behavior:
import pandas as pd
pd.date_range("2016-09-01", "2006-03-01", freq="-6MS", inclusive="left")
This returns:
DatetimeIndex(['2016-09-01', '2016-03-01', '2015-09-01', '2015-03-01',
'2014-09-01', '2014-03-01', '2013-09-01', '2013-03-01',
'2012-09-01', '2012-03-01', '2011-09-01', '2011-03-01',
'2010-09-01', '2010-03-01', '2009-09-01', '2009-03-01',
'2008-09-01', '2008-03-01', '2007-09-01', '2007-03-01',
'2006-09-01'],
dtype='datetime64[ns]', freq='-6MS')
Note that '2006-03-01' is missing here.
When I move the end date forward to 2006-03-02, 2006-03-01 now IS included:
import pandas as pd
pd.date_range("2016-09-01", "2006-03-02", freq="-6MS", inclusive="left")
Returns:
DatetimeIndex(['2016-09-01', '2016-03-01', '2015-09-01', '2015-03-01',
'2014-09-01', '2014-03-01', '2013-09-01', '2013-03-01',
'2012-09-01', '2012-03-01', '2011-09-01', '2011-03-01',
'2010-09-01', '2010-03-01', '2009-09-01', '2009-03-01',
'2008-09-01', '2008-03-01', '2007-09-01', '2007-03-01',
'2006-09-01', '2006-03-01'],
dtype='datetime64[ns]', freq='-6MS')
I expected 2006-03-01 to be excluded and the result to be the same in both cases. Why is this happening? It's counting backwards from 2016-09-01 in 6-month intervals, so it shouldn't include 2006-03-01 when the end date is set to a later value (e.g. 2006-03-02), right?
- Please make sure the title matches the body; I see I didn't accurately capture your intent in the staging area :). – AD7six, Apr 1 at 8:40
2 Answers
In this line
pd.date_range("2016-09-01", "2006-03-01", freq="-6MS", inclusive="left")
the date 2006-03-01 is excluded because inclusive='left' omits the end date if a generated date falls exactly on it, as stated on https://pandas.pydata./docs/reference/api/pandas.date_range.html.
After 2006-09-01, stepping back one more 6-month interval gives 2006-03-01, which lands exactly on the end date "2006-03-01", so it is dropped.
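A minimal check of that boundary behavior, assuming pandas 1.4 or later (where `date_range` accepts `inclusive`): generating the same range with inclusive="both" and inclusive="left" shows that the only difference is the final element, 2006-03-01, which equals the end argument.

```python
import pandas as pd

start, end = "2016-09-01", "2006-03-01"
both = pd.date_range(start, end, freq="-6MS", inclusive="both")
left = pd.date_range(start, end, freq="-6MS", inclusive="left")

print(both[-1] == pd.Timestamp(end))  # True: the last step lands exactly on the end date
print(len(both), len(left))           # 22 21
print(both[:-1].equals(left))         # True: "left" only drops that final element
```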
However, if the end date is not itself a month start, 2006-03-01 is kept, because inclusive='left' only drops the last generated date when it equals the end date exactly. In this example, 2006-03-01 is included for any end date from 2006-03-02 through 2006-03-31.
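The sketch below probes a few end dates around 2006-03-01 (the list of end dates is just an illustrative choice). For each one it records the last date generated with inclusive="both", whether inclusive="left" removed an element, and whether 2006-03-01 survives with inclusive="left". It assumes pandas 1.4 or later; the comment about exact equality describes the behavior seen in the question rather than a documented guarantee.

```python
import pandas as pd

start = "2016-09-01"
target = pd.Timestamp("2006-03-01")
results = {}

for end in ["2006-03-01", "2006-03-02", "2006-03-31", "2006-04-01"]:
    both = pd.date_range(start, end, freq="-6MS", inclusive="both")
    left = pd.date_range(start, end, freq="-6MS", inclusive="left")
    # "left" removes the last element only when it equals the end argument exactly
    trimmed = len(both) != len(left)
    results[end] = (str(both[-1].date()), trimmed, target in left)
    print(end, results[end])
```

One would expect only the 2006-03-01 end date to trigger the trim: for 2006-03-02 and 2006-03-31 the range still stops at 2006-03-01, but that date no longer equals the end argument, and for 2006-04-01 the range already stops at 2006-09-01.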
This code will include 2006-03-01:
import pandas as pd
dates = pd.date_range("2016-09-01", "2006-03-31", freq="-6MS", inclusive="left")
print(dates)
However, if the end date is in April (e.g. 2006-04-01), the date 2006-03-01 will not be included:
import pandas as pd
dates = pd.date_range("2016-09-01", "2006-04-01", freq="-6MS", inclusive="left")
print(dates)
To explicitly include 2006-03-01, you can use inclusive="both", which includes both the start and end dates. Alternatively, omit the inclusive parameter, since it defaults to "both" and achieves the same result.
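A quick sanity check, assuming pandas 1.4 or later where `inclusive` defaults to "both": passing inclusive="both" explicitly and omitting the argument should give the same index, and both contain 2006-03-01.

```python
import pandas as pd

explicit = pd.date_range("2016-09-01", "2006-03-01", freq="-6MS", inclusive="both")
default = pd.date_range("2016-09-01", "2006-03-01", freq="-6MS")  # inclusive defaults to "both"

print(explicit.equals(default))                # True
print(pd.Timestamp("2006-03-01") in explicit)  # True
```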
It's because you're setting the inclusive argument to "left". Set it to "both", or don't set it at all, since "both" is the default.
pd.date_range("2016-09-01", "2006-03-01", freq="-6MS")
DatetimeIndex(['2016-09-01', '2016-03-01', '2015-09-01', '2015-03-01',
'2014-09-01', '2014-03-01', '2013-09-01', '2013-03-01',
'2012-09-01', '2012-03-01', '2011-09-01', '2011-03-01',
'2010-09-01', '2010-03-01', '2009-09-01', '2009-03-01',
'2008-09-01', '2008-03-01', '2007-09-01', '2007-03-01',
'2006-09-01', '2006-03-01'],
dtype='datetime64[ns]', freq='-6MS')