I am trying to down-sample a time series in Pandas from 8 seconds to 10 seconds. For the purposes of this example, I've generated fake data that linearly increases with the number of seconds, over a minute. Importantly, for this example, the time intervals of the two time series are not multiples of each other.
When using .resample().interpolate() in Pandas, it seems unable to interpolate the first few points, even though there is sufficient data around them. How can I work around this? Here's the example:
import numpy as np
import pandas as pd
import datetime
a = datetime.datetime(2025, 12, 2, 17, 39, 6)
interval8df = pd.DataFrame(np.linspace(60, 124, 9), columns=['Hi'], index=pd.date_range(a, periods=9, freq='8s'))
interval8df['Hi']
2025-12-02 17:39:06 60.0
2025-12-02 17:39:14 68.0
2025-12-02 17:39:22 76.0
2025-12-02 17:39:30 84.0
2025-12-02 17:39:38 92.0
2025-12-02 17:39:46 100.0
2025-12-02 17:39:54 108.0
2025-12-02 17:40:02 116.0
2025-12-02 17:40:10 124.0
Freq: 8s, Name: Hi, dtype: float64
When using resample interpolate, this is the result:
interval8df.resample('10s').interpolate(method='time')['Hi']
2025-12-02 17:39:00 NaN
2025-12-02 17:39:10 NaN
2025-12-02 17:39:20 NaN
2025-12-02 17:39:30 84.0
2025-12-02 17:39:40 94.0
2025-12-02 17:39:50 104.0
2025-12-02 17:40:00 114.0
2025-12-02 17:40:10 124.0
Freq: 10s, Name: Hi, dtype: float64
While I can understand the first 17:39:00 being NaN, 17:39:10 and 17:39:20 are both surrounded by points in the original time series (the 06 and 14 second points, and the 14 and 22 second points respectively). Why is this occurring?
I've tried using mean, but that produced no NaNs.
interval8df.resample('10s').mean()['Hi']
2025-12-02 17:39:00 60.0
2025-12-02 17:39:10 68.0
2025-12-02 17:39:20 76.0
2025-12-02 17:39:30 88.0
2025-12-02 17:39:40 100.0
2025-12-02 17:39:50 108.0
2025-12-02 17:40:00 116.0
2025-12-02 17:40:10 124.0
Freq: 10s, Name: Hi, dtype: float64
Additionally, changing the interpolation method does not seem to improve the result.
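For example, switching to the default 'linear' method still leaves the same leading NaNs:
interval8df.resample('10s').interpolate(method='linear')['Hi']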
The workaround I've been using is up-sampling from 8 seconds to 1 second using interpolate, then down-sampling from 1 second to 10 seconds using the mean, which is obviously clunky. I would like to be able to do this directly in one step.
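Roughly, the workaround looks like this (variable names are just for illustration; the intermediate 1-second step is the part I'd like to eliminate):
# up-sample 8s -> 1s, filling the gaps by interpolation
upsampled = interval8df.resample('1s').interpolate(method='time')
# down-sample 1s -> 10s by averaging each 10s window
downsampled = upsampled.resample('10s').mean()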
1 Answer
To see what is happening, let's add asfreq after the resample, so you can see what is passed to the next chained method:
interval8df.resample('10s').asfreq()
Output:
Hi
2025-12-02 17:39:00 NaN
2025-12-02 17:39:10 NaN
2025-12-02 17:39:20 NaN
2025-12-02 17:39:30 84.0
2025-12-02 17:39:40 NaN
2025-12-02 17:39:50 NaN
2025-12-02 17:40:00 NaN
2025-12-02 17:40:10 124.0
And since the interpolation operates on this asfreq output, the original values before the first aligned timestamp (17:39:30) are never seen, hence the nulls for seconds 00, 10 and 20. mean, on the other hand, does no interpolation at all: it simply averages the original values that fall inside each 10s window, and because every window contains at least one original value, you get a mean back for each one.
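You can check this by chaining interpolate after asfreq yourself; it should reproduce the same output as resample('10s').interpolate(method='time'), with nothing filled before the first valid value:
interval8df.resample('10s').asfreq().interpolate(method='time')['Hi']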
resample isn't quite as powerful as it sounds. If your data is sampled exactly every 8s and you want it resampled to 10s, it's probably easiest to (1) upsample (with resample) to 2s (the greatest common divisor of the two intervals), (2) interpolate, then (3) resample down to 10s. This avoids NaNs being produced where the timestamps don't align exactly with the 10s intervals (as per Scott Boston's answer). – Paul Wilson Commented Mar 30 at 6:58