python - Excessive NaNs when resampling + interpolating in Pandas - Stack Overflow

I am trying to down-sample a time series in Pandas from 8 seconds to 10 seconds. For the purposes of this example, I've generated fake data that linearly increases with the number of seconds, over a minute. Importantly, for this example, the time intervals of the two time series are not multiples of each other.

When using .resample().interpolate() in Pandas, it seems unable to interpolate for the first few points, for which there is sufficient data. How can I work around it? Here's the example:

import numpy as np
import pandas as pd
import datetime

a = datetime.datetime(2025, 12, 2, 17, 39, 6)
interval8df = pd.DataFrame(np.linspace(60, 124, 9), columns=['Hi'], index=pd.date_range(a, periods=9, freq='8s'))
interval8df['Hi']

2025-12-02 17:39:06     60.0
2025-12-02 17:39:14     68.0
2025-12-02 17:39:22     76.0
2025-12-02 17:39:30     84.0
2025-12-02 17:39:38     92.0
2025-12-02 17:39:46    100.0
2025-12-02 17:39:54    108.0
2025-12-02 17:40:02    116.0
2025-12-02 17:40:10    124.0
Freq: 8s, Name: Hi, dtype: float64

With .resample().interpolate(), this is the result:

interval8df.resample('10s').interpolate(method='time')['Hi']

2025-12-02 17:39:00      NaN
2025-12-02 17:39:10      NaN
2025-12-02 17:39:20      NaN
2025-12-02 17:39:30     84.0
2025-12-02 17:39:40     94.0
2025-12-02 17:39:50    104.0
2025-12-02 17:40:00    114.0
2025-12-02 17:40:10    124.0
Freq: 10s, Name: Hi, dtype: float64

While I can understand 17:39:00 being NaN, 17:39:10 and 17:39:20 are each surrounded by points in the original time series (6 and 14 seconds, then 14 and 22 seconds, respectively). Why is this occurring?

I also tried using mean, which produced no NaNs:

interval8df.resample('10s').mean()['Hi']

2025-12-02 17:39:00     60.0
2025-12-02 17:39:10     68.0
2025-12-02 17:39:20     76.0
2025-12-02 17:39:30     88.0
2025-12-02 17:39:40    100.0
2025-12-02 17:39:50    108.0
2025-12-02 17:40:00    116.0
2025-12-02 17:40:10    124.0
Freq: 10s, Name: Hi, dtype: float64

Additionally, changing the interpolation method does not seem to improve the result.

The workaround I've been using is up-sampling from 8 seconds to 1 second using interpolate, then down-sampling from 1 second to 10 seconds using the mean, which is obviously clunky. I would like to be able to do this directly in one step.
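For reference, a minimal sketch of the two-step workaround described above, assuming the same fake data as in the example. The names fine_1s and workaround are illustrative only and do not come from the original code.

```python
import numpy as np
import pandas as pd

# Same fake 8-second series as in the example above.
interval8df = pd.DataFrame(
    np.linspace(60, 124, 9),
    columns=["Hi"],
    index=pd.date_range("2025-12-02 17:39:06", periods=9, freq="8s"),
)

# Step 1: up-sample to a 1-second grid and fill the gaps by time-weighted
# linear interpolation. Every original timestamp lies on this grid.
fine_1s = interval8df.resample("1s").interpolate(method="time")

# Step 2: down-sample to 10-second bins by averaging the 1-second values.
workaround = fine_1s.resample("10s").mean()

print(workaround["Hi"])
```

Note that each result here is the average over a 10-second bin, which is not the same quantity as the interpolated value at the bin's left edge.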

Asked Mar 29 at 21:59 by user30106177; edited Mar 30 at 15:56 by halfer.
2 comments:
  • A suitable kludge may be to back-project a linear extrapolation of the first few samples so that pandas doesn't turn the first few data points of the actual series into NaN. Whether this is an acceptable solution depends on what your final use is. It is usually safer to use your raw data as it was sampled to infer whatever it is that you want to compute. Interpolation always introduces artefacts that might or might not matter. – Martin Brown Commented Mar 29 at 22:11
  • Unfortunately resample isn't quite as powerful as it sounds. If your data is sampled exactly every 8s and you want it resampled to 10s, it's probably easiest to (1) upsample (with resample) to 2s (the greatest common divisor of 8 and 10), (2) interpolate, then (3) resample down to 10s. This will avoid NaNs being produced where the timestamps don't align exactly with 10s intervals (as per Scott Boston's answer below). – Paul Wilson Commented Mar 30 at 6:58
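
The three steps suggested in the second comment could be sketched roughly as follows. This is only an illustration under the assumption that the data is sampled exactly every 8 seconds, as in the question; the names fine_2s and coarse_10s are made up here.

```python
import numpy as np
import pandas as pd

# Same fake 8-second series as in the question.
interval8df = pd.DataFrame(
    np.linspace(60, 124, 9),
    columns=["Hi"],
    index=pd.date_range("2025-12-02 17:39:06", periods=9, freq="8s"),
)

# Steps (1) and (2): upsample to the 2-second grid and interpolate.
# All original timestamps (:06, :14, :22, ...) fall on this grid,
# so no original sample is dropped before interpolation.
fine_2s = interval8df.resample("2s").interpolate(method="time")

# Step (3): keep only the rows that sit on the 10-second grid.
coarse_10s = fine_2s.resample("10s").asfreq()

print(coarse_10s["Hi"])
```

Only 17:39:00 remains NaN, because it lies before the first sample at 17:39:06 and nothing is extrapolated.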

1 Answer

To see what is happening, add asfreq after the resample; this shows what is passed to the next chained function:

interval8df.resample('10s').asfreq()

Output:

                        Hi
2025-12-02 17:39:00    NaN
2025-12-02 17:39:10    NaN
2025-12-02 17:39:20    NaN
2025-12-02 17:39:30   84.0
2025-12-02 17:39:40    NaN
2025-12-02 17:39:50    NaN
2025-12-02 17:40:00    NaN
2025-12-02 17:40:10  124.0

Only the values that land exactly on the 10s grid (17:39:30 and 17:40:10) survive the resample. Since you are interpolating, there is no lower bound before 17:39:30, hence the nulls for seconds 00, 10, and 20. With mean there is no interpolation: you are just averaging the values that fall within each 10s window. Since every 10s interval contains at least one value, a mean is returned for each.
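
One possible way to interpolate directly onto the 10s grid, without the original samples being dropped first, is to reindex onto the union of the original and target timestamps before interpolating. This is a sketch under the question's setup, not something the answer itself proposes; the names target, combined, and direct are illustrative.

```python
import numpy as np
import pandas as pd

# Same fake 8-second series as in the question.
interval8df = pd.DataFrame(
    np.linspace(60, 124, 9),
    columns=["Hi"],
    index=pd.date_range("2025-12-02 17:39:06", periods=9, freq="8s"),
)

# Target 10-second grid, restricted to the span covered by the data
# so that every target point has a sample on both sides (or on itself).
target = pd.date_range(
    interval8df.index[0].ceil("10s"),
    interval8df.index[-1].floor("10s"),
    freq="10s",
)

# Put the original and target timestamps on one sorted index; the new
# target rows start out as NaN while the original samples are kept.
combined = interval8df.reindex(interval8df.index.union(target))

# Interpolate using the actual time gaps, then keep only the target rows.
direct = combined.interpolate(method="time").reindex(target)

print(direct["Hi"])
```

Because the original 8s samples are still present when interpolate runs, 17:39:10 and 17:39:20 get values from their real neighbours instead of NaN.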
