python - Why std(skipna=False) and std(skipna=True) yield different results even when there are no NaN or null values in the Ser

I have a pandas Series s. When I call s.std(skipna=True) and s.std(skipna=False), I get different results even though s contains no NaN/null values. Why? Did I misunderstand the skipna parameter? I'm using pandas 1.3.4.

import pandas as pd

s = pd.Series([10.0]*4800000, index=range(4800000), dtype="float32")

# No NaN/null in the Series
print(s.isnull().any()) # False
print(s.isna().any()) # False

# Why does the code below print different results?
print(s.std(skipna=False)) # 0.0
print(s.std(skipna=True)) # 0.61053276

asked Mar 19 at 2:39, edited Mar 19 at 2:50 by konchy

Please remember to show a real minimal reproducible example, which means also showing a minimal declaration for s so that running the code shows off the issue. Right now we just have to trust that your initial claim is true, without any evidence for it. – Mike 'Pomax' Kamermans Commented Mar 19 at 2:45
@Mike'Pomax'Kamermans Thanks, I've added the example to reproduce it. – konchy Commented Mar 19 at 2:52
For me, the example produces False, False, 0.0, 0.0 as expected. – Raymond Hettinger Commented Mar 19 at 2:54
I vaguely remember some bugs associated with certain versions of the Bottleneck optional dependency that could cause problems with NaN-related routines. Updating pandas and Bottleneck seems like the first thing to try. – user2357112 Commented Mar 19 at 2:58
@user2357112 Thanks for the clue. Disabling Bottleneck with pd.set_option("use_bottleneck", False) fixes the problem for me. You can post it as an answer and I will accept it. – konchy Commented Mar 19 at 3:05

1 Answer

This is an issue with the Bottleneck optional dependency, used to accelerate some NaN-related routines. I think the wrong result happens due to loss of precision while calculating the mean, since Bottleneck uses naive summation, while NumPy uses more accurate pairwise summation.
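To illustrate the precision argument, here is a rough NumPy-only sketch. It is not Bottleneck's actual code: it assumes that np.cumsum keeps a sequential running total in float32 (so its last element behaves like a naive sum), and compares that with np.sum, which may use pairwise summation. The helper name compare_sums is made up for this example.

```python
import numpy as np


def compare_sums(n=4_800_000, value=10.0):
    """Return (exact, naive, pairwise) sums of n copies of value stored as float32."""
    a = np.full(n, value, dtype=np.float32)

    # Reference total computed with Python floats (float64).
    exact = n * value

    # Assumption: cumsum accumulates element by element in float32,
    # so its last entry stands in for a naive running sum.
    naive = float(np.cumsum(a)[-1])

    # np.sum may use pairwise summation, which keeps partial sums small.
    pairwise = float(np.sum(a))

    return exact, naive, pairwise


exact, naive, pairwise = compare_sums()
n = 4_800_000
print("exact mean:   ", exact / n)
print("naive mean:   ", naive / n)
print("pairwise mean:", pairwise / n)
```

Once the float32 running total grows large enough, adding 10.0 no longer lands on a representable value and each addition gets rounded, so the naive mean drifts away from 10. A standard deviation computed around such a mean is non-zero even though every element is identical.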

You can disable Bottleneck with

pd.set_option('compute.use_bottleneck', False)

to fall back to the NumPy handling.
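If changing the option globally is undesirable, pd.option_context can scope it to a block. The following is a minimal sketch that reuses the Series from the question. Whether the default path misbehaves depends on the installed Bottleneck version, so only the results with Bottleneck disabled are shown.

```python
import pandas as pd

# Same data as in the question: a constant float32 Series without NaN values.
s = pd.Series([10.0] * 4_800_000, dtype="float32")

# Disable Bottleneck only inside this block; the previous setting
# is restored automatically when the block exits.
with pd.option_context("compute.use_bottleneck", False):
    std_skip = s.std(skipna=True)
    std_noskip = s.std(skipna=False)

print(std_skip, std_noskip)
```

With Bottleneck out of the picture, both calls should agree for a constant Series.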
