
python - Why std(skipna=False) and std(skipna=True) yield different results even when there are no NaN or null values in the Ser


I have a pandas Series s. When I call s.std(skipna=True) and s.std(skipna=False), I get different results even though s contains no NaN/null values. Why? Did I misunderstand the skipna parameter? I'm using pandas 1.3.4.

import pandas as pd

s = pd.Series([10.0]*4800000, index=range(4800000), dtype="float32")

# No NaN/null in the Series
print(s.isnull().any()) # False
print(s.isna().any()) # False

# Why does the code below print different results?
print(s.std(skipna=False)) # 0.0
print(s.std(skipna=True)) # 0.61053276

Asked Mar 19 at 2:39 by konchy (edited Mar 19 at 2:50)
  • Please remember to show a real minimal reproducible example, which means also showing a minimal declaration for s so that running the code shows off the issue. Right now we just have to trust that your initial claim is true, without any evidence for it. – Mike 'Pomax' Kamermans Commented Mar 19 at 2:45
  • @Mike'Pomax'Kamermans Thanks, I've added the example to reproduce it. – konchy Commented Mar 19 at 2:52
  • For me, the example produces False, False, 0.0, 0.0 as expected. – Raymond Hettinger Commented Mar 19 at 2:54
  • I vaguely remember some bugs associated with certain versions of the Bottleneck optional dependency that could cause problems with NaN-related routines. Updating pandas and Bottleneck seems like the first thing to try. – user2357112 Commented Mar 19 at 2:58
  • @user2357112 Thanks for the clue. Disabling Bottleneck with pd.set_option("use_bottleneck", False) fixes the problem for me. You can post it as an answer and I will accept it. – konchy Commented Mar 19 at 3:05

1 Answer


This is an issue with the Bottleneck optional dependency, which pandas uses to accelerate some NaN-related routines. pandas only takes the Bottleneck path when skipna=True, which is why skipna=False returns the expected 0.0. I think the wrong result comes from loss of precision while calculating the mean: Bottleneck uses naive summation, while NumPy uses the more accurate pairwise summation.
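To see why naive float32 summation can go wrong here, the sketch below compares a strictly left-to-right running sum with NumPy's pairwise sum on the same data as the question. It is only an illustration of the rounding effect, not Bottleneck's actual code: np.cumsum stands in for a naive accumulator because it adds the elements one at a time in float32. Once the running total passes 2**25 (about 33.5 million), neighbouring float32 values are 4 apart, so adding 10.0 can no longer be represented exactly and every further addition is rounded.

```python
import numpy as np

n = 4_800_000
x = np.full(n, 10.0, dtype=np.float32)

# np.cumsum accumulates strictly left to right in float32, so its last
# element is the naive running sum (a stand-in for naive summation).
naive_sum = np.cumsum(x)[-1]

# np.sum on a contiguous float32 array uses pairwise summation, which
# keeps the partial sums small enough to stay accurate.
pairwise_sum = x.sum()

naive_mean = float(naive_sum) / n
pairwise_mean = float(pairwise_sum) / n

print(naive_mean)     # noticeably below 10.0
print(pairwise_mean)  # 10.0 up to rounding
```

A mean that is off by this much makes every value look as if it deviates from the mean, which is consistent with the non-zero standard deviation reported in the question.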

You can disable Bottleneck with

pd.set_option('compute.use_bottleneck', False)

to fall back to the NumPy handling.
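If you would rather not change the setting globally, a minimal sketch of a scoped variant uses pd.option_context, which restores the previous value when the block exits. The variable names here are only for illustration.

```python
import pandas as pd

s = pd.Series([10.0] * 4_800_000, dtype="float32")

before = pd.get_option("compute.use_bottleneck")

# Bottleneck is disabled only inside this block; pandas falls back to
# its NumPy-based implementation for the duration.
with pd.option_context("compute.use_bottleneck", False):
    std_without_bottleneck = s.std(skipna=True)

after = pd.get_option("compute.use_bottleneck")

print(std_without_bottleneck)
print(before == after)  # the option is restored after the block
```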
