python - In numpy find a percentile in 2d with some condition - Stack Overflow

I have an array like this:

a = np.array([[-999, 9, 7, 3],
   [2, 1, -999, 1],
   [1, 5, 4, 6],
   [0, 6, -999, 9],
   [1, -999, -999, 6],
   [8, 4, 4, 8]])

I want the 40th percentile of each row, ignoring the entries equal to -999.

If I use np.percentile(a, 40, axis=1), I get array([ 3.8, 1. , 4.2, 1.2, -799. , 4.8]), which still includes the -999 values.

The output I want looks like this:

[
   6.2, # 3 or 7 also ok
   1,
   4.2, # 4 or 5 also ok
   4.8, # 0 or 6 also ok
   1,
   4
]

Thank you

Asked Mar 3 at 2:06 by d_frEak

1 Answer

You can replace the -999s with NaNs and use nanpercentile.

import numpy as np
a = np.array([[-999, 9, 7, 3],
   [2, 1, -999, 1],
   [1, 5, 4, 6],
   [0, 6, -999, 9],
   [1, -999, -999, 6],
   [8, 4, 4, 8]], dtype=np.float64)
a[a == -999] = np.nan
np.nanpercentile(a, 40, axis=-1, keepdims=True)
# array([[6.2],
#        [1. ],
#        [4.2],
#        [4.8],
#        [3. ],
#        [4.8]])
# Use the `method` argument if you want a different type of estimate
# `keepdims=True` keeps the result a column, which it looks like you want 
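
Since the question says values such as "3 or 7" are also acceptable, here is a minimal sketch of the `method` argument mentioned in the comment above. It assumes NumPy 1.22 or later (older versions called this argument `interpolation`); the names `lower` and `higher` are just illustrative variables.

```python
import numpy as np

a = np.array([[-999, 9, 7, 3],
              [2, 1, -999, 1],
              [1, 5, 4, 6],
              [0, 6, -999, 9],
              [1, -999, -999, 6],
              [8, 4, 4, 8]], dtype=np.float64)
a[a == -999] = np.nan  # mark the sentinel as missing

# "lower" and "higher" return an actual data point instead of interpolating
# between the two neighbours of the 40% position.
lower = np.nanpercentile(a, 40, axis=-1, method="lower")
higher = np.nanpercentile(a, 40, axis=-1, method="higher")

print(lower)   # [3. 1. 4. 0. 1. 4.]
print(higher)  # [7. 1. 5. 6. 6. 8.]
```

The default, `method="linear"`, interpolates between those two neighbours, which is where values like 6.2 and 4.8 come from.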

You asked for a solution "in NumPy", and that's it. (Unless you want to re-implement percentile, which is not so hard. Or I suppose you could use apply_along_axis on a function that removes the -999s before taking the quantile, but that will just loop in Python over the slices, which can be slow.)
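
For illustration, the two alternatives in the parenthetical might look roughly like the sketch below. The helper names `row_percentile` and `masked_percentile` are hypothetical, and the vectorized version assumes every row has at least one value that is not -999 and that the data contains no genuine infinities.

```python
import numpy as np

a = np.array([[-999, 9, 7, 3],
              [2, 1, -999, 1],
              [1, 5, 4, 6],
              [0, 6, -999, 9],
              [1, -999, -999, 6],
              [8, 4, 4, 8]])

def row_percentile(row, q=40, sentinel=-999):
    """Percentile of a single row after dropping the sentinel values."""
    valid = row[row != sentinel]
    return np.percentile(valid, q) if valid.size else np.nan

# Option 1: apply_along_axis calls row_percentile once per row,
# so this is effectively a Python-level loop.
looped = np.apply_along_axis(row_percentile, 1, a)

def masked_percentile(arr, q, sentinel=-999):
    """Hand-rolled linear-interpolation percentile along the last axis."""
    x = arr.astype(np.float64)           # astype returns a copy
    x[arr == sentinel] = np.inf          # invalid entries sort to the end
    x.sort(axis=-1)
    n = (arr != sentinel).sum(axis=-1, keepdims=True)  # valid count per row
    pos = (q / 100.0) * (n - 1)          # fractional index among valid values
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)       # never step into the inf padding
    frac = pos - lo
    lo_val = np.take_along_axis(x, lo, axis=-1)
    hi_val = np.take_along_axis(x, hi, axis=-1)
    return (lo_val + (hi_val - lo_val) * frac)[..., 0]

# Option 2: the same estimate without a per-row Python call.
vectorized = masked_percentile(a, 40)

print(looped)
print(vectorized)
```

Both should agree with the nanpercentile result above; neither needs the input array to be converted to float and filled with NaNs beforehand.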


If you don't want to have to change the dtype and replace with NaNs to perform the operation, you can use NumPy masked arrays with scipy.stats.mquantiles.

import numpy as np
from scipy import stats
a = np.array([[-999, 9, 7, 3],
   [2, 1, -999, 1],
   [1, 5, 4, 6],
   [0, 6, -999, 9],
   [1, -999, -999, 6],
   [8, 4, 4, 8]])
mask = a == -999
b = np.ma.masked_array(a, mask=mask)
stats.mstats.mquantiles(b, 0.4, alphap=1, betap=1, axis=-1)
# alphap=1, betap=1 are the settings to reproduce the same values produced by NumPy's default `method`.

But beware that mquantiles is on its way out: it is being superseded by new features in the next SciPy release.
