python - In numpy find a percentile in 2d with some condition

I have an array like this:

a = np.array([[-999, 9, 7, 3],
   [2, 1, -999, 1],
   [1, 5, 4, 6],
   [0, 6, -999, 9],
   [1, -999, -999, 6],
   [8, 4, 4, 8]])

I want the 40th percentile of each row of that array, ignoring the values equal to -999.

If I use np.percentile(a, 40, axis=1), I get array([ 3.8, 1. , 4.2, 1.2, -799. , 4.8]), which still includes the -999 values.

The output I want looks like this:

[
   6.2, # 3 or 7 also ok
   1,
   4.2, # 4 or 5 also ok
   4.8, # 0 or 6 also ok
   1,
   4
]

Thank you

asked Mar 3 at 2:06 by d_frEak

1 Answer

You can replace the -999s with NaNs and use nanpercentile.

import numpy as np
a = np.array([[-999, 9, 7, 3],
   [2, 1, -999, 1],
   [1, 5, 4, 6],
   [0, 6, -999, 9],
   [1, -999, -999, 6],
   [8, 4, 4, 8]], dtype=np.float64)
a[a == -999] = np.nan
np.nanpercentile(a, 40, axis=-1, keepdims=True)
# array([[6.2],
#        [1. ],
#        [4.2],
#        [4.8],
#        [3. ],
#        [4.8]])
# Use the `method` argument if you want a different type of estimate
# `keepdims=True` keeps the result a column, which it looks like you want
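
To illustrate the `method` argument mentioned in the comment above, here is a minimal sketch. It assumes NumPy 1.22 or later, where the keyword is named `method` (older releases called it `interpolation`). The comments in the question suggest that returning an actual data point instead of an interpolated value would also be acceptable, and the `"lower"` and `"higher"` methods do that:

```python
import numpy as np

a = np.array([[-999, 9, 7, 3],
              [2, 1, -999, 1],
              [1, 5, 4, 6],
              [0, 6, -999, 9],
              [1, -999, -999, 6],
              [8, 4, 4, 8]], dtype=np.float64)
a[a == -999] = np.nan  # mark the sentinel values as missing

# "lower" picks the data point just below the 40% position,
# "higher" picks the one just above it; neither interpolates.
lower = np.nanpercentile(a, 40, axis=-1, method="lower")
higher = np.nanpercentile(a, 40, axis=-1, method="higher")
print(lower)   # values: 3, 1, 4, 0, 1, 4
print(higher)  # values: 7, 1, 5, 6, 6, 8
```

With `"lower"`, every row lands on one of the values the question lists as acceptable.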

You asked for a solution "in NumPy", and that's it. (Unless you want to re-implement percentile, which is not so hard. Or I suppose you could use apply_along_axis on a function that removes the -999s before taking the quantile, but that will just loop in Python over the slices, which can be slow.)
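
For completeness, the `apply_along_axis` variant mentioned above could look roughly like this. It is only a sketch: the helper name `percentile_ignoring` and its `sentinel` parameter are made up for illustration, and rows with no valid values are assumed to map to NaN.

```python
import numpy as np

def percentile_ignoring(row, q, sentinel=-999):
    """Percentile of a 1-D row after dropping entries equal to `sentinel`."""
    valid = row[row != sentinel]
    # Assumption: a row with nothing left should yield NaN.
    return np.percentile(valid, q) if valid.size else np.nan

a = np.array([[-999, 9, 7, 3],
              [2, 1, -999, 1],
              [1, 5, 4, 6],
              [0, 6, -999, 9],
              [1, -999, -999, 6],
              [8, 4, 4, 8]])

# Extra positional arguments after the array are passed on to the helper.
result = np.apply_along_axis(percentile_ignoring, -1, a, 40)
print(result)  # values: 6.2, 1, 4.2, 4.8, 3, 4.8
```

This leaves the integer array untouched, but it calls the helper once per row from Python, so expect it to be slower than `nanpercentile` on large arrays.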

If you don't want to have to change the dtype and replace with NaNs to perform the operation, you can use NumPy masked arrays with scipy.stats.mquantiles.

import numpy as np
from scipy import stats
a = np.array([[-999, 9, 7, 3],
   [2, 1, -999, 1],
   [1, 5, 4, 6],
   [0, 6, -999, 9],
   [1, -999, -999, 6],
   [8, 4, 4, 8]])
mask = a == -999
b = np.ma.masked_array(a, mask=mask)
stats.mstats.mquantiles(b, 0.4, alphap=1, betap=1, axis=-1)
# alphap=1, betap=1 are the settings to reproduce the same values produced by NumPy's default `method`.

But beware that mquantiles is on its way out, superseded by new features in the next release.
