python - Pandas Dataframe ffill with One Greater the Previous Nonzero Value - Stack Overflow

I have a pandas DataFrame with a column like:

0
1
1
2
2
3
4
5
5
0
0
0

I would like to leave any leading zeros, but ffill to replace the trailing zeros with one greater than the previous, nonzero value. In this case, I'd like the output to be:

0
1
1
2
2
3
4
5
5
6
6
6

How can I go about doing this?

Asked Nov 19, 2024 at 15:55 by Aaron Horowitz
  • Do you have any other edge cases? Or is the column always zero from the last non zero value? – Tom McLean Commented Nov 19, 2024 at 16:02
  • So far this is my edge case. I've been able to handle any others without much issue. Mozway's answer works for my needs for now, but I'll update if I run into any edge cases that it can't handle. – Aaron Horowitz Commented Nov 19, 2024 at 16:09

1 Answer

You could mask, increment and ffill:

m = df['col'].eq(0)
s = df['col'].mask(m)
df['out'] = s.fillna(s.add(1).ffill().fillna(0)).convert_dtypes()

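Or, if you really want to target only the trailing zeros (note that this fills them with the column maximum plus one, which equals the previous nonzero value plus one only when the last nonzero value is also the largest):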

df['out'] = df['col'].mask(df['col'].eq(0)[::-1].cummin(), df['col'].max()+1)

Output:

    col  out
0     0    0
1     1    1
2     1    1
3     2    2
4     2    2
5     3    3
6     4    4
7     5    5
8     5    5
9     0    6
10    0    6
11    0    6

Intermediates (first approach):

    col  out      m    s  s.add(1)  .ffill()  .fillna(0)
0     0    0   True  NaN       NaN       NaN         0.0
1     1    1  False  1.0       2.0       2.0         2.0
2     1    1  False  1.0       2.0       2.0         2.0
3     2    2  False  2.0       3.0       3.0         3.0
4     2    2  False  2.0       3.0       3.0         3.0
5     3    3  False  3.0       4.0       4.0         4.0
6     4    4  False  4.0       5.0       5.0         5.0
7     5    5  False  5.0       6.0       6.0         6.0
8     5    5  False  5.0       6.0       6.0         6.0
9     0    6   True  NaN       NaN       6.0         6.0
10    0    6   True  NaN       NaN       6.0         6.0
11    0    6   True  NaN       NaN       6.0         6.0
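
The first approach replaces every zero that follows a nonzero value, not only the trailing ones. A small sketch on a made-up column with an interior zero shows this; the function name `fill_zeros_prev_plus_one` and the sample data are hypothetical, not part of the original answer.

```python
import pandas as pd

def fill_zeros_prev_plus_one(col: pd.Series) -> pd.Series:
    """Replace each non-leading zero with (previous nonzero value + 1)."""
    s = col.mask(col.eq(0))              # zeros become NaN
    filler = s.add(1).ffill().fillna(0)  # previous nonzero + 1; 0 before the first nonzero
    return s.fillna(filler).astype(int)

# Hypothetical data: one leading zero, one interior zero, two trailing zeros
example = pd.Series([0, 1, 0, 2, 0, 0])
result = fill_zeros_prev_plus_one(example)
print(result.tolist())  # [0, 1, 2, 2, 3, 3]
```

The interior zero at position 2 becomes 2 (one more than the preceding 1), while the leading zero is left as 0.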

Intermediates (second approach):

    col  out      m    s  df['col'].eq(0)  [::-1].cummin()
0     0    0   True  NaN             True            False
1     1    1  False  1.0            False            False
2     1    1  False  1.0            False            False
3     2    2  False  2.0            False            False
4     2    2  False  2.0            False            False
5     3    3  False  3.0            False            False
6     4    4  False  4.0            False            False
7     5    5  False  5.0            False            False
8     5    5  False  5.0            False            False
9     0    6   True  NaN             True             True
10    0    6   True  NaN             True             True
11    0    6   True  NaN             True             True
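
The second approach fills with `max() + 1`, which assumes the last nonzero value is also the largest. If that might not hold, one possible variation is to take the last nonzero value instead. This is only a sketch: `fill_trailing_zeros` is a hypothetical helper, the sample data is invented, and leaving an all-zero column unchanged is an assumption about the desired behaviour.

```python
import pandas as pd

def fill_trailing_zeros(col: pd.Series) -> pd.Series:
    """Replace only the final run of zeros with (last nonzero value + 1)."""
    trailing = col.eq(0)[::-1].cummin()  # True only for the final run of zeros
    nonzero = col[col.ne(0)]
    if nonzero.empty:                    # assumption: leave an all-zero column as-is
        return col.copy()
    return col.mask(trailing, nonzero.iloc[-1] + 1)

# Hypothetical non-monotonic column: the last nonzero value (2) is not the maximum (5)
example = pd.Series([0, 3, 5, 2, 0, 0])
by_last = fill_trailing_zeros(example)
by_max = example.mask(example.eq(0)[::-1].cummin(), example.max() + 1)

print(by_last.tolist())  # [0, 3, 5, 2, 3, 3]
print(by_max.tolist())   # [0, 3, 5, 2, 6, 6]
```

For a non-decreasing column like the one in the question, the two variants give identical results.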

Applying per group:

Assuming a group LOT_ID and the target column STEP_NUMBER:

df['out'] = (df.groupby('LOT_ID')['STEP_NUMBER']
             .transform(lambda x: x.mask(x.eq(0)[::-1].cummin(), x.max()+1))
            )
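
As a quick check of the grouped version, here is a small runnable sketch. The column names `LOT_ID` and `STEP_NUMBER` come from the answer; the two lots and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: two lots, each ending in a run of zeros
df = pd.DataFrame({
    'LOT_ID':      ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'STEP_NUMBER': [0,   1,   2,   0,   1,   1,   0,   0],
})

# Within each lot, replace the trailing zeros with that lot's maximum + 1
df['out'] = (df.groupby('LOT_ID')['STEP_NUMBER']
             .transform(lambda x: x.mask(x.eq(0)[::-1].cummin(), x.max() + 1))
            )

print(df['out'].tolist())  # [0, 1, 2, 3, 1, 1, 2, 2]
```

Each lot is handled independently, so lot A's trailing zero becomes 3 and lot B's trailing zeros become 2.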