最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

Pandas dataframe add a mark columns by two other columns condition - Stack Overflow

programmeradmin4浏览0评论

There is a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x':np.arange(1,29),'y':[5.69, 6.03, 6.03, 6.03, 6.03, 6.03, 6.03, 5.38, 5.21, 5.4 , 5.24,
       5.4 , 5.36, 5.47, 5.58, 5.5 , 5.61, 5.53, 5.4 , 5.51, 5.47, 5.44,5.39, 5.27, 5.38, 5.35, 5.32, 5.09],
          'valley':[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
          'peak':[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0]})

>>> df
     x     y  valley  peak
0    1  5.69       1     0
1    2  6.03       0     0
2    3  6.03       0     1
3    4  6.03       0     0
4    5  6.03       0     0
5    6  6.03       0     0
6    7  6.03       0     0
7    8  5.38       0     0
8    9  5.21       1     0
9   10  5.40       0     0
10  11  5.24       0     0
11  12  5.40       0     0
12  13  5.36       0     0
13  14  5.47       0     0
14  15  5.58       0     0
15  16  5.50       0     0
16  17  5.61       0     1
17  18  5.53       0     0
18  19  5.40       0     0
19  20  5.51       0     0
20  21  5.47       0     0
21  22  5.44       0     0
22  23  5.39       0     0
23  24  5.27       0     0
24  25  5.38       0     0
25  26  5.35       0     0
26  27  5.32       0     0
27  28  5.09       1     0

I hope to add a new column 'grp' to this dataframe, with the requirement that for each row that starts with "1" in the valley column and ends with "1" in the peak column, the value in the added column is "A", and conversely, for each row that starts with "1" in the peak column and ends with "1" in the valley column, the value in the added column is 'B'.

The desire result is:

>>> out
     x     y  valley  peak  grp
0    1  5.69       1     0  A
1    2  6.03       0     0  A
2    3  6.03       0     1  B
3    4  6.03       0     0  B
4    5  6.03       0     0  B
5    6  6.03       0     0  B
6    7  6.03       0     0  B
7    8  5.38       0     0  B
8    9  5.21       1     0  A
9   10  5.40       0     0  A
10  11  5.24       0     0  A
11  12  5.40       0     0  A
12  13  5.36       0     0  A
13  14  5.47       0     0  A
14  15  5.58       0     0  A
15  16  5.50       0     0  A
16  17  5.61       0     1  B
17  18  5.53       0     0  B
18  19  5.40       0     0  B
19  20  5.51       0     0  B
20  21  5.47       0     0  B
21  22  5.44       0     0  B
22  23  5.39       0     0  B
23  24  5.27       0     0  B
24  25  5.38       0     0  B
25  26  5.35       0     0  B
26  27  5.32       0     0  B
27  28  5.09       1     0  A

If we don't use apply with a function and for-loops, is there a native way to achieve by use pandas?

There is a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x':np.arange(1,29),'y':[5.69, 6.03, 6.03, 6.03, 6.03, 6.03, 6.03, 5.38, 5.21, 5.4 , 5.24,
       5.4 , 5.36, 5.47, 5.58, 5.5 , 5.61, 5.53, 5.4 , 5.51, 5.47, 5.44,5.39, 5.27, 5.38, 5.35, 5.32, 5.09],
          'valley':[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
          'peak':[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0]})

>>> df
     x     y  valley  peak
0    1  5.69       1     0
1    2  6.03       0     0
2    3  6.03       0     1
3    4  6.03       0     0
4    5  6.03       0     0
5    6  6.03       0     0
6    7  6.03       0     0
7    8  5.38       0     0
8    9  5.21       1     0
9   10  5.40       0     0
10  11  5.24       0     0
11  12  5.40       0     0
12  13  5.36       0     0
13  14  5.47       0     0
14  15  5.58       0     0
15  16  5.50       0     0
16  17  5.61       0     1
17  18  5.53       0     0
18  19  5.40       0     0
19  20  5.51       0     0
20  21  5.47       0     0
21  22  5.44       0     0
22  23  5.39       0     0
23  24  5.27       0     0
24  25  5.38       0     0
25  26  5.35       0     0
26  27  5.32       0     0
27  28  5.09       1     0

I hope to add a new column 'grp' to this dataframe, with the requirement that for each row that starts with "1" in the valley column and ends with "1" in the peak column, the value in the added column is "A", and conversely, for each row that starts with "1" in the peak column and ends with "1" in the valley column, the value in the added column is 'B'.

The desire result is:

>>> out
     x     y  valley  peak  grp
0    1  5.69       1     0  A
1    2  6.03       0     0  A
2    3  6.03       0     1  B
3    4  6.03       0     0  B
4    5  6.03       0     0  B
5    6  6.03       0     0  B
6    7  6.03       0     0  B
7    8  5.38       0     0  B
8    9  5.21       1     0  A
9   10  5.40       0     0  A
10  11  5.24       0     0  A
11  12  5.40       0     0  A
12  13  5.36       0     0  A
13  14  5.47       0     0  A
14  15  5.58       0     0  A
15  16  5.50       0     0  A
16  17  5.61       0     1  B
17  18  5.53       0     0  B
18  19  5.40       0     0  B
19  20  5.51       0     0  B
20  21  5.47       0     0  B
21  22  5.44       0     0  B
22  23  5.39       0     0  B
23  24  5.27       0     0  B
24  25  5.38       0     0  B
25  26  5.35       0     0  B
26  27  5.32       0     0  B
27  28  5.09       1     0  A

If we don't use apply with a function and for-loops, is there a native way to achieve by use pandas?

Share Improve this question asked Jan 18 at 14:57 Sun JarSun Jar 3411 silver badge12 bronze badges 2
  • You haven't shown/described what should happen if you have two successive 1s in peak or valley. – mozway Commented Jan 18 at 15:10
  • @mozway These data are generated using the 'findpeak' library, which is used to analyze peaks and valleys in continuous data. The data will not have consecutive 1s, and the peak column and valley column will not be 1 on the same row. – Sun Jar Commented Jan 19 at 0:12
Add a comment  | 

1 Answer 1

Reset to default 1

Given your description, you could use np.select and ffill. This way you'll ensure that even if you have multiple peaks before a valley or conversely this will keep the order:

m1 = df['valley'].eq(1)
m2 = df['peak'].eq(1)
df['grp'] = pd.Series(np.select([m1, m2], ['A', 'B'], pd.NA),
                      index=df.index).ffill()

Variant with case_when:

m1 = df['valley'].eq(1)
m2 = df['peak'].eq(1)
df['grp'] = df['valley'].case_when([(m1, 'A'), (m2, 'B'), (~(m1|m2), pd.NA)]
                                   ).ffill()

Or with from_dummies after adding a new column:

df['grp'] = (pd.from_dummies(df[['valley', 'peak']]
                             .assign(other=df[['valley', 'peak']]
                                     .sum(axis=1).rsub(1)))
               .squeeze().map({'valley': 'A', 'peak': 'B'}).ffill()
            )

Or with reshaping:

df['grp'] = (df[['valley', 'peak']]
 .rename_axis(columns='col')
 .replace(0, pd.NA).stack().reset_index(-1, name='val')
 .replace({'col': {'valley': 'A', 'peak': 'B'}})['col']
 .reindex(df.index).ffill()
)

Alternatively, if there is always a valley then peak then valley... you could use cumsum+mod and map the group after identify which one of peak/valley is the first:

df['grp'] = (df[['valley', 'peak']].max(axis=1).cumsum()
            .add(df[['valley', 'peak']].idxmax().idxmin() == 'peak')
            .mod(2).map({0: 'B', 1: 'A'})
           )

Output:

     x     y  valley  peak grp
0    1  5.69       1     0   A
1    2  6.03       0     0   A
2    3  6.03       0     1   B
3    4  6.03       0     0   B
4    5  6.03       0     0   B
5    6  6.03       0     0   B
6    7  6.03       0     0   B
7    8  5.38       0     0   B
8    9  5.21       1     0   A
9   10  5.40       0     0   A
10  11  5.24       0     0   A
11  12  5.40       0     0   A
12  13  5.36       0     0   A
13  14  5.47       0     0   A
14  15  5.58       0     0   A
15  16  5.50       0     0   A
16  17  5.61       0     1   B
17  18  5.53       0     0   B
18  19  5.40       0     0   B
19  20  5.51       0     0   B
20  21  5.47       0     0   B
21  22  5.44       0     0   B
22  23  5.39       0     0   B
23  24  5.27       0     0   B
24  25  5.38       0     0   B
25  26  5.35       0     0   B
26  27  5.32       0     0   B
27  28  5.09       1     0   A
发布评论

评论列表(0)

  1. 暂无评论