Pandas dataframe add a mark columns by two other columns condition

There is a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x':np.arange(1,29),'y':[5.69, 6.03, 6.03, 6.03, 6.03, 6.03, 6.03, 5.38, 5.21, 5.4 , 5.24,
       5.4 , 5.36, 5.47, 5.58, 5.5 , 5.61, 5.53, 5.4 , 5.51, 5.47, 5.44,5.39, 5.27, 5.38, 5.35, 5.32, 5.09],
          'valley':[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
          'peak':[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0]})

>>> df
     x     y  valley  peak
0    1  5.69       1     0
1    2  6.03       0     0
2    3  6.03       0     1
3    4  6.03       0     0
4    5  6.03       0     0
5    6  6.03       0     0
6    7  6.03       0     0
7    8  5.38       0     0
8    9  5.21       1     0
9   10  5.40       0     0
10  11  5.24       0     0
11  12  5.40       0     0
12  13  5.36       0     0
13  14  5.47       0     0
14  15  5.58       0     0
15  16  5.50       0     0
16  17  5.61       0     1
17  18  5.53       0     0
18  19  5.40       0     0
19  20  5.51       0     0
20  21  5.47       0     0
21  22  5.44       0     0
22  23  5.39       0     0
23  24  5.27       0     0
24  25  5.38       0     0
25  26  5.35       0     0
26  27  5.32       0     0
27  28  5.09       1     0

I hope to add a new column 'grp' to this dataframe, with the requirement that for each row that starts with "1" in the valley column and ends with "1" in the peak column, the value in the added column is "A", and conversely, for each row that starts with "1" in the peak column and ends with "1" in the valley column, the value in the added column is 'B'.

The desire result is:

>>> out
     x     y  valley  peak  grp
0    1  5.69       1     0  A
1    2  6.03       0     0  A
2    3  6.03       0     1  B
3    4  6.03       0     0  B
4    5  6.03       0     0  B
5    6  6.03       0     0  B
6    7  6.03       0     0  B
7    8  5.38       0     0  B
8    9  5.21       1     0  A
9   10  5.40       0     0  A
10  11  5.24       0     0  A
11  12  5.40       0     0  A
12  13  5.36       0     0  A
13  14  5.47       0     0  A
14  15  5.58       0     0  A
15  16  5.50       0     0  A
16  17  5.61       0     1  B
17  18  5.53       0     0  B
18  19  5.40       0     0  B
19  20  5.51       0     0  B
20  21  5.47       0     0  B
21  22  5.44       0     0  B
22  23  5.39       0     0  B
23  24  5.27       0     0  B
24  25  5.38       0     0  B
25  26  5.35       0     0  B
26  27  5.32       0     0  B
27  28  5.09       1     0  A

If we don't use apply with a function and for-loops, is there a native way to achieve by use pandas?

There is a dataframe like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x':np.arange(1,29),'y':[5.69, 6.03, 6.03, 6.03, 6.03, 6.03, 6.03, 5.38, 5.21, 5.4 , 5.24,
       5.4 , 5.36, 5.47, 5.58, 5.5 , 5.61, 5.53, 5.4 , 5.51, 5.47, 5.44,5.39, 5.27, 5.38, 5.35, 5.32, 5.09],
          'valley':[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
          'peak':[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0]})

>>> df
     x     y  valley  peak
0    1  5.69       1     0
1    2  6.03       0     0
2    3  6.03       0     1
3    4  6.03       0     0
4    5  6.03       0     0
5    6  6.03       0     0
6    7  6.03       0     0
7    8  5.38       0     0
8    9  5.21       1     0
9   10  5.40       0     0
10  11  5.24       0     0
11  12  5.40       0     0
12  13  5.36       0     0
13  14  5.47       0     0
14  15  5.58       0     0
15  16  5.50       0     0
16  17  5.61       0     1
17  18  5.53       0     0
18  19  5.40       0     0
19  20  5.51       0     0
20  21  5.47       0     0
21  22  5.44       0     0
22  23  5.39       0     0
23  24  5.27       0     0
24  25  5.38       0     0
25  26  5.35       0     0
26  27  5.32       0     0
27  28  5.09       1     0

The desire result is:

>>> out
     x     y  valley  peak  grp
0    1  5.69       1     0  A
1    2  6.03       0     0  A
2    3  6.03       0     1  B
3    4  6.03       0     0  B
4    5  6.03       0     0  B
5    6  6.03       0     0  B
6    7  6.03       0     0  B
7    8  5.38       0     0  B
8    9  5.21       1     0  A
9   10  5.40       0     0  A
10  11  5.24       0     0  A
11  12  5.40       0     0  A
12  13  5.36       0     0  A
13  14  5.47       0     0  A
14  15  5.58       0     0  A
15  16  5.50       0     0  A
16  17  5.61       0     1  B
17  18  5.53       0     0  B
18  19  5.40       0     0  B
19  20  5.51       0     0  B
20  21  5.47       0     0  B
21  22  5.44       0     0  B
22  23  5.39       0     0  B
23  24  5.27       0     0  B
24  25  5.38       0     0  B
25  26  5.35       0     0  B
26  27  5.32       0     0  B
27  28  5.09       1     0  A

If we don't use apply with a function and for-loops, is there a native way to achieve by use pandas?

Share Improve this question asked Jan 18 at 14:57 Sun Jar 3411 silver badge12 bronze badges

You haven't shown/described what should happen if you have two successive 1s in peak or valley. – mozway Commented Jan 18 at 15:10
@mozway These data are generated using the 'findpeak' library, which is used to analyze peaks and valleys in continuous data. The data will not have consecutive 1s, and the peak column and valley column will not be 1 on the same row. – Sun Jar Commented Jan 19 at 0:12

Add a comment |

1 Answer 1

Sorted by: Reset to default 1

Given your description, you could use np.select and ffill. This way you'll ensure that even if you have multiple peaks before a valley or conversely this will keep the order:

m1 = df['valley'].eq(1)
m2 = df['peak'].eq(1)
df['grp'] = pd.Series(np.select([m1, m2], ['A', 'B'], pd.NA),
                      index=df.index).ffill()

Variant with case_when:

m1 = df['valley'].eq(1)
m2 = df['peak'].eq(1)
df['grp'] = df['valley'].case_when([(m1, 'A'), (m2, 'B'), (~(m1|m2), pd.NA)]
                                   ).ffill()

Or with from_dummies after adding a new column:

df['grp'] = (pd.from_dummies(df[['valley', 'peak']]
                             .assign(other=df[['valley', 'peak']]
                                     .sum(axis=1).rsub(1)))
               .squeeze().map({'valley': 'A', 'peak': 'B'}).ffill()
            )

Or with reshaping:

df['grp'] = (df[['valley', 'peak']]
 .rename_axis(columns='col')
 .replace(0, pd.NA).stack().reset_index(-1, name='val')
 .replace({'col': {'valley': 'A', 'peak': 'B'}})['col']
 .reindex(df.index).ffill()
)

Alternatively, if there is always a valley then peak then valley... you could use cumsum+mod and map the group after identify which one of peak/valley is the first:

df['grp'] = (df[['valley', 'peak']].max(axis=1).cumsum()
            .add(df[['valley', 'peak']].idxmax().idxmin() == 'peak')
            .mod(2).map({0: 'B', 1: 'A'})
           )

Output:

     x     y  valley  peak grp
0    1  5.69       1     0   A
1    2  6.03       0     0   A
2    3  6.03       0     1   B
3    4  6.03       0     0   B
4    5  6.03       0     0   B
5    6  6.03       0     0   B
6    7  6.03       0     0   B
7    8  5.38       0     0   B
8    9  5.21       1     0   A
9   10  5.40       0     0   A
10  11  5.24       0     0   A
11  12  5.40       0     0   A
12  13  5.36       0     0   A
13  14  5.47       0     0   A
14  15  5.58       0     0   A
15  16  5.50       0     0   A
16  17  5.61       0     1   B
17  18  5.53       0     0   B
18  19  5.40       0     0   B
19  20  5.51       0     0   B
20  21  5.47       0     0   B
21  22  5.44       0     0   B
22  23  5.39       0     0   B
23  24  5.27       0     0   B
24  25  5.38       0     0   B
25  26  5.35       0     0   B
26  27  5.32       0     0   B
27  28  5.09       1     0   A

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

Pandas dataframe add a mark columns by two other columns condition - Stack Overflow

1 Answer 1

与本文相关的文章

评论列表(0)