There is a dataframe like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'x':np.arange(1,29),'y':[5.69, 6.03, 6.03, 6.03, 6.03, 6.03, 6.03, 5.38, 5.21, 5.4 , 5.24,
5.4 , 5.36, 5.47, 5.58, 5.5 , 5.61, 5.53, 5.4 , 5.51, 5.47, 5.44,5.39, 5.27, 5.38, 5.35, 5.32, 5.09],
'valley':[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
'peak':[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0]})
>>> df
x y valley peak
0 1 5.69 1 0
1 2 6.03 0 0
2 3 6.03 0 1
3 4 6.03 0 0
4 5 6.03 0 0
5 6 6.03 0 0
6 7 6.03 0 0
7 8 5.38 0 0
8 9 5.21 1 0
9 10 5.40 0 0
10 11 5.24 0 0
11 12 5.40 0 0
12 13 5.36 0 0
13 14 5.47 0 0
14 15 5.58 0 0
15 16 5.50 0 0
16 17 5.61 0 1
17 18 5.53 0 0
18 19 5.40 0 0
19 20 5.51 0 0
20 21 5.47 0 0
21 22 5.44 0 0
22 23 5.39 0 0
23 24 5.27 0 0
24 25 5.38 0 0
25 26 5.35 0 0
26 27 5.32 0 0
27 28 5.09 1 0
I hope to add a new column 'grp' to this dataframe, with the requirement that for each row that starts with "1" in the valley column and ends with "1" in the peak column, the value in the added column is "A", and conversely, for each row that starts with "1" in the peak column and ends with "1" in the valley column, the value in the added column is 'B'.
The desire result is:
>>> out
x y valley peak grp
0 1 5.69 1 0 A
1 2 6.03 0 0 A
2 3 6.03 0 1 B
3 4 6.03 0 0 B
4 5 6.03 0 0 B
5 6 6.03 0 0 B
6 7 6.03 0 0 B
7 8 5.38 0 0 B
8 9 5.21 1 0 A
9 10 5.40 0 0 A
10 11 5.24 0 0 A
11 12 5.40 0 0 A
12 13 5.36 0 0 A
13 14 5.47 0 0 A
14 15 5.58 0 0 A
15 16 5.50 0 0 A
16 17 5.61 0 1 B
17 18 5.53 0 0 B
18 19 5.40 0 0 B
19 20 5.51 0 0 B
20 21 5.47 0 0 B
21 22 5.44 0 0 B
22 23 5.39 0 0 B
23 24 5.27 0 0 B
24 25 5.38 0 0 B
25 26 5.35 0 0 B
26 27 5.32 0 0 B
27 28 5.09 1 0 A
If we don't use apply with a function and for-loops, is there a native way to achieve by use pandas?
There is a dataframe like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'x':np.arange(1,29),'y':[5.69, 6.03, 6.03, 6.03, 6.03, 6.03, 6.03, 5.38, 5.21, 5.4 , 5.24,
5.4 , 5.36, 5.47, 5.58, 5.5 , 5.61, 5.53, 5.4 , 5.51, 5.47, 5.44,5.39, 5.27, 5.38, 5.35, 5.32, 5.09],
'valley':[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
'peak':[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,0, 0, 0, 0, 0, 0]})
>>> df
x y valley peak
0 1 5.69 1 0
1 2 6.03 0 0
2 3 6.03 0 1
3 4 6.03 0 0
4 5 6.03 0 0
5 6 6.03 0 0
6 7 6.03 0 0
7 8 5.38 0 0
8 9 5.21 1 0
9 10 5.40 0 0
10 11 5.24 0 0
11 12 5.40 0 0
12 13 5.36 0 0
13 14 5.47 0 0
14 15 5.58 0 0
15 16 5.50 0 0
16 17 5.61 0 1
17 18 5.53 0 0
18 19 5.40 0 0
19 20 5.51 0 0
20 21 5.47 0 0
21 22 5.44 0 0
22 23 5.39 0 0
23 24 5.27 0 0
24 25 5.38 0 0
25 26 5.35 0 0
26 27 5.32 0 0
27 28 5.09 1 0
I hope to add a new column 'grp' to this dataframe, with the requirement that for each row that starts with "1" in the valley column and ends with "1" in the peak column, the value in the added column is "A", and conversely, for each row that starts with "1" in the peak column and ends with "1" in the valley column, the value in the added column is 'B'.
The desire result is:
>>> out
x y valley peak grp
0 1 5.69 1 0 A
1 2 6.03 0 0 A
2 3 6.03 0 1 B
3 4 6.03 0 0 B
4 5 6.03 0 0 B
5 6 6.03 0 0 B
6 7 6.03 0 0 B
7 8 5.38 0 0 B
8 9 5.21 1 0 A
9 10 5.40 0 0 A
10 11 5.24 0 0 A
11 12 5.40 0 0 A
12 13 5.36 0 0 A
13 14 5.47 0 0 A
14 15 5.58 0 0 A
15 16 5.50 0 0 A
16 17 5.61 0 1 B
17 18 5.53 0 0 B
18 19 5.40 0 0 B
19 20 5.51 0 0 B
20 21 5.47 0 0 B
21 22 5.44 0 0 B
22 23 5.39 0 0 B
23 24 5.27 0 0 B
24 25 5.38 0 0 B
25 26 5.35 0 0 B
26 27 5.32 0 0 B
27 28 5.09 1 0 A
If we don't use apply with a function and for-loops, is there a native way to achieve by use pandas?
Share Improve this question asked Jan 18 at 14:57 Sun JarSun Jar 3411 silver badge12 bronze badges 2- You haven't shown/described what should happen if you have two successive 1s in peak or valley. – mozway Commented Jan 18 at 15:10
- @mozway These data are generated using the 'findpeak' library, which is used to analyze peaks and valleys in continuous data. The data will not have consecutive 1s, and the peak column and valley column will not be 1 on the same row. – Sun Jar Commented Jan 19 at 0:12
1 Answer
Reset to default 1Given your description, you could use np.select
and ffill
. This way you'll ensure that even if you have multiple peaks before a valley or conversely this will keep the order:
m1 = df['valley'].eq(1)
m2 = df['peak'].eq(1)
df['grp'] = pd.Series(np.select([m1, m2], ['A', 'B'], pd.NA),
index=df.index).ffill()
Variant with case_when
:
m1 = df['valley'].eq(1)
m2 = df['peak'].eq(1)
df['grp'] = df['valley'].case_when([(m1, 'A'), (m2, 'B'), (~(m1|m2), pd.NA)]
).ffill()
Or with from_dummies
after adding a new column:
df['grp'] = (pd.from_dummies(df[['valley', 'peak']]
.assign(other=df[['valley', 'peak']]
.sum(axis=1).rsub(1)))
.squeeze().map({'valley': 'A', 'peak': 'B'}).ffill()
)
Or with reshaping:
df['grp'] = (df[['valley', 'peak']]
.rename_axis(columns='col')
.replace(0, pd.NA).stack().reset_index(-1, name='val')
.replace({'col': {'valley': 'A', 'peak': 'B'}})['col']
.reindex(df.index).ffill()
)
Alternatively, if there is always a valley then peak then valley... you could use cumsum
+mod
and map
the group after identify which one of peak/valley is the first:
df['grp'] = (df[['valley', 'peak']].max(axis=1).cumsum()
.add(df[['valley', 'peak']].idxmax().idxmin() == 'peak')
.mod(2).map({0: 'B', 1: 'A'})
)
Output:
x y valley peak grp
0 1 5.69 1 0 A
1 2 6.03 0 0 A
2 3 6.03 0 1 B
3 4 6.03 0 0 B
4 5 6.03 0 0 B
5 6 6.03 0 0 B
6 7 6.03 0 0 B
7 8 5.38 0 0 B
8 9 5.21 1 0 A
9 10 5.40 0 0 A
10 11 5.24 0 0 A
11 12 5.40 0 0 A
12 13 5.36 0 0 A
13 14 5.47 0 0 A
14 15 5.58 0 0 A
15 16 5.50 0 0 A
16 17 5.61 0 1 B
17 18 5.53 0 0 B
18 19 5.40 0 0 B
19 20 5.51 0 0 B
20 21 5.47 0 0 B
21 22 5.44 0 0 B
22 23 5.39 0 0 B
23 24 5.27 0 0 B
24 25 5.38 0 0 B
25 26 5.35 0 0 B
26 27 5.32 0 0 B
27 28 5.09 1 0 A