I want to sort the first column according to the internal algorithm, and then sort the second column according to the custom sorting method
The test data is as follows:
A B
Ankang Shaanxi Ankang Southeast
Baoding Anguo Baoding Anguo Northeast
Baoding Anguo Baoding Anguo Southeast
Changsha Hunan Changsha Hunan Bright
Ankang Shaanxi Ankang Northeast
Baoding Anguo Baoding Anguo Southwest
Baoding Anguo Baoding Anguo Upper
Ankang Shaanxi Ankang Southwest
Luoyang Henan Luoyang Henan Upper
Baoding Anguo Baoding Anguo Northwest
Changsha Hunan Changsha Hunan Lower
Ankang Shaanxi Ankang Southwest Upper
Ankang Shaanxi Ankang Northwest
I hope to be able to arrange it as shown below
The first column is sorted together using pandas' built-in string sorting algorithm, and then the second column is sorted using the custom order algorithm of northeast, southeast, northwest, southwest,upper.
I used pandas' sort_values() method to sort. I had no problem sorting a single column, but it always failed when I tried to sort two columns together.
import pandas as pd
data={'A':['Ankang Shaanxi','Baoding Anguo','Baoding Anguo','Changsha Hunan','Ankang Shaanxi',
'Baoding Anguo','Baoding Anguo','Ankang Shaanxi','Luoyang Henan','Baoding Anguo',
'Changsha Hunan','Ankang Shaanxi','Ankang Shaanxi'],
'B':['Ankang Southeast','Baoding Anguo Northeast','Baoding Anguo Southeast','Changsha Hunan Bright','Ankang Northeast','Baoding Anguo Southwest','Baoding Anguo Upper','Ankang Southwest','Luoyang Henan Upper','Baoding Anguo Northwest','Changsha Hunan Lower','Ankang Southwest Upper','Ankang Northwest']}
df=pd.DataFrame(data)
def sort_fun(x):
return x.split()[-1]
df['sort_value']=df['B'].apply(sort_fun)
sort_dicts={'Northeast':0,'Southeast':1,'Northwest':2,'Southwest':3,'Upper':4}
df.sort_values(by=['A','sort_value'],key=lambda x :x.map(sort_dicts))
I referred to it Pandas: How to custom-sort on multiple columns?
A B
Ankang Shaanxi Ankang Northeast
Ankang Shaanxi Ankang Southeast
Ankang Shaanxi Ankang Northwest
Ankang Shaanxi Ankang Southwest
Ankang Shaanxi Ankang Southwest Upper
Baoding Anguo Baoding Anguo Northeast
Baoding Anguo Baoding Anguo Southeast
Baoding Anguo Baoding Anguo Northwest
Baoding Anguo Baoding Anguo Southwest
Baoding Anguo Baoding Anguo Upper
Changsha Hunan Changsha Hunan Bright
Changsha Hunan Changsha Hunan Lower
Luoyang Henan Luoyang Henan Upper
I want to sort the first column according to the internal algorithm, and then sort the second column according to the custom sorting method
The test data is as follows:
A B
Ankang Shaanxi Ankang Southeast
Baoding Anguo Baoding Anguo Northeast
Baoding Anguo Baoding Anguo Southeast
Changsha Hunan Changsha Hunan Bright
Ankang Shaanxi Ankang Northeast
Baoding Anguo Baoding Anguo Southwest
Baoding Anguo Baoding Anguo Upper
Ankang Shaanxi Ankang Southwest
Luoyang Henan Luoyang Henan Upper
Baoding Anguo Baoding Anguo Northwest
Changsha Hunan Changsha Hunan Lower
Ankang Shaanxi Ankang Southwest Upper
Ankang Shaanxi Ankang Northwest
I hope to be able to arrange it as shown below
The first column is sorted together using pandas' built-in string sorting algorithm, and then the second column is sorted using the custom order algorithm of northeast, southeast, northwest, southwest,upper.
I used pandas' sort_values() method to sort. I had no problem sorting a single column, but it always failed when I tried to sort two columns together.
import pandas as pd
data={'A':['Ankang Shaanxi','Baoding Anguo','Baoding Anguo','Changsha Hunan','Ankang Shaanxi',
'Baoding Anguo','Baoding Anguo','Ankang Shaanxi','Luoyang Henan','Baoding Anguo',
'Changsha Hunan','Ankang Shaanxi','Ankang Shaanxi'],
'B':['Ankang Southeast','Baoding Anguo Northeast','Baoding Anguo Southeast','Changsha Hunan Bright','Ankang Northeast','Baoding Anguo Southwest','Baoding Anguo Upper','Ankang Southwest','Luoyang Henan Upper','Baoding Anguo Northwest','Changsha Hunan Lower','Ankang Southwest Upper','Ankang Northwest']}
df=pd.DataFrame(data)
def sort_fun(x):
return x.split()[-1]
df['sort_value']=df['B'].apply(sort_fun)
sort_dicts={'Northeast':0,'Southeast':1,'Northwest':2,'Southwest':3,'Upper':4}
df.sort_values(by=['A','sort_value'],key=lambda x :x.map(sort_dicts))
I referred to it Pandas: How to custom-sort on multiple columns?
A B
Ankang Shaanxi Ankang Northeast
Ankang Shaanxi Ankang Southeast
Ankang Shaanxi Ankang Northwest
Ankang Shaanxi Ankang Southwest
Ankang Shaanxi Ankang Southwest Upper
Baoding Anguo Baoding Anguo Northeast
Baoding Anguo Baoding Anguo Southeast
Baoding Anguo Baoding Anguo Northwest
Baoding Anguo Baoding Anguo Southwest
Baoding Anguo Baoding Anguo Upper
Changsha Hunan Changsha Hunan Bright
Changsha Hunan Changsha Hunan Lower
Luoyang Henan Luoyang Henan Upper
Share
Improve this question
asked 19 hours ago
张宇杰张宇杰
331 silver badge6 bronze badges
2 Answers
Reset to default 1The basic logic you can use for column 'B'
:
Series.str.split
+ accessstr[-1]
+Series.map
df['B'].str.split().str[-1].map(sort_dicts)
0 1.0
1 0.0
2 1.0
3 NaN
4 0.0
5 3.0
6 4.0
7 3.0
8 4.0
9 2.0
10 NaN
11 4.0
12 2.0
Name: B, dtype: float64
Couple of ways to sort using this logic:
Option 1
Chain calls to df.sort_values
:
# note 'B' first
def sort_fun(s):
return s.str.split().str[-1].map(sort_dicts)
out = (df.sort_values('B', key=sort_fun)
.sort_values('A', ignore_index=True)
)
Option 2
Adjust sort_fun
to only affect col 'B'
:
def sort_fun2(s, name):
if s.name == name: # for 'B'
return s.str.split().str[-1].map(sort_dicts)
return s
out2 = df.sort_values(['A', 'B'], key=lambda x: sort_fun2(x, 'B'),
ignore_index=True)
Indeed, your original approach also applied the function passed to key
to df['A']
(i.e., df['A'].map(sort_dicts)
), leading to a series with NaN
values to "sort".
Option 3
Use np.lexsort
as suggested by @mozway in the linked post:
# again: note 'B' goes first
import numpy as np
sort = np.lexsort((df['B'].str.split().str[-1].map(sort_dicts),
df['A']))
out3 = df.iloc[sort].reset_index(drop=True)
Output
out
A B
0 Ankang Shaanxi Ankang Northeast
1 Ankang Shaanxi Ankang Southeast
2 Ankang Shaanxi Ankang Northwest
3 Ankang Shaanxi Ankang Southwest
4 Ankang Shaanxi Ankang Southwest Upper
5 Baoding Anguo Baoding Anguo Northeast
6 Baoding Anguo Baoding Anguo Southeast
7 Baoding Anguo Baoding Anguo Northwest
8 Baoding Anguo Baoding Anguo Southwest
9 Baoding Anguo Baoding Anguo Upper
10 Changsha Hunan Changsha Hunan Bright
11 Changsha Hunan Changsha Hunan Lower
12 Luoyang Henan Luoyang Henan Upper
Equality check with desired output:
data2 = {'A': ['Ankang Shaanxi', 'Ankang Shaanxi', 'Ankang Shaanxi',
'Ankang Shaanxi', 'Ankang Shaanxi', 'Baoding Anguo',
'Baoding Anguo', 'Baoding Anguo', 'Baoding Anguo',
'Baoding Anguo', 'Changsha Hunan', 'Changsha Hunan',
'Luoyang Henan'],
'B': ['Ankang Northeast', 'Ankang Southeast', 'Ankang Northwest',
'Ankang Southwest', 'Ankang Southwest Upper',
'Baoding Anguo Northeast', 'Baoding Anguo Southeast',
'Baoding Anguo Northwest', 'Baoding Anguo Southwest',
'Baoding Anguo Upper', 'Changsha Hunan Bright',
'Changsha Hunan Lower', 'Luoyang Henan Upper']}
desired = pd.DataFrame(data2)
all(df.equals(desired) for df in [out, out2, out3])
# True
You can assign
an new column based on the extract
+map
:
out = (df
.assign(sort_value=df['B'].str.extract(r'(\w+)$', expand=False).map(sort_dicts))
.sort_values(by=['A', 'sort_value'])
.drop(columns='sort_value')
)
NB. df['B'].str.extract(r'(\w+)$', expand=False)
is equivalent to df['B'].str.split().str[-1]
but usually faster.
In such cases, having multiple conditions with different sorting keys, I prefer to use numpy.lexsort
+iloc
, which avoids the need to assign a temporary column:
out = df.iloc[np.lexsort([df['B'].str.extract(r'(\w+)$', expand=False)
.map(sort_dicts),
df['A']])]
Alternatively, if you don't want to use lexsort
, you could also use a dictionary of custom sorting keys, or keep the original values by default:
sorter = {'B': df['B'].str.extract(r'(\w+)$', expand=False).map(sort_dicts)}
out = df.sort_values(['A', 'B'], key=lambda x: sorter.get(x.name, x))
Output:
A B
4 Ankang Shaanxi Ankang Northeast
0 Ankang Shaanxi Ankang Southeast
12 Ankang Shaanxi Ankang Northwest
7 Ankang Shaanxi Ankang Southwest
11 Ankang Shaanxi Ankang Southwest Upper
1 Baoding Anguo Baoding Anguo Northeast
2 Baoding Anguo Baoding Anguo Southeast
9 Baoding Anguo Baoding Anguo Northwest
5 Baoding Anguo Baoding Anguo Southwest
6 Baoding Anguo Baoding Anguo Upper
3 Changsha Hunan Changsha Hunan Bright
10 Changsha Hunan Changsha Hunan Lower
8 Luoyang Henan Luoyang Henan Upper
Why did the original approach fail?
Your issue in df.sort_values(by=['A', 'sort_value'], key=lambda x :x.map(sort_dicts))
is that you're passing a key that will be applied to both columns, which gives you an intermediate that cannot sort on A
:
A sort_value
0 None 1.0
1 None 0.0
2 None 1.0
3 None NaN
4 None 0.0
5 None 3.0
6 None 4.0
7 None 3.0
8 None 4.0
9 None 2.0
10 None NaN
11 None 4.0
12 None 2.0