I have a dataframe that I want to order in a specific way. An example of the desired sorted dataframe is shown below.
import pandas as pd

data = {
    "ID": [1, 1, 1, 1, 1, 1, 1, 1],
    "Type": ["Enter", "Out", "In", "Out", "In", "Out", "In", "Exit"],
    "Department": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "Time": ["11:00", "11:00", "11:00", "11:00", "11:00", "12:30", "12:30", "15:00"],
    "Code": [519, 519, 620, 620, 588, 588, 322, 322]
}
df = pd.DataFrame(data)
   ID   Type Department   Time  Code
0   1  Enter          A  11:00   519
1   1    Out          A  11:00   519
2   1     In          A  11:00   620
3   1    Out          A  11:00   620
4   1     In          A  11:00   588
5   1    Out          A  12:30   588
6   1     In          B  12:30   322
7   1   Exit          B  15:00   322
Background: An object enters a certain department under a specific code. Every time the object moves, it registers an "out-movement", referring to the department and code it comes from, and an "in-movement", referring to the department and code it goes to. Eventually, after some number of movements, the object exits again.
In the example I have ordered the rows the way I want them. However, my original dataframe is not sorted correctly, and I can't figure out how to get it into this order. It seems to me that I need some kind of groupby on the Code column, combined with sorting on Time. However, an object might return to the same code at a later moment, which complicates things.
Also, in my original dataframe the timestamps include the date. This is merely a simplified example.
I have tried several different column orders in sort_values, but none of them gives the desired result.
For example, sorting by ["ID", "Time", "Code", "Type"] does not produce the desired pairing of the codes. It results in:
   ID   Type Department   Time  Code
0   1  Enter          A  11:00   519
1   1    Out          A  11:00   519
4   1     In          A  11:00   588
2   1     In          A  11:00   620
3   1    Out          A  11:00   620
6   1     In          B  12:30   322
5   1    Out          A  12:30   588
7   1   Exit          B  15:00   322
Furthermore, I can't put Code before Time, since the main idea is to sort on time.
At this point I fear I will have to sort it by checking rows one by one in a for-loop, but I hope there is a more efficient method.
asked Mar 18 at 18:32 by user29987781

1 Answer
You're trying to sort your DataFrame, but it's a bit tricky to get the desired order, right? No worries, there's a clean way to do this without loops. The key is to define a custom sort order for the Type column and then use it together with Time to keep the movements in the right sequence.
import pandas as pd

# Sample input from the question
data = {
    "ID": [1, 1, 1, 1, 1, 1, 1, 1],
    "Type": ["Enter", "Out", "In", "Out", "In", "Out", "In", "Exit"],
    "Department": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "Time": ["11:00", "11:00", "11:00", "11:00", "11:00", "12:30", "12:30", "15:00"],
    "Code": [519, 519, 620, 620, 588, 588, 322, 322]
}
df = pd.DataFrame(data)

# Map each movement type to a priority: Enter < Out < In < Exit
sort_order = {"Enter": 0, "Out": 1, "In": 2, "Exit": 3}
df["SortKey"] = df["Type"].map(sort_order)

# Sort by ID and Time first, then by type priority, then by Department
df = df.sort_values(by=["ID", "Time", "SortKey", "Department"]).reset_index(drop=True)
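One caveat with the Type-priority key: within a single timestamp it places every Out row ahead of every In row, so when several moves share a timestamp (as at 11:00 in the sample) the In/Out rows are not interleaved the way the desired output shows. As a possible alternative, here is a minimal sketch that follows the movement chain row by row instead. It is not vectorized and it rests on assumptions that the question does not confirm: each ID has exactly one Enter row, an Out row and the In row that follows it share the same timestamp, and the index labels are unique. The function name order_movements and its helper take are hypothetical names chosen for this illustration.

```python
import pandas as pd


def order_movements(df: pd.DataFrame) -> pd.DataFrame:
    """Return df reordered so that each ID's rows form one movement chain."""
    ordered = []  # original index labels, in their final order
    for _, group in df.groupby("ID"):
        # Rows still to be placed, in time order (stable keeps input order on ties).
        pending = list(group.sort_values("Time", kind="stable").itertuples())

        def take(match):
            """Remove and return the first pending row satisfying match, else None."""
            for row in pending:
                if match(row):
                    pending.remove(row)
                    ordered.append(row.Index)
                    return row
            return None

        cur = take(lambda r: r.Type == "Enter")
        while cur is not None and cur.Type != "Exit":
            if cur.Type in ("Enter", "In"):
                # A stay at cur.Code ends with the earliest Out/Exit for that code.
                nxt = take(lambda r: r.Type in ("Out", "Exit")
                           and r.Code == cur.Code and r.Time >= cur.Time)
            else:
                # Assumption: the In that follows an Out has the same timestamp.
                # Prefer an In whose code is left again at that timestamp, so
                # zero-length stays land in the middle of the chain.
                left_now = {r.Code for r in pending
                            if r.Type == "Out" and r.Time == cur.Time}
                nxt = (take(lambda r: r.Type == "In" and r.Time == cur.Time
                            and r.Code in left_now)
                       or take(lambda r: r.Type == "In" and r.Time == cur.Time))
            cur = nxt
        # Rows that do not fit the chain are appended at the end, in time order.
        ordered.extend(row.Index for row in pending)
    return df.loc[ordered].reset_index(drop=True)


data = {
    "ID": [1, 1, 1, 1, 1, 1, 1, 1],
    "Type": ["Enter", "Out", "In", "Out", "In", "Out", "In", "Exit"],
    "Department": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "Time": ["11:00", "11:00", "11:00", "11:00", "11:00", "12:30", "12:30", "15:00"],
    "Code": [519, 519, 620, 620, 588, 588, 322, 322],
}
desired = pd.DataFrame(data)

# Start from the mis-sorted row order shown in the question.
shuffled = desired.loc[[0, 1, 4, 2, 3, 6, 5, 7]]
result = order_movements(shuffled)
print(result)
```

On the sample data this should restore the row order from the question's desired table. The scan through the pending rows makes it quadratic per ID in the worst case, so it trades speed for following the chain explicitly. If several zero-length stays share one timestamp, their relative order is ambiguous in the data itself, and the sketch simply takes them in input order.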