
python - How to sort a DataFrame by time while maintaining paired event order in pandas? - Stack Overflow


I have a DataFrame that I want to order in a specific way. An example of the desired order is shown below.

import pandas as pd

data = {
    "ID": [1, 1, 1, 1, 1, 1, 1, 1],
    "Type": ["Enter", "Out", "In", "Out", "In", "Out", "In", "Exit"],
    "Department": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "Time": ["11:00", "11:00", "11:00", "11:00", "11:00", "12:30", "12:30", "15:00"],
    "Code": [519, 519, 620, 620, 588, 588, 322, 322]
}

df = pd.DataFrame(data)

   ID   Type Department   Time  Code
0   1  Enter          A  11:00   519
1   1    Out          A  11:00   519
2   1     In          A  11:00   620
3   1    Out          A  11:00   620
4   1     In          A  11:00   588
5   1    Out          A  12:30   588
6   1     In          B  12:30   322
7   1   Exit          B  15:00   322

Background: An object enters a certain department under a specific code. Every time the object moves, it registers an "Out" movement, referring to the department and code it comes from, and an "In" movement, referring to the department and code it goes to. Eventually, after some number of movements, the object exits again.

In the example I have ordered the rows the way I want them. However, my original DataFrame is not correctly sorted, and I can't figure out how to get it into this order. It seems to me that I need some kind of groupby on the Code column combined with sorting on Time. However, an object might return to the same code at a later moment, which complicates the issue.

Also, in my original DataFrame the timestamps include dates. This is merely a simplified example.
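
If those timestamps are stored as strings, they may sort lexicographically rather than chronologically. A minimal sketch, assuming a day-first string format (the actual format of the original data is not given, and the two values below are made up for illustration):

```python
import pandas as pd

# Hypothetical values; the real timestamp format is an assumption.
raw = pd.Series(["02-04-2025 09:00", "18-03-2025 11:00"])
parsed = pd.to_datetime(raw, format="%d-%m-%Y %H:%M")

print(raw.sort_values().index.tolist())     # string order: [0, 1]
print(parsed.sort_values().index.tolist())  # chronological order: [1, 0]
```

Converting once up front means any later sort on the time column compares real timestamps.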

I have tried passing several different column orders to sort_values, but none of them work as desired.

For example, sorting by ["ID", "Time", "Code", "Type"] does not produce the desired pairing of the codes. It results in:

   ID   Type Department   Time  Code
0   1  Enter          A  11:00   519
1   1    Out          A  11:00   519
4   1     In          A  11:00   588
2   1     In          A  11:00   620
3   1    Out          A  11:00   620
6   1     In          B  12:30   322
5   1    Out          A  12:30   588
7   1   Exit          B  15:00   322
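
The mismatch can be reproduced and checked programmatically. The sketch below rebuilds the sample frame, applies the same sort_values call, and uses a small hypothetical helper, `pairs_intact` (not part of pandas), which assumes an even number of rows arranged as arrival/departure pairs and tests whether each pair shares one Code.

```python
import pandas as pd

data = {
    "ID": [1, 1, 1, 1, 1, 1, 1, 1],
    "Type": ["Enter", "Out", "In", "Out", "In", "Out", "In", "Exit"],
    "Department": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "Time": ["11:00", "11:00", "11:00", "11:00", "11:00", "12:30", "12:30", "15:00"],
    "Code": [519, 519, 620, 620, 588, 588, 322, 322],
}
df = pd.DataFrame(data)


def pairs_intact(frame):
    """Hypothetical check: rows 0-1, 2-3, ... must each share one Code."""
    codes = frame["Code"].to_numpy()
    return bool((codes[0::2] == codes[1::2]).all())


attempt = df.sort_values(["ID", "Time", "Code", "Type"])
print(pairs_intact(df))       # desired order: True
print(pairs_intact(attempt))  # plain multi-column sort: False
```

The sort breaks the pairing because Code is compared numerically within a timestamp, so 588 is placed between 519 and 620 regardless of which movement it belongs to.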

Furthermore, I can't put Code before Time, as the main idea is to sort on time.

At this point I fear I will have to sort the rows by checking them one by one in a for-loop, but I hope there is a more efficient method.

asked Mar 18 at 18:32 by user29987781

1 Answer

Getting this order out of sort_values is a bit tricky, but it can be done without loops. The idea is to map the Type column to a custom sort key and sort on it together with Time. Note that this key ranks every Out ahead of every In at the same timestamp and ignores Code, so it reproduces the pairing reliably only when each timestamp holds at most one movement; for the sample data it places Out 620 before In 620 at 11:00.

import pandas as pd

# Input data from the question
data = {
    "ID": [1, 1, 1, 1, 1, 1, 1, 1],
    "Type": ["Enter", "Out", "In", "Out", "In", "Out", "In", "Exit"],
    "Department": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "Time": ["11:00", "11:00", "11:00", "11:00", "11:00", "12:30", "12:30", "15:00"],
    "Code": [519, 519, 620, 620, 588, 588, 322, 322]
}
df = pd.DataFrame(data)

# Rank each movement type so that, within one timestamp, rows are ordered Enter, Out, In, Exit
sort_order = {"Enter": 0, "Out": 1, "In": 2, "Exit": 3}
df["SortKey"] = df["Type"].map(sort_order)

# Sort by object, then time, then the type rank; Department only breaks remaining ties
df = df.sort_values(by=["ID", "Time", "SortKey", "Department"]).reset_index(drop=True)
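
Because the Type-based key above never looks at Code, it cannot keep an In next to the Out that belongs to the same code when several movements share a timestamp. One possible alternative, offered as a sketch rather than a definitive solution, is to treat each arrival (Enter or In) and its departure (Out or Exit) at the same code as one "stay" and to order the stays by start and end time. The helper name `sort_by_stays` and the underscore-prefixed columns are hypothetical. The sketch assumes that every arrival has a matching departure for the same ID and Code, and that the time column sorts chronologically (zero-padded HH:MM strings or real datetimes).

```python
import pandas as pd


def sort_by_stays(df, time_col="Time"):
    """Hypothetical helper: order rows stay by stay (arrival, then departure)."""
    work = df.copy()
    work["_depart"] = work["Type"].isin(["Out", "Exit"])
    work = work.sort_values(["ID", "Code", time_col])

    # The k-th arrival and the k-th departure at one (ID, Code) form one stay,
    # so a code that is revisited later gets a new visit number.
    work["_visit"] = work.groupby(["ID", "Code", "_depart"]).cumcount()
    stay = ["ID", "Code", "_visit"]

    # Rows are time-sorted within a stay, so first/last give its start and end.
    work["_start"] = work.groupby(stay)[time_col].transform("first")
    work["_end"] = work.groupby(stay)[time_col].transform("last")

    # Tie-break for stays with identical start and end: Enter first, Exit last.
    work["_score"] = work["Type"].map({"Enter": -1, "In": 0, "Out": 0, "Exit": 1})
    work["_rank"] = work.groupby(stay)["_score"].transform("sum")

    keys = ["ID", "_start", "_end", "_rank", "Code", "_visit", "_depart"]
    helpers = ["_depart", "_visit", "_start", "_end", "_score", "_rank"]
    return work.sort_values(keys).drop(columns=helpers).reset_index(drop=True)


data = {
    "ID": [1, 1, 1, 1, 1, 1, 1, 1],
    "Type": ["Enter", "Out", "In", "Out", "In", "Out", "In", "Exit"],
    "Department": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "Time": ["11:00", "11:00", "11:00", "11:00", "11:00", "12:30", "12:30", "15:00"],
    "Code": [519, 519, 620, 620, 588, 588, 322, 322],
}
# Shuffle the rows to simulate an unsorted input frame.
shuffled = pd.DataFrame(data).sample(frac=1, random_state=0)
result = sort_by_stays(shuffled)
print(result)
```

On the sample data this reproduces the desired order from the question. When several zero-length stays share one instant, the data alone cannot determine their relative order, and the sketch falls back to ordering them by Code.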