I have the following DataFrame, which contains, among others, the columns UserID and rank_group:
    UserID  Col2  Col3  rank_group
0        1     2     3           1
1        1     5     6           1
...
20       1     8     9           2
21       1    11    12           2
...
45       1    14    15           3
46       1    17    18           3
47       2     2     3           1
48       2     5     6           1
...
60       2     8     9           2
61       2    11    12           2
...
70       2    14    15           3
71       2    17    18           3
The DataFrame has a UserID column, and for each user the rows with rank_group 1 come first, followed by the rows with rank_group 2, and so on. In other words, rank_group follows a fixed ascending order: 1, 2, 3, 4, etc.
I would like to shuffle the rows of the DataFrame so that the rank_group blocks appear in a random order. For example, if rank_group runs from 1 to n for each user, then after shuffling the dataset should reflect some permutation of 1 to n.
I tried df.sample(frac=1), but it does not take the rank_group blocks into account: it mixes any row with any row, which is not what I am looking for. In my case, the order of the rows within a given rank_group has to be maintained. I also looked into np.random.permutation, which has the same issue. Any help?
edited Apr 3 at 0:27 by BeRT2me; asked Mar 31 at 12:34 by Carlo Allocca
- Your request is unclear, do you want to shuffle the rows within a group? – mozway Commented Mar 31 at 12:37
- Or do you want to shuffle the groups keeping the relative order within a group constant? – mozway Commented Mar 31 at 12:38
- Or do you want to shuffle the groups keeping the relative order within a group constant? ---> Yes, this one. – Carlo Allocca Commented Mar 31 at 12:39
- You request is unclear, do you want to shuffle the rows within a group? ---> No, I don't. I want to shuffle the groups and keeping the relative order within the groups – Carlo Allocca Commented Mar 31 at 12:40
- Please remember that Stack Overflow is not your favourite Python forum, but rather a question and answer site for all programming related questions. Thus, always include the tag of the language you are programming in, that way other users familiar with that language can more easily find your question. Take the tour and read up on How to Ask to get more information on how this site works, then edit the question with the relevant tags. – Adriaan Commented Mar 31 at 12:44
1 Answer
If you want to shuffle the rows within a group, use groupby.sample:
df.groupby(['UserID', 'rank_group']).sample(frac=1)
Example output:
    UserID  Col2  Col3  rank_group
0        1     2     3           1
1        1     5     6           1
21       1    11    12           2
20       1     8     9           2
45       1    14    15           3
46       1    17    18           3
48       2     5     6           1
47       2     2     3           1
60       2     8     9           2
61       2    11    12           2
71       2    17    18           3
70       2    14    15           3
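For anyone who wants to try the within-group shuffle locally, here is a minimal sketch. The toy frame only mirrors the columns and visible rows from the question (the elided rows are left out), and random_state is my own addition to make the run repeatable; neither is part of the original one-liner.

```python
import pandas as pd

# Toy frame with the question's columns; only the rows shown in the question.
df = pd.DataFrame(
    {
        "UserID": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        "Col2": [2, 5, 8, 11, 14, 17, 2, 5, 8, 11, 14, 17],
        "Col3": [3, 6, 9, 12, 15, 18, 3, 6, 9, 12, 15, 18],
        "rank_group": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    },
    index=[0, 1, 20, 21, 45, 46, 47, 48, 60, 61, 70, 71],
)

# Shuffle rows inside each (UserID, rank_group) block; the blocks themselves
# stay in sorted key order. random_state is only here for repeatability.
shuffled_within = df.groupby(["UserID", "rank_group"]).sample(
    frac=1, random_state=0
)
print(shuffled_within)
```

Note that this changes only the order of rows inside each block, so on its own it is not what the question asks for.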
If you want to shuffle the groups keeping the relative order within a group constant, sample the unique groups, then merge:
(df[['UserID', 'rank_group']].drop_duplicates().sample(frac=1)
.merge(df, how='left')
)
Example output:
    UserID  rank_group  Col2  Col3
0        2           1     2     3
1        2           1     5     6
2        1           3    14    15
3        1           3    17    18
4        1           2     8     9
5        1           2    11    12
6        2           2     8     9
7        2           2    11    12
8        2           3    14    15
9        2           3    17    18
10       1           1     2     3
11       1           1     5     6
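The block shuffle can be checked with a small self-contained sketch. The toy frame, the variable names and random_state are assumptions for illustration. The final column selection is an optional extra that restores the original column order, since the merge puts the key columns first.

```python
import pandas as pd

# Toy frame with the question's columns; only the rows shown in the question.
df = pd.DataFrame(
    {
        "UserID": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        "Col2": [2, 5, 8, 11, 14, 17, 2, 5, 8, 11, 14, 17],
        "Col3": [3, 6, 9, 12, 15, 18, 3, 6, 9, 12, 15, 18],
        "rank_group": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    },
    index=[0, 1, 20, 21, 45, 46, 47, 48, 60, 61, 70, 71],
)

keys = ["UserID", "rank_group"]

# One row per block, in random order (random_state only for repeatability).
block_order = df[keys].drop_duplicates().sample(frac=1, random_state=0)

# The left merge follows the shuffled block order and pulls in each block's rows.
shuffled_blocks = block_order.merge(df, how="left")

# Optional: restore the original column order.
shuffled_blocks = shuffled_blocks[df.columns]
print(shuffled_blocks)
```

The result has a fresh 0..n-1 index, and blocks from different users can interleave, as in the example output above.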
And, if the index is important:
(df[['UserID', 'rank_group']].drop_duplicates().sample(frac=1)
.merge(df.reset_index(), how='left')
.set_index('index').rename_axis(df.index.name)
)
Example output:
    UserID  rank_group  Col2  Col3
70       2           3    14    15
71       2           3    17    18
0        1           1     2     3
1        1           1     5     6
20       1           2     8     9
21       1           2    11    12
60       2           2     8     9
61       2           2    11    12
45       1           3    14    15
46       1           3    17    18
47       2           1     2     3
48       2           1     5     6
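The question says "for each user", so one possible reading is that each user's rows should stay together and only the rank_group blocks inside a user should be permuted. That is my assumption about the intent, not something confirmed in the comments. Under it, a sketch could sample the unique blocks per user with groupby.sample before merging. The toy frame and random_state are again only illustrative.

```python
import pandas as pd

# Toy frame with the question's columns; only the rows shown in the question.
df = pd.DataFrame(
    {
        "UserID": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        "Col2": [2, 5, 8, 11, 14, 17, 2, 5, 8, 11, 14, 17],
        "Col3": [3, 6, 9, 12, 15, 18, 3, 6, 9, 12, 15, 18],
        "rank_group": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    },
    index=[0, 1, 20, 21, 45, 46, 47, 48, 60, 61, 70, 71],
)

keys = ["UserID", "rank_group"]

# Shuffle the blocks separately for each user, so users stay contiguous.
block_order = (
    df[keys]
    .drop_duplicates()
    .groupby("UserID")
    .sample(frac=1, random_state=0)
)

# Same merge as above, carrying the original index along as a column.
per_user = (
    block_order
    .merge(df.reset_index(), how="left")
    .set_index("index")
    .rename_axis(df.index.name)
)[df.columns]
print(per_user)
```

Here all rows of user 1 come before all rows of user 2, each user's rank_group blocks appear in a random order, and the rows inside a block keep their original order and index labels.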