python - Shuffle a dataset w.r.t a column value

I have the following DataFrame, which contains, among other columns, UserID and rank_group:

    UserID  Col2  Col3  rank_group
0        1     2     3           1
1        1     5     6           1
...
20       1     8     9           2
21       1    11    12           2
...
45       1    14    15           3
46       1    17    18           3
47       2     2     3           1
48       2     5     6           1
...
60       2     8     9           2
61       2    11    12           2
...
70       2    14    15           3
71       2    17    18           3

The DataFrame has a UserID column, and for each user the rows with rank_group 1 come first, followed by the rows with rank_group 2, and so on. In other words, rank_group follows a fixed increasing order: 1, 2, 3, 4, etc.

I would like to shuffle the DataFrame's rows so that the rank_group blocks appear in random order. For example, if rank_group runs from 1 to n for each user, then after shuffling the blocks should appear in some random permutation of 1 to n.

I tried df.sample(frac=1), but it ignores the rank_group blocks and mixes every row with every other row, which is not what I am looking for: the order of rows within a given rank_group must stay the same. I also looked into np.random.permutation, which has the same issue. Any help?

asked Mar 31 at 12:34 by Carlo Allocca; edited Apr 3 at 0:27 by BeRT2me

Your request is unclear: do you want to shuffle the rows within a group? – mozway Commented Mar 31 at 12:37
Or do you want to shuffle the groups keeping the relative order within a group constant? – mozway Commented Mar 31 at 12:38
"Or do you want to shuffle the groups keeping the relative order within a group constant?" Yes, this one. – Carlo Allocca Commented Mar 31 at 12:39
"Your request is unclear: do you want to shuffle the rows within a group?" No, I don't. I want to shuffle the groups while keeping the relative order within each group. – Carlo Allocca Commented Mar 31 at 12:40
Please remember that Stack Overflow is not your favourite Python forum, but rather a question and answer site for all programming related questions. Thus, always include the tag of the language you are programming in, that way other users familiar with that language can more easily find your question. Take the tour and read up on How to Ask to get more information on how this site works, then edit the question with the relevant tags. – Adriaan Commented Mar 31 at 12:44

1 Answer

If you want to shuffle the rows within a group, use groupby.sample:

df.groupby(['UserID', 'rank_group']).sample(frac=1)

Example output:

    UserID  Col2  Col3  rank_group
0        1     2     3           1
1        1     5     6           1
21       1    11    12           2
20       1     8     9           2
45       1    14    15           3
46       1    17    18           3
48       2     5     6           1
47       2     2     3           1
60       2     8     9           2
61       2    11    12           2
71       2    17    18           3
70       2    14    15           3
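
As a minimal, self-contained sketch of the same call (the small DataFrame and the random_state value are stand-ins for illustration, not data from the question):

```python
import pandas as pd

# Small stand-in for the question's DataFrame (values are illustrative).
df = pd.DataFrame({
    "UserID":     [1, 1, 1, 1, 2, 2, 2, 2],
    "Col2":       [2, 5, 8, 11, 2, 5, 8, 11],
    "Col3":       [3, 6, 9, 12, 3, 6, 9, 12],
    "rank_group": [1, 1, 2, 2, 1, 1, 2, 2],
})

# Shuffle rows inside each (UserID, rank_group) block.
# random_state is only there to make the shuffle repeatable.
shuffled_within = df.groupby(["UserID", "rank_group"]).sample(
    frac=1, random_state=0
)

# The blocks themselves stay in their original sequence;
# only rows inside a block may have swapped places.
print(shuffled_within)
```

Note that this variant leaves the order of the blocks untouched, so it is the opposite of what the comments under the question ask for.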

If you want to shuffle the groups keeping the relative order within a group constant, sample the unique groups, then merge:

(df[['UserID', 'rank_group']].drop_duplicates().sample(frac=1)
 .merge(df, how='left')
)

Example output:

    UserID  rank_group  Col2  Col3
0        2           1     2     3
1        2           1     5     6
2        1           3    14    15
3        1           3    17    18
4        1           2     8     9
5        1           2    11    12
6        2           2     8     9
7        2           2    11    12
8        2           3    14    15
9        2           3    17    18
10       1           1     2     3
11       1           1     5     6

And, if the original index must be preserved:

(df[['UserID', 'rank_group']].drop_duplicates().sample(frac=1)
 .merge(df.reset_index(), how='left')
 .set_index('index').rename_axis(df.index.name)
)

Example output:

    UserID  rank_group  Col2  Col3
70       2           3    14    15
71       2           3    17    18
0        1           1     2     3
1        1           1     5     6
20       1           2     8     9
21       1           2    11    12
60       2           2     8     9
61       2           2    11    12
45       1           3    14    15
46       1           3    17    18
47       2           1     2     3
48       2           1     5     6
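
To convince yourself that the relative order inside each block survives, a quick check along these lines may help. It is only a sketch: the DataFrame is a cut-down copy of the example, random_state is arbitrary, and order_kept / same_rows are names introduced here for illustration.

```python
import pandas as pd

# Cut-down copy of the example, keeping the non-contiguous index labels.
df = pd.DataFrame(
    {
        "UserID":     [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        "Col2":       [2, 5, 8, 11, 14, 17, 2, 5, 8, 11, 14, 17],
        "Col3":       [3, 6, 9, 12, 15, 18, 3, 6, 9, 12, 15, 18],
        "rank_group": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    },
    index=[0, 1, 20, 21, 45, 46, 47, 48, 60, 61, 70, 71],
)

# Same pipeline as above, with a fixed seed for repeatability.
shuffled = (
    df[["UserID", "rank_group"]].drop_duplicates().sample(frac=1, random_state=42)
    .merge(df.reset_index(), how="left")
    .set_index("index").rename_axis(df.index.name)
)

# Inside every block the original index labels should still ascend.
order_kept = all(
    g.index.is_monotonic_increasing
    for _, g in shuffled.groupby(["UserID", "rank_group"])
)

# Sorting back by index should reproduce the original rows exactly.
same_rows = shuffled.sort_index()[df.columns].equals(df)

print(order_kept, same_rows)
```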
