python - Shuffle a dataset w.r.t a column value - Stack Overflow

I have the following DataFrame, which contains, among other columns, UserID and rank_group:

  UserID  Col2  Col3  rank_group 
0    1     2     3     1
1    1     5     6     1
...
20   1     8     9     2
21   1    11    12     2
...
45   1    14    15     3
46   1    17    18     3
47   2     2     3     1
48   2     5     6     1
...
60   2     8     9     2
61   2    11    12     2
...
70   2    14    15     3
71   2    17    18     3

The DataFrame has a UserID column, and for each user the rows with rank_group 1 come first, followed by the rows with rank_group 2, and so on. In other words, rank_group follows a fixed ascending order: 1, 2, 3, 4, etc.

I would like to shuffle the rows of the DataFrame so that the rank_group blocks appear in a random order. For example, if rank_group runs from 1 to n for each user, then after shuffling the blocks could appear in any permutation of 1 to n.

I tried df.sample(frac=1), but it does not take the rank_group blocks into account: it mixes any row with any row, which is not what I am looking for. In my case, the order of the rows within each rank_group has to stay the same. I also looked into np.random.permutation, which has the same issue. Any help?

Asked Mar 31 at 12:34 by Carlo Allocca; edited Apr 3 at 0:27 by BeRT2me. 5 comments:
  • Your request is unclear: do you want to shuffle the rows within a group? – mozway Commented Mar 31 at 12:37
  • Or do you want to shuffle the groups keeping the relative order within a group constant? – mozway Commented Mar 31 at 12:38
  • Or do you want to shuffle the groups keeping the relative order within a group constant? ---> Yes, this one. – Carlo Allocca Commented Mar 31 at 12:39
  • Your request is unclear: do you want to shuffle the rows within a group? ---> No, I don't. I want to shuffle the groups while keeping the relative order within each group. – Carlo Allocca Commented Mar 31 at 12:40
  • Please remember that Stack Overflow is not your favourite Python forum, but rather a question and answer site for all programming related questions. Thus, always include the tag of the language you are programming in, that way other users familiar with that language can more easily find your question. Take the tour and read up on How to Ask to get more information on how this site works, then edit the question with the relevant tags. – Adriaan Commented Mar 31 at 12:44

1 Answer

If you want to shuffle the rows within a group, use groupby.sample:

df.groupby(['UserID', 'rank_group']).sample(frac=1)

Example output:

    UserID  Col2  Col3  rank_group
0        1     2     3           1
1        1     5     6           1
21       1    11    12           2
20       1     8     9           2
45       1    14    15           3
46       1    17    18           3
48       2     5     6           1
47       2     2     3           1
60       2     8     9           2
61       2    11    12           2
71       2    17    18           3
70       2    14    15           3
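
To try the within-group shuffle end to end, here is a minimal sketch. The 12-row DataFrame is a hypothetical sample that only mirrors the rows visible in the question, and random_state is added purely for reproducibility; neither comes from the original data.

```python
import pandas as pd

# Hypothetical 12-row sample mirroring the rows shown in the question
df = pd.DataFrame(
    {
        "UserID": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        "Col2": [2, 5, 8, 11, 14, 17, 2, 5, 8, 11, 14, 17],
        "Col3": [3, 6, 9, 12, 15, 18, 3, 6, 9, 12, 15, 18],
        "rank_group": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    },
    index=[0, 1, 20, 21, 45, 46, 47, 48, 60, 61, 70, 71],
)

# random_state is an assumption, used only to make the run repeatable
shuffled_within = df.groupby(["UserID", "rank_group"]).sample(
    frac=1, random_state=0
)

# The (UserID, rank_group) blocks keep their original position;
# only the rows inside each block may change places.
print(shuffled_within)
```

Note that this variant leaves the blocks themselves in sorted order, so it does not produce the block permutation the question asks for.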

If you want to shuffle the groups keeping the relative order within a group constant, sample the unique groups, then merge:

(df[['UserID', 'rank_group']].drop_duplicates().sample(frac=1)
 .merge(df, how='left')
)

Example output:

    UserID  rank_group  Col2  Col3
0        2           1     2     3
1        2           1     5     6
2        1           3    14    15
3        1           3    17    18
4        1           2     8     9
5        1           2    11    12
6        2           2     8     9
7        2           2    11    12
8        2           3    14    15
9        2           3    17    18
10       1           1     2     3
11       1           1     5     6
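
The block shuffle can be checked on a small frame. The sketch below assumes a hypothetical 12-row sample shaped like the question's data, and a fixed random_state that is there only for reproducibility. It shuffles the unique (UserID, rank_group) pairs, merges the rows back, and then lists the blocks in the order they appear.

```python
import pandas as pd

# Hypothetical 12-row sample mirroring the rows shown in the question
df = pd.DataFrame(
    {
        "UserID": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        "Col2": [2, 5, 8, 11, 14, 17, 2, 5, 8, 11, 14, 17],
        "Col3": [3, 6, 9, 12, 15, 18, 3, 6, 9, 12, 15, 18],
        "rank_group": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    },
    index=[0, 1, 20, 21, 45, 46, 47, 48, 60, 61, 70, 71],
)

keys = ["UserID", "rank_group"]

# One row per block, in random order (random_state is an assumption)
block_order = df[keys].drop_duplicates().sample(frac=1, random_state=1)

# A left merge follows the order of the shuffled keys and pulls in
# every matching row of df for each key
shuffled_blocks = block_order.merge(df, how="left")

# Collapse consecutive duplicates to list the blocks in output order
pairs = list(zip(shuffled_blocks["UserID"], shuffled_blocks["rank_group"]))
runs = [p for i, p in enumerate(pairs) if i == 0 or p != pairs[i - 1]]
print(runs)
print(shuffled_blocks)
```

If every block shows up exactly once in runs, the blocks stayed contiguous. The merge is on the columns the two frames share, which here are UserID and rank_group.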

And, if the original index needs to be preserved (the merge above returns a fresh 0..n-1 index):

(df[['UserID', 'rank_group']].drop_duplicates().sample(frac=1)
 .merge(df.reset_index(), how='left')
 .set_index('index').rename_axis(df.index.name)
)

Example output:

    UserID  rank_group  Col2  Col3
70       2           3    14    15
71       2           3    17    18
0        1           1     2     3
1        1           1     5     6
20       1           2     8     9
21       1           2    11    12
60       2           2     8     9
61       2           2    11    12
45       1           3    14    15
46       1           3    17    18
47       2           1     2     3
48       2           1     5     6
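
The snippets above shuffle the (UserID, rank_group) blocks across all users, so the rows of one user can end up interleaved with another's. If the intent is instead to keep each user's rows together and only permute the rank_group blocks within that user (one possible reading of "for each user" in the question, not something the answer states), a variation is to sample the unique pairs per user before merging. This is a sketch under that assumption, again on a hypothetical 12-row sample with an arbitrary random_state.

```python
import pandas as pd

# Hypothetical 12-row sample mirroring the rows shown in the question
df = pd.DataFrame(
    {
        "UserID": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
        "Col2": [2, 5, 8, 11, 14, 17, 2, 5, 8, 11, 14, 17],
        "Col3": [3, 6, 9, 12, 15, 18, 3, 6, 9, 12, 15, 18],
        "rank_group": [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    },
    index=[0, 1, 20, 21, 45, 46, 47, 48, 60, 61, 70, 71],
)

# Shuffle the rank_group values separately for each user; users stay
# in sorted order because groupby.sample returns group after group
block_order = (
    df[["UserID", "rank_group"]]
    .drop_duplicates()
    .groupby("UserID")
    .sample(frac=1, random_state=2)  # random_state only for repeatability
)

# Merge the rows back and restore the original index labels
per_user = (
    block_order.merge(df.reset_index(), how="left")
    .set_index("index")
    .rename_axis(df.index.name)
)
print(per_user)
```

Here all rows of user 1 still precede those of user 2, while the order of the rank_group blocks inside each user is random.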