shuffle - How can I randomly sample n IDs for each combination of group_id and date in a Polars DataFrame? - Stack Overflow

I am trying to randomly sample n IDs for each combination of group_id and date in a Polars DataFrame. However, the sample function returns the same set of IDs for every date, regardless of the group.

I suspect this happens because the seed value is the same for all combinations. I tried to work around it by creating a unique seed per combination: I built a "group_date_int" column by combining group_id and date and casting the result to Int64, then passed that column as the seed. This raised the following error:

.sample(n=n_samples, shuffle=True, seed=pl.col("group_date_int"))
TypeError: argument 'seed': 'Expr' object cannot be interpreted as an integer

For each date, I am getting the same set of IDs, rather than having a different random sample for each combination of group_id and date.

import pandas as pd
import polars as pl

# MWE
date_range = pd.date_range(start="2010-01-01", end="2025-12-01", freq="MS")
data = []

for current_date in date_range:
    for group_id in ['bd01', 'bd02', 'bd03']:  # Example of 3 different group_ids
        ids = list(range(10))  # Generate 10 IDs for each (group_id, current_date)
        data.extend([(str(current_date.date()), group_id, id_) for id_ in ids])  

# Create Polars DataFrame
df = pl.DataFrame(data, schema=["date", "group_id", "id"])

# Parameters
n_samples = 3  # Number of random samples to pick for each group
SEED = 42  # The seed used for sampling

# Create `selected_samples` by sampling `n_samples` IDs per (group_id, date) combination
selected_samples = (
    df
    .group_by(['group_id', 'date'])
    .agg(
        pl.col("id")
        .sample(n=n_samples, shuffle=True, seed=SEED)  
        .alias("random_ids")
    )
    .explode("random_ids")
    .select(["group_id", "date", "random_ids"])
    .rename({"random_ids": "id"})
)

Additionally, I tried using the shuffle function, but the results are the same: every combination yields the IDs 1, 6, 5.

┌──────────┬────────────┬─────┐
│ group_id ┆ date       ┆ id  │
│ ---      ┆ ---        ┆ --- │
│ str      ┆ str        ┆ i64 │
╞══════════╪════════════╪═════╡
│ bd01     ┆ 2025-07-01 ┆ 1   │
│ bd01     ┆ 2025-07-01 ┆ 6   │
│ bd01     ┆ 2025-07-01 ┆ 5   │
│ bd01     ┆ 2012-03-01 ┆ 1   │
│ bd01     ┆ 2012-03-01 ┆ 6   │
│ …        ┆ …          ┆ …   │
│ bd03     ┆ 2024-10-01 ┆ 6   │
│ bd03     ┆ 2024-10-01 ┆ 5   │
│ bd01     ┆ 2010-08-01 ┆ 1   │
│ bd01     ┆ 2010-08-01 ┆ 6   │
│ bd01     ┆ 2010-08-01 ┆ 5   │
└──────────┴────────────┴─────┘
