python - Replace last two row values in a grouped polars DataFrame - Stack Overflow

I need to replace the last two values in the value column of a pl.DataFrame with zeros, separately within each group of the symbol column.

import polars as pl

df = pl.DataFrame(
    {"symbol": [*["A"] * 4, *["B"] * 4], "value": range(8)}
)

shape: (8, 2)
┌────────┬───────┐
│ symbol ┆ value │
│ ---    ┆ ---   │
│ str    ┆ i64   │
╞════════╪═══════╡
│ A      ┆ 0     │
│ A      ┆ 1     │
│ A      ┆ 2     │
│ A      ┆ 3     │
│ B      ┆ 4     │
│ B      ┆ 5     │
│ B      ┆ 6     │
│ B      ┆ 7     │
└────────┴───────┘

Here is my expected outcome:

shape: (8, 2)
┌────────┬───────┐
│ symbol ┆ value │
│ ---    ┆ ---   │
│ str    ┆ i64   │
╞════════╪═══════╡
│ A      ┆ 0     │
│ A      ┆ 1     │
│ A      ┆ 0     │<-- replaced
│ A      ┆ 0     │<-- replaced
│ B      ┆ 4     │
│ B      ┆ 5     │
│ B      ┆ 0     │<-- replaced
│ B      ┆ 0     │<-- replaced
└────────┴───────┘

Asked Nov 20, 2024 at 15:24 by Andi

2 Answers

You can use

  • pl.Expr.head() with pl.len() to take every row of each group except the last two.
  • pl.Expr.append() and pl.repeat() to pad the result back to full length with two zeros.

df.with_columns(
    pl.col.value.head(pl.len() - 2).append(pl.repeat(0, 2))
    .over("symbol")
)

shape: (8, 2)
┌────────┬───────┐
│ symbol ┆ value │
│ ---    ┆ ---   │
│ str    ┆ i64   │
╞════════╪═══════╡
│ A      ┆ 0     │
│ A      ┆ 1     │
│ A      ┆ 0     │
│ A      ┆ 0     │
│ B      ┆ 4     │
│ B      ┆ 5     │
│ B      ┆ 0     │
│ B      ┆ 0     │
└────────┴───────┘

Alternatively, you can use

  • pl.when() to build the column conditionally.
  • pl.int_range() with pl.len() to keep the first n - 2 rows of each group and set the rest to zero.

df.with_columns(
    pl.when(pl.int_range(pl.len()) < pl.len() - 2).then(pl.col.value)
    .otherwise(0)
    .over("symbol")
)

shape: (8, 2)
┌────────┬───────┐
│ symbol ┆ value │
│ ---    ┆ ---   │
│ str    ┆ i64   │
╞════════╪═══════╡
│ A      ┆ 0     │
│ A      ┆ 1     │
│ A      ┆ 0     │
│ A      ┆ 0     │
│ B      ┆ 4     │
│ B      ┆ 5     │
│ B      ┆ 0     │
│ B      ┆ 0     │
└────────┴───────┘

You can use .is_last_distinct() and .shift() to flag the last and the second-to-last row of each symbol, and keep the value only where neither flag is set.

df.with_columns(
    pl.when(
        pl.any_horizontal(
            pl.col("symbol").is_last_distinct(),
            pl.col("symbol").shift(-1).over("symbol").is_last_distinct()
        ).not_()
    )
    .then(pl.col("value"))
    .otherwise(0)
)

shape: (8, 2)
┌────────┬───────┐
│ symbol ┆ value │
│ ---    ┆ ---   │
│ str    ┆ i64   │
╞════════╪═══════╡
│ A      ┆ 0     │
│ A      ┆ 1     │
│ A      ┆ 0     │
│ A      ┆ 0     │
│ B      ┆ 4     │
│ B      ┆ 5     │
│ B      ┆ 0     │
│ B      ┆ 0     │
└────────┴───────┘