python - Polars upsampling with grouping does not behave as expected

Here is the data:

import polars as pl
from datetime import datetime

df = pl.DataFrame(
    {
        "time": [
            datetime(2021, 2, 1),
            datetime(2021, 4, 2),
            datetime(2021, 5, 4),
            datetime(2021, 6, 6),
            datetime(2021, 6, 8),
            datetime(2021, 7, 10),
            datetime(2021, 8, 18),
            datetime(2021, 9, 20),
        ],
        "groups": ["A", "B", "A", "B", "A", "B", "A", "B"],
        "values": [0, 1, 2, 3, 4, 5, 6, 7],
    }
)

The upsampling, followed by a check of the largest time step within each group:

(
    df
    .upsample(
        time_column="time",
        every="1d",
        group_by="groups",
        maintain_order=True,
    )
    .group_by('groups')
    .agg(pl.col('time').diff().max())
)

shape: (3, 2)
┌────────┬──────────────┐
│ groups ┆ time         │
│ ---    ┆ ---          │
│ str    ┆ duration[μs] │
╞════════╪══════════════╡
│ A      ┆ 92d          │
│ null   ┆ 2d           │
│ B      ┆ 72d          │
└────────┴──────────────┘

The maximum diff is not 1 day, as I would expect. Is this a bug, or am I doing something wrong?

Asked Mar 13 at 10:58 by JohnRos; edited Mar 13 at 11:46 by jqurious.

1 Answer

It is caused by the upsampled rows getting null in the group column, which is a bug.

https://github.com/pola-rs/polars/issues/15530
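The numbers in the question's output fit this explanation. If the rows still labelled A and B are only the original, non-upsampled rows, then their largest step is just the largest gap between consecutive original dates, which is exactly 92 days for A and 72 days for B. Presumably all the filled-in rows of both groups ended up in the null group, whose 2d maximum would be the jump over an original date. A quick standard-library check of the first part, using the question's data:

```python
from collections import defaultdict
from datetime import datetime

# The original (time, group) rows from the question.
rows = [
    (datetime(2021, 2, 1), "A"),
    (datetime(2021, 4, 2), "B"),
    (datetime(2021, 5, 4), "A"),
    (datetime(2021, 6, 6), "B"),
    (datetime(2021, 6, 8), "A"),
    (datetime(2021, 7, 10), "B"),
    (datetime(2021, 8, 18), "A"),
    (datetime(2021, 9, 20), "B"),
]

# Collect each group's timestamps (the input is already sorted by time).
by_group = defaultdict(list)
for ts, group in rows:
    by_group[group].append(ts)

# Largest gap between consecutive ORIGINAL rows, per group.
max_gap = {
    group: max(later - earlier for earlier, later in zip(times, times[1:]))
    for group, times in by_group.items()
}

for group in sorted(max_gap):
    print(group, max_gap[group].days)
# A 92
# B 72
```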

upsample itself is implemented as a datetime_range followed by a join:

https://github.com/pola-rs/polars/blob/a4fbc9453cacb7e7e5cc476b30a98845aaa5f506/crates/polars-time/src/upsample.rs#L203

You can do the same thing manually as a workaround:

(df.group_by("groups")
   .agg(pl.datetime_range(pl.col("time").first(), pl.col("time").last()))
   .explode("time")
   .join(df, on=["groups", "time"], how="left")
   .group_by("groups")
   .agg(pl.col("time").diff().max())
)

shape: (2, 2)
┌────────┬──────────────┐
│ groups ┆ time         │
│ ---    ┆ ---          │
│ str    ┆ duration[μs] │
╞════════╪══════════════╡
│ A      ┆ 1d           │
│ B      ┆ 1d           │
└────────┴──────────────┘
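To make the datetime_range + join idea concrete without depending on any particular Polars version, here is a minimal pure-Python sketch of per-group upsampling: for each group, generate a 1-day range from its first to its last timestamp, then left-join the original rows onto it. upsample_by_group is a hypothetical helper written for illustration, not a Polars function, and it assumes each row is a (time, group, value) tuple.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def upsample_by_group(rows, every=timedelta(days=1)):
    """Hypothetical helper: per-group date range + left join (not a Polars API).

    rows: iterable of (time, group, value) tuples.
    Returns (time, group, value) tuples; value is None on filled-in rows.
    """
    rows = list(rows)
    # Left-join lookup: original value keyed by (group, time).
    lookup = {(group, ts): value for ts, group, value in rows}

    # First and last timestamp of each group (the datetime_range bounds).
    bounds = {}
    for ts, group, _ in rows:
        start, end = bounds.get(group, (ts, ts))
        bounds[group] = (min(start, ts), max(end, ts))

    out = []
    for group, (start, end) in bounds.items():
        ts = start
        while ts <= end:
            # The group label comes from the range, so it is never null.
            out.append((ts, group, lookup.get((group, ts))))
            ts += every
    return out


rows = [
    (datetime(2021, 2, 1), "A", 0),
    (datetime(2021, 4, 2), "B", 1),
    (datetime(2021, 5, 4), "A", 2),
    (datetime(2021, 6, 6), "B", 3),
    (datetime(2021, 6, 8), "A", 4),
    (datetime(2021, 7, 10), "B", 5),
    (datetime(2021, 8, 18), "A", 6),
    (datetime(2021, 9, 20), "B", 7),
]
result = upsample_by_group(rows)

# Rough equivalent of .group_by("groups").agg(pl.col("time").diff().max())
times = defaultdict(list)
for ts, group, _ in result:
    times[group].append(ts)
max_step = {g: max(b - a for a, b in zip(t, t[1:])) for g, t in times.items()}
print(max_step)
# {'A': datetime.timedelta(days=1), 'B': datetime.timedelta(days=1)}
```

Because the group label is taken from the range being generated rather than from the joined row, a filled-in row always keeps its group, which is the property the buggy upsample lost.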
