Python-Polars: Expression to get length of lists in a struct

In Python Polars, I am trying to extract the length of the lists inside a struct to re-use it in an expression.

For example, I have the code below:

import polars as pl


df = pl.DataFrame(
    {
        "x": [0, 4],
        "y": [
            {"low": [-1, 0, 1], "up": [1, 2, 3]},
            {"low": [-2, -1, 0], "up": [0, 1, 2]},
        ],
    }
)

df.with_columns(
    check=pl.concat_list([pl.all_horizontal(
        [
            pl.col("x").ge(pl.col("y").struct["low"].list.get(i)),
            pl.col("x").le(pl.col("y").struct["up"].list.get(i)),
        ]
    ) for i in range(3)]).list.max()
)

shape: (2, 3)
┌─────┬─────────────────────────┬───────┐
│ x   ┆ y                       ┆ check │
│ --- ┆ ---                     ┆ ---   │
│ i64 ┆ struct[2]               ┆ bool  │
╞═════╪═════════════════════════╪═══════╡
│ 0   ┆ {[-1, 0, 1],[1, 2, 3]}  ┆ true  │
│ 4   ┆ {[-2, -1, 0],[0, 1, 2]} ┆ false │
└─────┴─────────────────────────┴───────┘

and I would like to infer the length of the lists in advance (i.e. not having to hardcode the 3), as it can change depending on the call.

The challenge I am facing is that I need to keep everything in the same expression context. I have tried the following, but it does not work because I cannot extract the value returned by one of the expressions:

df.with_columns(
    check=pl.concat_list([pl.all_horizontal(
        [
            pl.col("x").ge(pl.col("y").struct["low"].list.get(i)),
            pl.col("x").le(pl.col("y").struct["up"].list.get(i)),
        ]
    ) for i in range(pl.col("y").struct["low"].list.len())]).list.max()
)

asked Feb 7 at 15:05 by yz_jc

2 Answers

Unfortunately, I don't see a way to use an expression for the list length here. Also, direct comparisons of list columns are not yet natively supported.

Still, some on-the-fly exploding and imploding of the list columns could be used to achieve the desired result without relying on knowing the list lengths upfront.

(
    df
    .with_columns(
        ge_low=(pl.col("x") >= pl.col("y").struct["low"].explode()).implode().over(pl.int_range(pl.len())),
        le_up=(pl.col("x") <= pl.col("y").struct["up"].explode()).implode().over(pl.int_range(pl.len())),
    )
    .with_columns(
        check=(pl.col("ge_low").explode() & pl.col("le_up").explode()).implode().over(pl.int_range(pl.len()))
    )
)

shape: (2, 5)
┌─────┬─────────────────────────┬─────────────────────┬───────────────────────┬───────────────────────┐
│ x   ┆ y                       ┆ ge_low              ┆ le_up                 ┆ check                 │
│ --- ┆ ---                     ┆ ---                 ┆ ---                   ┆ ---                   │
│ i64 ┆ struct[2]               ┆ list[bool]          ┆ list[bool]            ┆ list[bool]            │
╞═════╪═════════════════════════╪═════════════════════╪═══════════════════════╪═══════════════════════╡
│ 0   ┆ {[-1, 0, 1],[1, 2, 3]}  ┆ [true, true, false] ┆ [true, true, true]    ┆ [true, true, false]   │
│ 4   ┆ {[-2, -1, 0],[0, 1, 2]} ┆ [true, true, true]  ┆ [false, false, false] ┆ [false, false, false] │
└─────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┘

import polars as pl

df = pl.DataFrame(
    {
        "x": [0, 4, 2],
        "y": [
            {"low": [-1, 0, 1], "up": [1, 2, 3]},
            {"low": [-2, -1], "up": [0, 1]},
            {"low": [-3, -2, -1, 0], "up": [1, 2, 3, 4]},
        ],
    }
)

# 1. Get the length of the "low" lists:
df = df.with_columns(low_len=pl.col("y").struct["low"].list.len())

# 2. Get the length of the "up" lists:
df = df.with_columns(up_len=pl.col("y").struct["up"].list.len())

print(df)  # Print the DataFrame with low_len and up_len

# 3. If you want the row-wise maximum of the two (pl.max_horizontal compares across columns):
df = df.with_columns(
    max_len=pl.max_horizontal(
        pl.col("y").struct["low"].list.len(),
        pl.col("y").struct["up"].list.len(),
    )
)
print(df)  # Print the DataFrame with max_len

# 4. Handle null lists gracefully (returning 0 when a list is null).
#    A field that is absent from the struct schema raises an error instead.
df = df.with_columns(
    low_len=pl.col("y").struct.field("low").list.len().fill_null(0),
    up_len=pl.col("y").struct.field("up").list.len().fill_null(0),
)

print(df)  # Print the DataFrame with low_len and up_len, with null lists counted as 0
