Python Polars: How to add columns in one lazyframe to another lazyframe? - Stack Overflow

I have a Polars LazyFrame and would like to add to it columns from another LazyFrame. The two LazyFrames have the same number of rows and different columns.

I have tried the following, which doesn't work as with_columns expects an iterable.

import polars as pl

def append_columns(df: pl.LazyFrame):
    df2 = pl.LazyFrame([1, 2])
    # This is the call that fails: with_columns does not accept a LazyFrame.
    return df.with_columns(df2)

Asked Feb 5 at 12:18 by Arun; edited Feb 5 at 16:49 by Hericks.

3 Answers


For this, you can use pl.concat with how="horizontal".

import polars as pl

df = pl.LazyFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
})

other = pl.LazyFrame({
    "c": [9, 10, 11],
    "d": [12, 13, 14],
    "e": [15, 16, 17],
})

result = pl.concat((df, other.select("c", "d")), how="horizontal")

Once collected, the resulting pl.LazyFrame looks as follows.

shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ b   ┆ c   ┆ d   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 9   ┆ 12  │
│ 2   ┆ 5   ┆ 10  ┆ 13  │
│ 3   ┆ 6   ┆ 11  ┆ 14  │
└─────┴─────┴─────┴─────┘

LazyFrames don't hold any data until they're collected, so you can't pass one directly to with_columns.

There are 3 things you can do:

1. Do a proper join on some index column(s) that the two already share.
lf1.join(lf2, on="your_index_column")
2. If they don't share an index then you can make one
lf1.with_row_index().join(lf2.with_row_index(), on="index")

While it is possible for some LazyFrames to always return the same order when they're collected, this isn't guaranteed in the general case so you should be careful about doing this.

3. You can collect one of the LazyFrames
lf1.with_columns(lf2.collect())

This has the same issues as number 2 but is probably what you're after.

Passing a DataFrame into .with_columns sort of works "accidentally" and should probably be avoided.

For example, it fails when the two frames have different heights:

df  = pl.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = pl.DataFrame({"z": [5, 6, 7]})
df.with_columns(df2)
# ShapeError: unable to add a column of length 3 to a DataFrame of height 2

The proper way to do this (that also works with LazyFrames) is to use pl.concat with how="horizontal". Here the shorter frame is padded with nulls:

pl.concat([df, df2], how="horizontal")
shape: (3, 3)
┌──────┬──────┬─────┐
│ x    ┆ y    ┆ z   │
│ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64 │
╞══════╪══════╪═════╡
│ 1    ┆ 3    ┆ 5   │
│ 2    ┆ 4    ┆ 6   │
│ null ┆ null ┆ 7   │
└──────┴──────┴─────┘

The Polars User Guide has a section dedicated to explaining the different concat strategies.

  • https://docs.pola.rs/user-guide/transformations/concatenation/