I have a Polars LazyFrame and would like to add to it columns from another LazyFrame. The two LazyFrames have the same number of rows and different columns.
I have tried the following, which doesn't work as with_columns expects an iterable.
def append_columns(df: pl.LazyFrame):
    df2 = pl.LazyFrame([1, 2])
    return df.with_columns(df2)
asked Feb 5 at 12:18 by Arun, edited Feb 5 at 16:49 by Hericks
3 Answers
For this, pl.concat with how="horizontal" can be used.
import polars as pl
df = pl.LazyFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
})
other = pl.LazyFrame({
    "c": [9, 10, 11],
    "d": [12, 13, 14],
    "e": [15, 16, 17],
})
result = pl.concat((df, other.select("c", "d")), how="horizontal")
Once collected with result.collect(), the resulting pl.LazyFrame looks as follows.
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ a   ┆ b   ┆ c   ┆ d   │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 9   ┆ 12  │
│ 2   ┆ 5   ┆ 10  ┆ 13  │
│ 3   ┆ 6   ┆ 11  ┆ 14  │
└─────┴─────┴─────┴─────┘
LazyFrames don't have any data until they're collected so you can't just do that.
There are 3 things you can do:
1. Do a proper join on some index column(s) that the two already share.
lf1.join(lf2, on="your_index_column")
2. If they don't share an index then you can make one
lf1.with_row_index().join(lf2.with_row_index(), on="index")
While it is possible for some LazyFrames to always return the same order when they're collected, this isn't guaranteed in the general case so you should be careful about doing this.
3. You can collect one of the LazyFrames
lf1.with_columns(lf2.collect())
This has the same row-order caveat as option 2, but is probably what you're after.
Passing a DataFrame into .with_columns sort of works "accidentally" and should probably be avoided.
For example, it fails in this case:
df = pl.DataFrame({"x": [1, 2], "y": [3, 4]})
df2 = pl.DataFrame({"z": [5, 6, 7]})
df.with_columns(df2)
# ShapeError: unable to add a column of length 3 to a DataFrame of height 2
The proper way to do this (that also works with LazyFrames) is to use pl.concat() with how="horizontal":
pl.concat([df, df2], how="horizontal")
shape: (3, 3)
┌──────┬──────┬─────┐
│ x    ┆ y    ┆ z   │
│ ---  ┆ ---  ┆ --- │
│ i64  ┆ i64  ┆ i64 │
╞══════╪══════╪═════╡
│ 1    ┆ 3    ┆ 5   │
│ 2    ┆ 4    ┆ 6   │
│ null ┆ null ┆ 7   │
└──────┴──────┴─────┘
The Polars User Guide has a section dedicated to explaining the different concat strategies.
- https://docs.pola.rs/user-guide/transformations/concatenation/