Say I have:
import duckdb
import polars as pl
df = pl.DataFrame({'a':[1,1,2], 'b': [4,5,6]}).with_columns(c=pl.concat_list('a', 'b'))
print(df)
shape: (3, 3)
┌─────┬─────┬───────────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ list[i64] │
╞═════╪═════╪═══════════╡
│ 1 ┆ 4 ┆ [1, 4] │
│ 1 ┆ 5 ┆ [1, 5] │
│ 2 ┆ 6 ┆ [2, 6] │
└─────┴─────┴───────────┘
I can normalise column 'c' by doing:
In [15]: df.with_columns(c_normalised = pl.col('c') / pl.col('c').list.sum())
Out[15]:
shape: (3, 4)
┌─────┬─────┬───────────┬──────────────────────┐
│ a ┆ b ┆ c ┆ c_normalised │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ list[i64] ┆ list[f64] │
╞═════╪═════╪═══════════╪══════════════════════╡
│ 1 ┆ 4 ┆ [1, 4] ┆ [0.2, 0.8] │
│ 1 ┆ 5 ┆ [1, 5] ┆ [0.166667, 0.833333] │
│ 2 ┆ 6 ┆ [2, 6] ┆ [0.25, 0.75] │
└─────┴─────┴───────────┴──────────────────────┘
How can I do this in DuckDB? I've tried:
In [17]: duckdb.sql("""
...: from df
...: select c / list_sum(c)
...: """)
---------------------------------------------------------------------------
BinderException Traceback (most recent call last)
Cell In[17], line 1
----> 1 duckdb.sql("""
2 from df
3 select c / list_sum(c)
4 """)
BinderException: Binder Error: No function matches the given name and argument types '/(BIGINT[], HUGEINT)'. You might need to add explicit type casts.
Candidate functions:
/(FLOAT, FLOAT) -> FLOAT
/(DOUBLE, DOUBLE) -> DOUBLE
/(INTERVAL, BIGINT) -> INTERVAL
asked Mar 17 at 11:50 by ignoring_gravity, edited Mar 17 at 14:03
1 Answer
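Found it. DuckDB's / operator has no overload that accepts a list (the candidate functions listed in the error all take scalars), so the division has to be applied to each element, for example with list_transform: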
In [26]: duckdb.sql("""
...: from df
...: select *, list_transform(c, x -> x / list_sum(c)) as c_normalised
...: """)
Out[26]:
┌───────┬───────┬─────────┬───────────────────────────────────────────┐
│ a │ b │ c │ c_normalised │
│ int64 │ int64 │ int64[] │ double[] │
├───────┼───────┼─────────┼───────────────────────────────────────────┤
│ 1 │ 4 │ [1, 4] │ [0.2, 0.8] │
│ 1 │ 5 │ [1, 5] │ [0.16666666666666666, 0.8333333333333334] │
│ 2 │ 6 │ [2, 6] │ [0.25, 0.75] │
└───────┴───────┴─────────┴───────────────────────────────────────────┘
Or, even nicer, with DuckDB's list comprehension syntax:
In [39]: duckdb.sql("""
...: from df
...: select *, [x / list_sum(c) for x in c] as c_normalised
...: """)
Out[39]:
┌───────┬───────┬─────────┬───────────────────────────────────────────┐
│ a │ b │ c │ c_normalised │
│ int64 │ int64 │ int64[] │ double[] │
├───────┼───────┼─────────┼───────────────────────────────────────────┤
│ 1 │ 4 │ [1, 4] │ [0.2, 0.8] │
│ 1 │ 5 │ [1, 5] │ [0.16666666666666666, 0.8333333333333334] │
│ 2 │ 6 │ [2, 6] │ [0.25, 0.75] │
└───────┴───────┴─────────┴───────────────────────────────────────────┘