python - Sum of products of columns in polars - Stack Overflow

I have a dataset, part of which looks like this:

customer product price quantity sale_time
C060235 P0204 6.99 2 2024-03-11 08:24:11
C045298 P0167 14.99 1 2024-03-11 08:35:06
...
C039877 P0024 126.95 1 2024-09-30 21:18:45

What I want is a list of unique customer, product pairs with the total sales, so something like:

customer product total
C0000105 P0168 643.78
C0000105 P0204 76.88
...
C1029871 P1680 435.44

Here's my attempt at constructing this. This gives me the grand total of all sales, which isn't what I want. What's a correct approach?

import polars as pl

db.select(
    (
        pl.col('customer'),
        pl.col('product'),
        pl.col('quantity').mul(pl.col('price')).alias('total')
    )
).group_by(('customer', 'product'))

asked Mar 13 at 15:28 by Scott Deerwester (edited Mar 13 at 16:34)

Comment:
  • Can you please add the exact output that you get when you run that code? – Starship Remembers Shadow, Mar 13 at 15:43

3 Answers

To do this, calculate the sale amount for each row, group by both the customer and product columns, and then sum the calculated amounts within each group.

Your current query has a few issues:

  • You're selecting product and customer but grouping by item_lookup_key and shopper_card_number
  • You need to use an aggregation function after grouping

This approach works:

db.group_by(["customer", "product"]).agg([
    ((pl.col("quantity") * pl.col("price")).sum()).alias("total")
])

A more concise alternative is Expr.dot:

db.group_by("customer", "product").agg(
    total=pl.col("quantity").dot("price")
)

As you have not shown all the columns named in your example (i.e. 'item_lookup_key' and 'shopper_card_number'), here is a trivial example that hopefully provides enough for you to progress.

NB: I am using polars 1.24.0 (Linux Mint 20.x).


cat wester.py
import polars as pl

# Sample dataset
data = {
    "customer": ["C060235", "C045298", "C039877", "C060235", "C039877"],
    "product": ["P0204", "P0167", "P0024", "P0204", "P0024"],
    "price": [6.99, 14.99, 126.95, 6.99, 126.95],
    "quantity": [2, 1, 1, 3, 2],
    "sale_time": [
        "2024-03-11 08:24:11",
        "2024-03-11 08:35:06",
        "2024-09-30 21:18:45",
        "2024-04-15 10:12:30",
        "2024-10-01 15:22:10",
    ],
}

df = pl.DataFrame(data)

# total sales by (customer, product)
result = (
    df.with_columns((pl.col("price") * pl.col("quantity")).alias("total_sales"))
    .group_by(["customer", "product"])
    .agg(pl.sum("total_sales").alias("total_sales"))
)

print(result)

Running python wester.py gives:
shape: (3, 3)
┌──────────┬─────────┬─────────────┐
│ customer ┆ product ┆ total_sales │
│ ---      ┆ ---     ┆ ---         │
│ str      ┆ str     ┆ f64         │
╞══════════╪═════════╪═════════════╡
│ C039877  ┆ P0024   ┆ 380.85      │
│ C045298  ┆ P0167   ┆ 14.99       │
│ C060235  ┆ P0204   ┆ 34.95       │
└──────────┴─────────┴─────────────┘

df.group_by("customer", "product").agg(total=pl.col("quantity").dot("price"))

Expr.dot computes the sum of the products (i.e., the dot product). There is also no need for a list (square brackets) in either group_by or agg.
