最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

sql - Split a value based on the sum of all previous values - Stack Overflow

programmeradmin1浏览0评论

Given the following example data:

ACCOUNTID ENTRYDATE TRANSACTION TYPE AMOUNT
18 2025-07-02 08:00 PAYMENT CASH -2359.53
18 2024-19-12 08:04 SALE NUTS 133
18 2024-19-12 08:04 SALE NUTS 75
18 2024-19-12 07:36 SALE BOLTS 1085.71
18 2024-25-06 07:40 SALE BOLTS 1065.82
18 2024-14-01 15:30 PAYMENT CASH -1065.82
18 2023-26-12 13:07 SALE BOLTS 1065.82
18 2023-10-08 12:36 PAYMENT CASH -1017.91
18 2023-27-06 10:53 SALE BOLTS 1017.91
18 2023-07-02 10:44 PAYMENT CASH -1017.91
18 2022-28-12 09:54 SALE BOLTS 1017.91
18 2022-04-08 00:00 PAYMENT CASH -941.07
18 2022-23-06 06:45 SALE BOLTS 941.07

Given the following example data:

ACCOUNTID ENTRYDATE TRANSACTION TYPE AMOUNT
18 2025-07-02 08:00 PAYMENT CASH -2359.53
18 2024-19-12 08:04 SALE NUTS 133
18 2024-19-12 08:04 SALE NUTS 75
18 2024-19-12 07:36 SALE BOLTS 1085.71
18 2024-25-06 07:40 SALE BOLTS 1065.82
18 2024-14-01 15:30 PAYMENT CASH -1065.82
18 2023-26-12 13:07 SALE BOLTS 1065.82
18 2023-10-08 12:36 PAYMENT CASH -1017.91
18 2023-27-06 10:53 SALE BOLTS 1017.91
18 2023-07-02 10:44 PAYMENT CASH -1017.91
18 2022-28-12 09:54 SALE BOLTS 1017.91
18 2022-04-08 00:00 PAYMENT CASH -941.07
18 2022-23-06 06:45 SALE BOLTS 941.07

I need to add 2 columns which calculate the how much of each transaction should be applied to each category (BOLTs or NUTS). The SALE ROWs are easy enough. It is the payments that are giving me trouble. The results should look something like this:

ACCOUNTID ENTRYDATE TRANSACTION TYPE AMOUNT NUT_AMT BOLT_AMT
18 2025-07-02 08:00 PAYMENT CASH -2359.53 -208 -2151.53
18 2024-19-12 08:04 SALE NUTS 133 133 0
18 2024-19-12 08:04 SALE NUTS 75 75 0
18 2024-19-12 07:36 SALE BOLTS 1085.71 0 1085.71
18 2024-25-06 07:40 SALE BOLTS 1065.82 0 1065.82
18 2024-14-01 15:30 PAYMENT CASH -1065.82 0 -1065.82
18 2023-26-12 13:07 SALE BOLTS 1065.82 0 1065.82
18 2023-10-08 12:36 PAYMENT CASH -1017.91 0 -1017.91
18 2023-27-06 10:53 SALE BOLTS 1017.91 0 1017.91
18 2023-07-02 10:44 PAYMENT CASH -1017.91 0 -1017.91
18 2022-28-12 09:54 SALE BOLTS 1017.91 0 1017.91
18 2022-04-08 00:00 PAYMENT CASH -941.07 0 -941.07
18 2022-23-06 06:45 SALE BOLTS 941.07 0 -941.07

I have written a couple window functions which essentially perform a cumulative sum over each of the two new columns. But it is the payments I can't figure out. I would greatly appreciate some help.

Thanks in advance.

Share edited Feb 10 at 22:19 Dale K 27.4k15 gold badges58 silver badges83 bronze badges asked Feb 10 at 21:53 user944796user944796 113 bronze badges 4
  • How do you know which payment belongs to which sales? If we simply order the rows by date/time, we get a payment of -941.07 first, then a sale of 941.07, then a sale of 1017.91, then two payments of -1017.91, then a sale of 1065.82, ... This seems quite mixed up. Have you made an error with the date/times or is it just that I don't see the rule we need to apply here? – Thorsten Kettner Commented Feb 10 at 22:10
  • Be great to see the SQL you have which isn't quite working. – Dale K Commented Feb 10 at 22:20
  • have a look here: dbfiddle.uk/I39fePrf it uses the assumption that the "closest following sale" will be the bolt/nut category - and according to that assumption (and your example data) that all cash payments are "bolt". You need to define the correct rule/s to apply - and if you cannot do that then you will always get unexpected (and quite likely) incorrect answers. – Paul Maxwell Commented Feb 11 at 1:17
  • Thanks. I tried to give a simplified example, which might have not simplified as much. Imagine a company that sells bolts. You sell bolt, Customers pay for bolts. Then one day you start selling nuts. Customers continue ordering bolts. But then they start ordering Nuts as well. But they keep paying in the same manner. Before you knew that the payments were to be credited to Bolts. But now, you need to know what part of the larger payments go toward bolts, and what part goes to nuts. Based on what the balances owed for both Bolts and Nuts at the time of the payment. – user944796 Commented Feb 11 at 2:06
Add a comment  | 

2 Answers 2

Reset to default 0

Let's say that any payment should be distributed between all sales that occurred since the last time the balance passed 0;
and implement it methodically:

  1. use a windowed sum to follow the due balance
  2. determine the tipping points (when balance went down to 0 and started growing again)
  3. group together all sales and payments between two tipping points
  4. distribute the payment proportionally to the sales over that debt cycle

Note that:

  • /!\ This will not work in case of prepayments /!\
    due to 2., because a payment is only associated to past sales.
    However it might be possible to modify the query to work on prepayments, by artificially shifting payments: ordering by case when type in ('CASH') then entrydate + interval '1' month else entrydate end
  • a partial payment will be dispatched between future sales too
    due to 4., because the distributing group runs until the settling payment.
    Thus a "1. buy 15 of bolts - 2. pay 10 - 3. buy 5 of nuts - 4. pay the remaining 10"
    will distribute both payment 7.5 for A / 2.5 for B,
    instead of the (commercially correct) "2. partially pays for 1. (bolts), 4. the remains (5 of bolts, 5 of nuts)

(see it running on your data in a fiddle)

with
  -- DUE Total:
  -- follow each account's balance, where sales increase debt and payments decrease it
  -- (hopefully making it return to 0).
  duet as -- DUE Total
  (
    select
      row_number() over (partition by accountid order by entrydate) id, -- Within accountId
      t.*,
      sum(amount) over (partition by accountid order by entrydate) due
    from transac t
  ),
  -- Flag each transaction which starts a debt
  -- (due amount > 0 whereas last transaction had a due amount <= 0)
  debtstart as
  (
    select
      d.*,
      -- This could have been computed directly into duet (with a row between … 1 precedenting),
      -- but let's give it its own table for clarity.
      case when lag(due) over (partition by accountid order by entrydate) > 0 then 0 else 1 end ds
    from duet d
  ),
  -- Now for each transaction, get the last id when the debt balance was negative.
  -- This will slice our history, grouping together sales until a payment settles them all
  -- (by setting the balance to 0 again, starting a new cycle). 
  indebtsince as
  (
    select
      ds.*,
      max(case when ds = 1 then id else 0 end) over (partition by accountid order by entrydate) indebtentryid
    from debtstart ds
  ),
  -- Here we have two options, depending on the count of products we sell:
  -- - either we hardcode the products, by adding 1 column per product on our transactions table
  --   (simpler to start with, because we keep 1 row per transaction; but each entry to the catalog will require to duplicate our code for a new column)
  -- - or we work product by product
  --   (adaptative, the code will run for any number of references; but requires a new, by-product, table)
  -- We take this last path to be future-proof.
  -- Let's decline debt by product type, summing it over each debt cycle.
  byproduct as
  (
    select accountid, indebtentryid, type, sum(amount) amount
    from indebtsince
    where amount > 0
    group by accountid, indebtentryid, type
  ),
  byproductproportion as
  (
    select
      p.*,
      amount / sum(amount) over (partition by accountid, indebtentryid) proportion
    from byproduct p
  ),
  -- Apply the proportions of each debt cycle to all payments of that cycle.
  bypayment as
  (
    select pay.id, pp.type, pay.amount * pp.proportion amount
    from indebtsince pay join byproductproportion pp
      on pp.accountid = pay.accountid
      and pp.indebtentryid = pay.indebtentryid
    where pay.type in ('CASH')
    -- And aggregate the sales too, to benefit from pivot (next table) calculus.
    union all
    select id, type, amount
    from duet
    where type not in ('CASH')
  ),
  -- Pivot time! Only now do we specialize our columns, to get one line per payment (instead of one per product per payment).
  pivot as
  (
    select
      sum(case when type = 'NUTS'  then amount end) nut_amt,
      sum(case when type = 'BOLTS' then amount end) bolt_amt,
      id
    from bypayment
    group by id
  )
-- Grand total
select d.*, p.nut_amt, p.bolt_amt
from duet d join pivot p on p.id = d.id
order by entrydate desc;

You can use the MODEL clause:

select accountid, entrydate, transaction, type, amount, nut_amt, bolt_amt
from (
    select d.*, 
        row_number() over(partition by accountid order by entrydate) as rn
    from data d
)
model
    partition by ( accountid ) 
    dimension by ( rn )
    measures( amount as amount, 0 as nut_amt, 0 as bolt_amt, 0 as tot_nut, 0 as tot_bolt, 
         transaction as transaction, type as type, entrydate as entrydate )
    rules(
        tot_nut[any] = case type[cv()] when 'NUTS' then nvl(tot_nut[cv()-1],0) + amount[cv()] 
                        when 'CASH' then 0 else nvl(tot_nut[cv()-1],0) end,
        tot_bolt[any] = case type[cv()] when 'BOLTS' then nvl(tot_bolt[cv()-1],0) + amount[cv()] 
                        when 'CASH' then 0 else nvl(tot_bolt[cv()-1],0) end,
        nut_amt[any] = case type[cv()] when 'NUTS' then amount[cv()] 
            when 'CASH' then -tot_nut[cv()-1] else 0 end,
        bolt_amt[any] = case type[cv()] when 'BOLTS' then amount[cv()] 
            when 'CASH' then -tot_bolt[cv()-1] else 0 end
    )
order by entrydate desc
;
发布评论

评论列表(0)

  1. 暂无评论