I'm seeking help with including deeper recurrence then last row with recursive query. Source data:
CREATE TABLE products_su(country, intprd, "period", su)AS VALUES
('GL', 'Medicine', '2019-04-01'::date, 57)
,('GL', 'Medicine', '2019-05-01', 298)
,('GL', 'Medicine', '2019-06-01', 860)
,('GL', 'Medicine', '2019-07-01', 1649)
,('GL', 'Medicine', '2019-08-01', 2227)
,('GL', 'Medicine', '2019-09-01', 1914)
,('GL', 'Medicine', '2019-10-01', 1751)
,('GL', 'Medicine', '2019-11-01', 2007)
,('GL', 'Medicine', '2019-12-01', 2649)
,('GL', 'Medicine', '2020-01-01', 2452)
,('GL', 'Medicine', '2020-02-01', 2733)
,('GL', 'Medicine', '2020-03-01', 3185)
,('GL', 'Medicine', '2020-04-01', 1768)
,('GL', 'Medicine', '2020-05-01', 1779)
,('GL', 'Medicine', '2020-06-01', 3030)
,('GL', 'Medicine', '2020-07-01', 3133)
,('GL', 'Medicine', '2020-08-01', 3373)
,('GL', 'Medicine', '2020-09-01', 4953)
,('GL', 'Medicine', '2020-10-01', 4478)
,('GL', 'Medicine', '2020-11-01', 4471)
,('GL', 'Medicine', '2020-12-01', 5212);
My query gives correct results for first 12 periods, but afterwards calculation for repeated patients changes twice: once, just for 13th period and then again for 14th period onwards. My calculation for repeated patients is accurate for firsts 12 months, but then calculation has got include previous 12th and 13th month, so query needs to modified twice and unioned twice:
1-12 month -- my calculation union 13 month -- new calculation recurring to value 12 periods ago union 14 month onwards -- new calculation recurring to value 12 and 13 periods ago
I am unable to put together values for new calculations.
Exact logic is in the spreadsheet below the query.
This is the query: It's Redshift but here's a PostgreSQL fiddle
with recursive build as (
select country,intprd,period,su,tpe,rp,rep_pat,cur_rn,max_rn
from(select country, intprd, period, su
, su / 6 as tpe
, 0::float as rp
, 0::float as rep_pat
, row_number()over(partition by country order by period) as rn
, 2::int as cur_rn
, count(1)over() as max_rn
from(select country, intprd, period, su::float
, row_number()over(partition by country order by period) as rn
from products_su)_)_
where rn = 1
union all
select t.country, t.intprd, t.period, t.su
, t.su/6 as tpe, b.tpe as rp
, least((b.tpe - b.rep_pat), t.tpe) as rep_pat
, b.cur_rn + 1 as cur_rn
, b.max_rn
from build b
join(select country, intprd, period, su::float, tpe
, 0::float as rep_pat
, lag(period)over(partition by country order by period) prev_period
, row_number()over(partition by country order by period) as rn
from(select country, intprd, period, SU
, SU/6 as tpe
, row_number()over(partition by country order by period) as rn
from products_su)_)as t
on t.prev_period = b.period
and t.country = b.country
and t.intprd = b.intprd
where t.rn = b.cur_rn
and b.cur_rn <= b.max_rn
)
select country, intprd, period, su
, round(tpe, 1) as tpe
, round(rep_pat, 1) as rep_pat
, round((tpe - rep_pat), 1) as new_pat
, round(sum(rep_pat+new_pat)over(partition by country order by period rows unbounded preceding), 1) as peq
from build
order by period;
country | intprd | period | su | tpe | rep_pat | new_pat | peq |
---|---|---|---|---|---|---|---|
GL | Medicine | 2019-04-01 | 57 | 9.5 | 0.0 | 9.5 | 9.5 |
GL | Medicine | 2019-05-01 | 298 | 49.7 | 9.5 | 40.2 | 59.2 |
GL | Medicine | 2019-06-01 | 860 | 143.3 | 40.2 | 103.2 | 202.6 |
GL | Medicine | 2019-07-01 | 1649 | 274.8 | 103.2 | 171.7 | 477.4 |
GL | Medicine | 2019-08-01 | 2227 | 371.2 | 171.7 | 199.5 | 848.6 |
GL | Medicine | 2019-09-01 | 1914 | 319.0 | 199.5 | 119.5 | 1167.6 |
GL | Medicine | 2019-10-01 | 1751 | 291.8 | 119.5 | 172.3 | 1459.4 |
GL | Medicine | 2019-11-01 | 2007 | 334.5 | 172.3 | 162.2 | 1793.9 |
GL | Medicine | 2019-12-01 | 2649 | 441.5 | 162.2 | 279.3 | 2235.4 |
GL | Medicine | 2020-01-01 | 2452 | 408.7 | 279.3 | 129.3 | 2644.0 |
GL | Medicine | 2020-02-01 | 2733 | 455.5 | 129.3 | 326.2 | 3099.6 |
GL | Medicine | 2020-03-01 | 3185 | 530.8 | 326.2 | 204.7 | 3630.4 |
GL | Medicine | 2020-04-01 13th period |
1768 | 294.7 | 204.7 expected 214.2 |
90.0 | 3925.1 |
GL | Medicine | 2020-05-01 14th period |
1779 | 296.5 | 90.0 expected 130.2 |
206.5 | 4221.6 |
GL | Medicine | 2020-06-01 | 3030 | 505.0 | 206.5 expected 309.7 |
298.5 | 4726.6 |
GL | Medicine | 2020-07-01 | 3133 | 522.2 | 298.5 expected 470.2 |
223.7 | 5248.8 |
GL | Medicine | 2020-08-01 | 3373 | 562.2 | 223.7 expected 423.2 |
338.5 | 5811.0 |
GL | Medicine | 2020-09-01 | 4953 | 825.5 | 338.5 expected 458.0 |
487.0 | 6636.5 |
GL | Medicine | 2020-10-01 | 4478 | 746.3 | 487.0 expected 659.3 |
259.3 | 7382.8 |
GL | Medicine | 2020-11-01 | 4471 | 745.2 | 259.3 expected 421.5 |
485.8 | 8127.9 |
GL | Medicine | 2020-12-01 | 5212 | 868.7 | 485.8 expected 765.2 |
382.8 | 8996.5 |
I'm seeking help with including deeper recurrence then last row with recursive query. Source data:
CREATE TABLE products_su(country, intprd, "period", su)AS VALUES
('GL', 'Medicine', '2019-04-01'::date, 57)
,('GL', 'Medicine', '2019-05-01', 298)
,('GL', 'Medicine', '2019-06-01', 860)
,('GL', 'Medicine', '2019-07-01', 1649)
,('GL', 'Medicine', '2019-08-01', 2227)
,('GL', 'Medicine', '2019-09-01', 1914)
,('GL', 'Medicine', '2019-10-01', 1751)
,('GL', 'Medicine', '2019-11-01', 2007)
,('GL', 'Medicine', '2019-12-01', 2649)
,('GL', 'Medicine', '2020-01-01', 2452)
,('GL', 'Medicine', '2020-02-01', 2733)
,('GL', 'Medicine', '2020-03-01', 3185)
,('GL', 'Medicine', '2020-04-01', 1768)
,('GL', 'Medicine', '2020-05-01', 1779)
,('GL', 'Medicine', '2020-06-01', 3030)
,('GL', 'Medicine', '2020-07-01', 3133)
,('GL', 'Medicine', '2020-08-01', 3373)
,('GL', 'Medicine', '2020-09-01', 4953)
,('GL', 'Medicine', '2020-10-01', 4478)
,('GL', 'Medicine', '2020-11-01', 4471)
,('GL', 'Medicine', '2020-12-01', 5212);
My query gives correct results for first 12 periods, but afterwards calculation for repeated patients changes twice: once, just for 13th period and then again for 14th period onwards. My calculation for repeated patients is accurate for firsts 12 months, but then calculation has got include previous 12th and 13th month, so query needs to modified twice and unioned twice:
1-12 month -- my calculation union 13 month -- new calculation recurring to value 12 periods ago union 14 month onwards -- new calculation recurring to value 12 and 13 periods ago
I am unable to put together values for new calculations.
Exact logic is in the spreadsheet below the query.
This is the query: It's Redshift but here's a PostgreSQL fiddle
with recursive build as (
select country,intprd,period,su,tpe,rp,rep_pat,cur_rn,max_rn
from(select country, intprd, period, su
, su / 6 as tpe
, 0::float as rp
, 0::float as rep_pat
, row_number()over(partition by country order by period) as rn
, 2::int as cur_rn
, count(1)over() as max_rn
from(select country, intprd, period, su::float
, row_number()over(partition by country order by period) as rn
from products_su)_)_
where rn = 1
union all
select t.country, t.intprd, t.period, t.su
, t.su/6 as tpe, b.tpe as rp
, least((b.tpe - b.rep_pat), t.tpe) as rep_pat
, b.cur_rn + 1 as cur_rn
, b.max_rn
from build b
join(select country, intprd, period, su::float, tpe
, 0::float as rep_pat
, lag(period)over(partition by country order by period) prev_period
, row_number()over(partition by country order by period) as rn
from(select country, intprd, period, SU
, SU/6 as tpe
, row_number()over(partition by country order by period) as rn
from products_su)_)as t
on t.prev_period = b.period
and t.country = b.country
and t.intprd = b.intprd
where t.rn = b.cur_rn
and b.cur_rn <= b.max_rn
)
select country, intprd, period, su
, round(tpe, 1) as tpe
, round(rep_pat, 1) as rep_pat
, round((tpe - rep_pat), 1) as new_pat
, round(sum(rep_pat+new_pat)over(partition by country order by period rows unbounded preceding), 1) as peq
from build
order by period;
country | intprd | period | su | tpe | rep_pat | new_pat | peq |
---|---|---|---|---|---|---|---|
GL | Medicine | 2019-04-01 | 57 | 9.5 | 0.0 | 9.5 | 9.5 |
GL | Medicine | 2019-05-01 | 298 | 49.7 | 9.5 | 40.2 | 59.2 |
GL | Medicine | 2019-06-01 | 860 | 143.3 | 40.2 | 103.2 | 202.6 |
GL | Medicine | 2019-07-01 | 1649 | 274.8 | 103.2 | 171.7 | 477.4 |
GL | Medicine | 2019-08-01 | 2227 | 371.2 | 171.7 | 199.5 | 848.6 |
GL | Medicine | 2019-09-01 | 1914 | 319.0 | 199.5 | 119.5 | 1167.6 |
GL | Medicine | 2019-10-01 | 1751 | 291.8 | 119.5 | 172.3 | 1459.4 |
GL | Medicine | 2019-11-01 | 2007 | 334.5 | 172.3 | 162.2 | 1793.9 |
GL | Medicine | 2019-12-01 | 2649 | 441.5 | 162.2 | 279.3 | 2235.4 |
GL | Medicine | 2020-01-01 | 2452 | 408.7 | 279.3 | 129.3 | 2644.0 |
GL | Medicine | 2020-02-01 | 2733 | 455.5 | 129.3 | 326.2 | 3099.6 |
GL | Medicine | 2020-03-01 | 3185 | 530.8 | 326.2 | 204.7 | 3630.4 |
GL | Medicine | 2020-04-01 13th period |
1768 | 294.7 | 204.7 expected 214.2 |
90.0 | 3925.1 |
GL | Medicine | 2020-05-01 14th period |
1779 | 296.5 | 90.0 expected 130.2 |
206.5 | 4221.6 |
GL | Medicine | 2020-06-01 | 3030 | 505.0 | 206.5 expected 309.7 |
298.5 | 4726.6 |
GL | Medicine | 2020-07-01 | 3133 | 522.2 | 298.5 expected 470.2 |
223.7 | 5248.8 |
GL | Medicine | 2020-08-01 | 3373 | 562.2 | 223.7 expected 423.2 |
338.5 | 5811.0 |
GL | Medicine | 2020-09-01 | 4953 | 825.5 | 338.5 expected 458.0 |
487.0 | 6636.5 |
GL | Medicine | 2020-10-01 | 4478 | 746.3 | 487.0 expected 659.3 |
259.3 | 7382.8 |
GL | Medicine | 2020-11-01 | 4471 | 745.2 | 259.3 expected 421.5 |
485.8 | 8127.9 |
GL | Medicine | 2020-12-01 | 5212 | 868.7 | 485.8 expected 765.2 |
382.8 | 8996.5 |
Below are the requirements and results to compare with excel formulas:
• Dosage: 6 tablets in month 1,2,13 and 14 of the treatment. No drug in 3rd and 4th year.
• # of Total Patient Equals who have taken the drug in a particular month are calculated by dividing standard units (SU) sold in that month by monthly dosage (considered as 6 tablets (SU))
• New patients taking drug in a particular month are calculated by subtracting the repeated patients number from the above number for that month. Repeated patients for a month are equivalent to new patients for the previous month.
• We assume 100% compliance of new patients in taking the second dosage and hence are accounted for next 12 months (next step)
• Final Total Patient Equals are calculated by summing up new patients number for recent 12 months i.e. the present month (for which the TPE are calculated) and the previous 11 months.
• Total Patient Equals share is calculated by dividing Total Patient Equals calculated using above methodology' by 'Sum of Total Patient Equals of all MS products'
• repeated Patients: repeated patients corrected to include patients from the previous year as well. After 12 months patients must be considered coming from the immediate previous month and from the month 12 months in the past who are starting again for the next year
• TOTAL_TPE's: summing done for previous 14 months instead of 12 to account for repeated patients form the month immendiately beofre the start for the time window (considered for the starting month) and the month 2 months removed from the start fo the window (considered for the ending month of the time window)
https://docs.google/spreadsheets/d/17j96xLV4jcx9YNj8qN2LzfH4MJHOS1iw-1TSb21LyNI/edit?gid=0#gid=0
Calculation changes from O20 into P20 and then Q20.
I will appreciate all the help :)
Share Improve this question edited Mar 31 at 21:49 Kondjitsu asked Mar 30 at 16:36 KondjitsuKondjitsu 211 gold badge1 silver badge6 bronze badges 8- It's never a good idea to open a stranger's spreadsheet. Can you provide requirements as text or pasting it into the question? – Chris Maurer Commented Mar 30 at 16:49
- Definitely, however within the spreadsheet they're presented with excel formulas along with the expected results, but sure, description updated :) – Kondjitsu Commented Mar 30 at 17:00
- 1 @Kondjitsu Opening with a brief, simplified summary/explanation of what you're trying to do and the meaning behind your input data, would help clarify the intended logic behind your query, then find&fix the problem. Renaming the fields/columns in a bit more expressive and self-explanatory manner would also go a long way. Please also attach how your expected input would be different from what you're getting now. – Zegarek Commented Mar 30 at 18:11
- Correction: how your expected output is meant to be different from the one you got now. Although a word on how your real-life inputs differ from the sample might also be helpful. – Zegarek Commented Mar 30 at 18:27
- I see no difference between your expected output and the output of your fiddle. What need to clarify what you expect and how what you are getting diverges from this expectation. – Bill Weiner Commented Mar 30 at 20:35
2 Answers
Reset to default 1So you are looking for a rolling sum, except you have to remember 14 values instead of 1 because the value you read on month 0 should be reused on months 1, 12 and 13 (and then we can fet it, if the patient comes back he or she will be considered a new patient).
Storing those 14 values on 14 rows (per country) would be impractical (window functions in the recursive CTE? No way!),
on 14 columns of a accumulator row would be better,
but our best choice would be to materialize them in a PostgreSQL array that will be much more compact and efficient
(RedShift will probably require some adaptations).
We'll shift the array from month to month, making 14 month-old sales disappear (after having picked them to be subtracted from current month's new sales, of course).\
Finally, we'll do all computations in Sales Units, and transcribe it to individual patients only for display (less values to store during the iterations, less loss of precision).
with recursive
su as (select *, row_number() over (partition by country order by period) - 1 as rn from products_su),
roll as
(
select
country, rn, period,
su tsu, 0 rsu, su nsu,
array[su, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] accu
from su where rn = 0
union all
select * from
(
with newsu as
(
select su.*, accu, accu[1] + accu[12] + accu[13] rsu
from roll join su on su.country = roll.country and su.rn = roll.rn + 1
)
select
country, rn, period,
su, rsu, su - rsu,
(su - rsu)||accu[1:12]
from newsu
) newsu
),
raw as
(
select
*,
rsu / 6.0 rep_pat, nsu / 6.0 new_pat,
sum(nsu / 6.0) over (partition by country order by rn rows between 13 preceding and current row) peq
from roll
)
select
country, period, tsu su, tsu / 6.0 tpe,
rep_pat, new_pat,
peq
from raw
order by country, period;
country | period | su | tpe | rep_pat | new_pat | peq |
---|---|---|---|---|---|---|
GL | 2019-04-01 | 57 | 9.5 | 0.0 | 9.5 | 9.5 |
GL | 2019-05-01 | 298 | 49.7 | 9.5 | 40.2 | 49.7 |
GL | 2019-06-01 | 860 | 143.3 | 40.2 | 103.2 | 152.8 |
GL | 2019-07-01 | 1649 | 274.8 | 103.2 | 171.7 | 324.5 |
GL | 2019-08-01 | 2227 | 371.2 | 171.7 | 199.5 | 524.0 |
GL | 2019-09-01 | 1914 | 319.0 | 199.5 | 119.5 | 643.5 |
GL | 2019-10-01 | 1751 | 291.8 | 119.5 | 172.3 | 815.8 |
GL | 2019-11-01 | 2007 | 334.5 | 172.3 | 162.2 | 978.0 |
GL | 2019-12-01 | 2649 | 441.5 | 162.2 | 279.3 | 1257.3 |
GL | 2020-01-01 | 2452 | 408.7 | 279.3 | 129.3 | 1386.7 |
GL | 2020-02-01 | 2733 | 455.5 | 129.3 | 326.2 | 1712.8 |
GL | 2020-03-01 | 3185 | 530.8 | 326.2 | 204.7 | 1917.5 |
GL | 2020-04-01 | 1768 | 294.7 | 214.2 | 80.5 | 1998.0 |
GL | 2020-05-01 | 1779 | 296.5 | 130.2 | 166.3 | 2164.3 |
GL | 2020-06-01 | 3030 | 505.0 | 309.7 | 195.3 | 2350.2 |
GL | 2020-07-01 | 3133 | 522.2 | 470.2 | 52.0 | 2362.0 |
GL | 2020-08-01 | 3373 | 562.2 | 423.2 | 139.0 | 2397.8 |
GL | 2020-09-01 | 4953 | 825.5 | 458.0 | 367.5 | 2593.7 |
GL | 2020-10-01 | 4478 | 746.3 | 659.3 | 87.0 | 2481.2 |
GL | 2020-11-01 | 4471 | 745.2 | 421.5 | 323.7 | 2685.3 |
GL | 2020-12-01 | 5212 | 868.7 | 765.2 | 103.5 | 2616.5 |
(here you've got it in a fiddle)
I would fet the window functions and summarize your table the old-fashioned way:
Select CustID, OrderDate, Coalesce(Case When OrderType='bulk' Then orig_value End,0) as Bulkval
,Coalesce(Case When OrderType='individual' Then orig_value End,0) as Indval
, Coalesce(Case When OrderType='other' Then orig_value End,0) as Otherval
From Custtable
Group By CustID, OrderDate
Now you can do some simple select logic to pick up the first non-zero value:
Select Custody, OrderDate, Case When bulkval>0 Then bulkval When IndVal>0 Then IndVal Else Otherval End as Adj_value
From (result above)