I'm assuming a CTE is the most efficient way to do this, but any suggestions would be great. I think I could do this in multiple steps a different way if I had to, but I would love to make it as efficient as possible since the table is huge. I have a table like this:
PatientID | DischargeDate | RxDate | DaysSupply |
---|---|---|---|
1 | 1/1/2025 | 1/2/2025 | 3 |
2 | 1/3/2025 | 1/3/2025 | 5 |
3 | 1/3/2025 | 2/4/2025 | 10 |
I need to produce a table which keeps one line per patient discharge and creates 30 new columns indicating whether or not the patient had an antibiotic that day, like this:
PatientID | DischargeDate | RxDate | DaysSupply | Day1 | Day2 | Day3 | Day4 and so on |
---|---|---|---|---|---|---|---|
1 | 1/1/2025 | 1/2/2025 | 3 | 1 | 1 | 1 | 0 |
However, for the purposes of this post, I'll certainly settle for a solution that produces the following and then I can transpose it. Note - I am only keeping the records in which a patient started a prescription within 1-30 days post discharge. So you'll notice I left out patient #3 below.
PatientID | DischargeDate | RxDate |
---|---|---|
1 | 1/1/2025 | 1/2/2025 |
1 | 1/1/2025 | 1/3/2025 |
1 | 1/1/2025 | 1/4/2025 |
2 | 1/3/2025 | 1/3/2025 |
2 | 1/3/2025 | 1/4/2025 |
2 | 1/3/2025 | 1/5/2025 |
2 | 1/3/2025 | 1/6/2025 |
2 | 1/3/2025 | 1/7/2025 |
Any suggestions are greatly appreciated. Thank you!
Asked yesterday by bfbeck
- I created a fiddle for us to collaborate on a solution. dbfiddle.uk/EhzohmA0 – Bart McEndree Commented yesterday
- Multiple options found here stackoverflow/questions/33327837/… – Bart McEndree Commented yesterday
- What RDBMS are you using? – Andrew Commented yesterday
- In the first option, what does Day1 represent? Is it the first day the patient started treatment, or the first day after discharge? If the latter, what about Day0? – Charlieface Commented yesterday
- Thanks all. Sorry, I forgot to say which type of SQL I was using. The generate_series function works really nicely and seems to be the most efficient. – bfbeck Commented yesterday
3 Answers
For the first option, the best way (if not very concise) is to just use a bunch of CASE expressions.
SELECT
p.PatientID,
p.DischargeDate,
p.RxDate,
p.DaysSupply,
CASE WHEN p.DaysSupply >= 01 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day01,
CASE WHEN p.DaysSupply >= 02 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day02,
CASE WHEN p.DaysSupply >= 03 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day03,
CASE WHEN p.DaysSupply >= 04 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day04,
CASE WHEN p.DaysSupply >= 05 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day05,
CASE WHEN p.DaysSupply >= 06 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day06,
CASE WHEN p.DaysSupply >= 07 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day07,
CASE WHEN p.DaysSupply >= 08 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day08,
CASE WHEN p.DaysSupply >= 09 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day09,
CASE WHEN p.DaysSupply >= 10 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day10,
CASE WHEN p.DaysSupply >= 11 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day11,
CASE WHEN p.DaysSupply >= 12 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day12,
CASE WHEN p.DaysSupply >= 13 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day13,
CASE WHEN p.DaysSupply >= 14 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day14,
CASE WHEN p.DaysSupply >= 15 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day15,
CASE WHEN p.DaysSupply >= 16 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day16,
CASE WHEN p.DaysSupply >= 17 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day17,
CASE WHEN p.DaysSupply >= 18 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day18,
CASE WHEN p.DaysSupply >= 19 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day19,
CASE WHEN p.DaysSupply >= 20 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day20,
CASE WHEN p.DaysSupply >= 21 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day21,
CASE WHEN p.DaysSupply >= 22 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day22,
CASE WHEN p.DaysSupply >= 23 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day23,
CASE WHEN p.DaysSupply >= 24 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day24,
CASE WHEN p.DaysSupply >= 25 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day25,
CASE WHEN p.DaysSupply >= 26 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day26,
CASE WHEN p.DaysSupply >= 27 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day27,
CASE WHEN p.DaysSupply >= 28 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day28,
CASE WHEN p.DaysSupply >= 29 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day29,
CASE WHEN p.DaysSupply >= 30 THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day30
FROM Patient p
WHERE DATEADD(day, 30, p.DischargeDate) > p.RxDate;
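The query above numbers days from the start of the prescription, so Day01 is the first day of supply. The question leaves open whether the columns should instead count days after discharge. If so, one possible variant is to compute the prescription's offset from the discharge date once and test each day number against it. This is only a sketch under that assumption: `StartDay` and `EndDay` are names introduced for illustration, day 0 (the discharge date itself) is assumed not to be reported, and only the first three columns are written out.

```sql
-- Sketch: DayNN = 1 when the NNth day after discharge falls inside the supply window.
-- Assumption: day 0 is the discharge date and is not reported as a column.
SELECT
    p.PatientID,
    p.DischargeDate,
    p.RxDate,
    p.DaysSupply,
    CASE WHEN 1 BETWEEN d.StartDay AND d.EndDay THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day01,
    CASE WHEN 2 BETWEEN d.StartDay AND d.EndDay THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day02,
    CASE WHEN 3 BETWEEN d.StartDay AND d.EndDay THEN CAST(1 AS bit) ELSE CAST(0 AS bit) END AS Day03
    -- repeat the same pattern through Day30
FROM Patient p
CROSS APPLY (
    -- Offsets of the first and last supply day, counted from the discharge date
    SELECT
        DATEDIFF(day, p.DischargeDate, p.RxDate) AS StartDay,
        DATEDIFF(day, p.DischargeDate, p.RxDate) + p.DaysSupply - 1 AS EndDay
) d
WHERE DATEADD(day, 30, p.DischargeDate) > p.RxDate;
```

With the sample rows, patient 1 gets StartDay 1 and EndDay 3, so Day01 to Day03 are set, which matches the example row in the question. Patient 2 starts on the discharge date (StartDay 0), so the first supply day falls on day 0 and only the remaining four days show up in Day01 to Day04.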
For the second option, you can use GENERATE_SERIES to dynamically generate more rows, plus some date math.
SELECT
p.PatientID,
p.DischargeDate,
DATEADD(day, g.value, p.RxDate) AS RxDate
FROM Patient p
CROSS APPLY GENERATE_SERIES(0, p.DaysSupply - 1) g
WHERE DATEADD(day, 30, p.DischargeDate) > p.RxDate;
db<>fiddle
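GENERATE_SERIES is available from SQL Server 2022 and needs database compatibility level 160. On older versions, a small inline numbers table can play the same role. The following is a sketch under that assumption, not a drop-in replacement: `Digits`, `Tally`, and `n` are illustrative names, and the numbers only run from 0 to 99, so longer supplies would need another digit join.

```sql
-- Sketch: numbers 0..99 built from two digit lists, used in place of GENERATE_SERIES.
WITH Digits AS (
    SELECT v.d
    FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS v(d)
),
Tally AS (
    SELECT tens.d * 10 + ones.d AS n
    FROM Digits AS tens
    CROSS JOIN Digits AS ones
)
SELECT
    p.PatientID,
    p.DischargeDate,
    DATEADD(day, t.n, p.RxDate) AS RxDate
FROM Patient p
JOIN Tally t
    ON t.n < p.DaysSupply   -- one row per day of supply (offsets 0 .. DaysSupply - 1)
WHERE DATEADD(day, 30, p.DischargeDate) > p.RxDate
ORDER BY p.PatientID, t.n;
```

On a huge table, a persisted numbers table with a clustered index on `n` may be worth considering instead of the inline CTE.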
This is a recursive CTE that might solve your issue:
WITH
CTE (PatientID, DischargeDate, RxDate, DaysSupply, DayCount)
AS
(
-- In a recursive CTE, this part is called an [Anchor]
SELECT PatientID, DischargeDate, RxDate, DaysSupply, 1 as DayCount
FROM Patient
WHERE DateDiff(d,DischargeDate,RxDate) < 31 --DATEDIFF ( datepart , startdate , enddate )
UNION ALL
-- This is the recursive member of the CTE.
-- On each pass it reads the rows produced by the previous pass (r)
-- and emits one more row per prescription until DayCount reaches DaysSupply.
-- Reading from r alone, rather than re-joining [Patient] on PatientID,
-- avoids multiplying rows when a patient has more than one discharge or prescription.
SELECT r.PatientID, r.DischargeDate, r.RxDate, r.DaysSupply, r.DayCount+1
FROM CTE r
WHERE r.DayCount < r.DaysSupply
)
SELECT
PatientID, DischargeDate, RxDate, DaysSupply
--,DayCount
--, DateDiff(d,DischargeDate,RxDate) as DaysBetweenDischargeAndRx
FROM CTE
ORDER BY PatientID, DayCount
fiddle
Output with debug info:
PatientID | DischargeDate | RxDate | DaysSupply | DayCount | DaysBetweenDischargeAndRx |
---|---|---|---|---|---|
1 | 1/1/2025 | 1/2/2025 | 3 | 1 | 1 |
1 | 1/1/2025 | 1/2/2025 | 3 | 2 | 1 |
1 | 1/1/2025 | 1/2/2025 | 3 | 3 | 1 |
2 | 1/3/2025 | 1/3/2025 | 5 | 1 | 0 |
2 | 1/3/2025 | 1/3/2025 | 5 | 2 | 0 |
2 | 1/3/2025 | 1/3/2025 | 5 | 3 | 0 |
2 | 1/3/2025 | 1/3/2025 | 5 | 4 | 0 |
2 | 1/3/2025 | 1/3/2025 | 5 | 5 | 0 |
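The output above repeats the original RxDate on every row. To get one calendar date per day of supply, as in the second table in the question, the final SELECT could derive the date from DayCount. The sketch below restates the CTE in compact form so that it runs on its own. `DoseDate` is an illustrative name, and the MAXRECURSION value is an arbitrary assumption: the hint only matters if DaysSupply can go beyond the default limit of 100 recursion levels.

```sql
WITH CTE (PatientID, DischargeDate, RxDate, DaysSupply, DayCount) AS
(
    -- Anchor: one row per qualifying prescription, starting at day 1
    SELECT PatientID, DischargeDate, RxDate, DaysSupply, 1
    FROM Patient
    WHERE DATEDIFF(day, DischargeDate, RxDate) < 31
    UNION ALL
    -- Recursive member: add one day until the supply is used up
    SELECT PatientID, DischargeDate, RxDate, DaysSupply, DayCount + 1
    FROM CTE
    WHERE DayCount < DaysSupply
)
SELECT
    PatientID,
    DischargeDate,
    DATEADD(day, DayCount - 1, RxDate) AS DoseDate  -- DayCount 1 maps to RxDate itself
FROM CTE
ORDER BY PatientID, DoseDate
OPTION (MAXRECURSION 366);  -- assumed upper bound on DaysSupply; the default is 100
```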
The following works in BigQuery (and possibly some other flavours of SQL). It outputs the vertical table, as per your supplied example.
2 things to be aware of:
- Since you didn't specify the flavour of SQL, the syntax may not be appropriate for what you are using.
- Date formatting - I am using YYYY-MM-DD. You can specify your date format as appropriate.
If you output 30 fields, most of them will be NULL as they won't apply.
--recreate the example data
WITH
patients AS (
SELECT
1 AS PatientID,
CAST('2025-01-01' AS DATE) AS DischargeDate,
3 AS days_supply,
CAST('2025-01-02' AS DATE) AS RxDate
UNION ALL
SELECT
2 AS PatientID,
CAST('2025-01-03' AS DATE) AS DischargeDate,
5 AS days_supply,
CAST('2025-01-03' AS DATE) AS RxDate
UNION ALL
SELECT
3 AS PatientID,
CAST('2025-01-03' AS DATE) AS DischargeDate,
10 AS days_supply,
CAST('2025-02-04' AS DATE) AS RxDate
),
--only take data where prescription was issued <= 30 days after the discharge date
--Add 1 day to the DATE_DIFF as it doesn't include the date it starts from e.g. 2025-01-30 to 2025-01-01 is 29 days, not 30.
tmp AS (
SELECT
*
FROM
patients
WHERE
DATE_DIFF(RxDate, DischargeDate, DAY)+1 <=30
)
--generate the number of doses and increment the date for each dose day
SELECT
PatientID,
DischargeDate,
RxDate,
DATE_ADD(RxDate-1, INTERVAL doses DAY) AS dose_date,
doses AS dose_number
FROM
tmp,
UNNEST(GENERATE_ARRAY(1,days_supply,1)) AS doses
Output:
PatientID | DischargeDate | RxDate | dose_date | dose_number |
---|---|---|---|---|
1 | 2025-01-01 | 2025-01-02 | 2025-01-02 | 1 |
1 | 2025-01-01 | 2025-01-02 | 2025-01-03 | 2 |
1 | 2025-01-01 | 2025-01-02 | 2025-01-04 | 3 |
2 | 2025-01-03 | 2025-01-03 | 2025-01-03 | 1 |
2 | 2025-01-03 | 2025-01-03 | 2025-01-04 | 2 |
2 | 2025-01-03 | 2025-01-03 | 2025-01-05 | 3 |
2 | 2025-01-03 | 2025-01-03 | 2025-01-06 | 4 |
2 | 2025-01-03 | 2025-01-03 | 2025-01-07 | 5 |
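If the wide layout from the question is wanted after all, the vertical result can be folded back into one row per discharge with conditional aggregation. This is a minimal BigQuery sketch, assuming Day N means the Nth day after discharge. The hard-coded `doses` CTE is a stand-in for the vertical output above so that the query runs on its own, and only four of the 30 columns are written out.

```sql
-- Stand-in for the vertical output: patient 1's three dose dates
WITH doses AS (
  SELECT
    1 AS PatientID,
    DATE '2025-01-01' AS DischargeDate,
    dose_date
  FROM
    UNNEST(GENERATE_DATE_ARRAY(DATE '2025-01-02', DATE '2025-01-04')) AS dose_date
)
SELECT
  PatientID,
  DischargeDate,
  -- 1 if any dose falls N days after discharge, otherwise 0
  MAX(IF(DATE_DIFF(dose_date, DischargeDate, DAY) = 1, 1, 0)) AS Day1,
  MAX(IF(DATE_DIFF(dose_date, DischargeDate, DAY) = 2, 1, 0)) AS Day2,
  MAX(IF(DATE_DIFF(dose_date, DischargeDate, DAY) = 3, 1, 0)) AS Day3,
  MAX(IF(DATE_DIFF(dose_date, DischargeDate, DAY) = 4, 1, 0)) AS Day4
  -- repeat the same pattern through Day30
FROM
  doses
GROUP BY
  PatientID,
  DischargeDate
```

Writing 0 in the ELSE branch keeps the day columns as 0/1 flags rather than leaving days without a dose as NULL.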