I have a customer dataset with their start date and enddate. Each customer can have different start date and end date. I want to join it to another table to get a value i.e. status at start date and end date for all customers and populate the first table with status at start date and status at end date.
Can you please help
Not able to find efficient way to do this:
Customers Start Date Enddate
A 01/01/2024 30/06/2024
B 01/01/2024 30/05/2024
C 05/05/2023 01/01/2024
D 28/02/2023 01/01/2024
E 07/06/2023 20/02/2024
Table 2:
Customers Date Status
A 01/01/2024 Active
B 01/01/2024 Active
C 01/01/2024 Inactive
D 01/01/2024 Inactive
E 01/01/2024 Active
A 05/05/2023 Active
B 05/05/2023 Active
C 05/05/2023 Inactive
D 05/05/2023 Inactive
E 05/05/2023 Active
A 28/02/2023 Active
B 28/02/2023 Active
C 28/02/2023 Active
D 28/02/2023 Active
E 28/02/2023 Active
This table has records for all customers and dates with status
Expected result:
Customers Start Date End date Status at start date Status at end date
A 01/01/2024 30/06/2024 Active Active
B 01/01/2024 30/05/2024 Active Active
C 05/05/2023 01/01/2024 Inactive Active
D 28/02/2023 01/01/2024 Active Inactive
E 07/06/2023 20/02/2024 Active Active
I have a customer dataset with their start date and enddate. Each customer can have different start date and end date. I want to join it to another table to get a value i.e. status at start date and end date for all customers and populate the first table with status at start date and status at end date.
Can you please help
Not able to find efficient way to do this:
Customers Start Date Enddate
A 01/01/2024 30/06/2024
B 01/01/2024 30/05/2024
C 05/05/2023 01/01/2024
D 28/02/2023 01/01/2024
E 07/06/2023 20/02/2024
Table 2:
Customers Date Status
A 01/01/2024 Active
B 01/01/2024 Active
C 01/01/2024 Inactive
D 01/01/2024 Inactive
E 01/01/2024 Active
A 05/05/2023 Active
B 05/05/2023 Active
C 05/05/2023 Inactive
D 05/05/2023 Inactive
E 05/05/2023 Active
A 28/02/2023 Active
B 28/02/2023 Active
C 28/02/2023 Active
D 28/02/2023 Active
E 28/02/2023 Active
This table has records for all customers and dates with status
Expected result:
Customers Start Date End date Status at start date Status at end date
A 01/01/2024 30/06/2024 Active Active
B 01/01/2024 30/05/2024 Active Active
C 05/05/2023 01/01/2024 Inactive Active
D 28/02/2023 01/01/2024 Active Inactive
E 07/06/2023 20/02/2024 Active Active
Share
Improve this question
edited Jan 30 at 18:34
jarlh
44.8k8 gold badges50 silver badges67 bronze badges
asked Jan 30 at 16:18
user29435106user29435106
11 bronze badge
2
- Please provide example data and expected results – Bart McEndree Commented Jan 30 at 16:19
- Instead of comment please edit your question with minimal reproducible example. Your sample input data and expected output helps readers to test and validate. – samhita Commented Jan 30 at 16:38
3 Answers
Reset to default 1Table data in markdown text format is more readable.
-First CTE start_status just finds the status at start_date
- end_status CTE finds the closest date and checks the last status since the end date does not always find a match.
Sample Query
-- find the status for start date
WITH start_status AS (
SELECT
c.customer_id,
c.start_date,
c.end_date,
t.status AS status_at_start_date
FROM
customers c
LEFT JOIN
table2 t ON c.customer_id = t.customer_id AND t.date = c.start_date
),
-- find the closest status for the end date
end_status AS (
SELECT
customer_id,
end_date,
status AS status_at_end_date
FROM (
SELECT
c.customer_id,
c.end_date,
t.status,
ROW_NUMBER() OVER (PARTITION BY c.customer_id ORDER BY t.date DESC) AS rn
FROM
customers c
LEFT JOIN
table2 t ON c.customer_id = t.customer_id AND t.date <= c.end_date
) subquery
WHERE rn = 1
)
SELECT
ss.customer_id AS Customers,
ss.start_date AS Start_Date,
ss.end_date AS End_Date,
COALESCE(ss.status_at_start_date, es.status_at_end_date) AS Status_at_start_date,
es.status_at_end_date AS Status_at_end_date
FROM
start_status ss
JOIN
end_status es ON ss.customer_id = es.customer_id;
Output
CUSTOMERS | START_DATE | END_DATE | STATUS_AT_START_DATE | STATUS_AT_END_DATE |
---|---|---|---|---|
A | 2024-01-01 | 2024-06-30 | Active | Active |
B | 2024-01-01 | 2024-05-30 | Active | Active |
C | 2023-05-05 | 2024-01-01 | Inactive | Inactive |
D | 2023-02-28 | 2024-01-01 | Active | Inactive |
E | 2023-06-07 | 2024-02-20 | Active | Active |
- Calculate (select) nearest 1 row before StartDate
(select Status from Table2 t2
where t2.Customers=t.Customers and t2.Date<=t.StartDate
order by t2.Date desc limit 1) status_at_Startdate
- Calculate (select) nearest 1 row before EndDate
(select Status from Table2 t2
where t2.Customers=t.Customers and t2.Date<=t.EndDate
order by t2.Date desc limit 1) status_at_Enddate
See example
select *
,(select Status from Table2 t2
where t2.Customers=t.Customers and t2.Date<=t.StartDate
order by t2.Date desc limit 1) status_at_Startdate
,(select Status from Table2 t2
where t2.Customers=t.Customers and t2.Date<=t.EndDate
order by t2.Date desc limit 1) status_at_Enddate
from Table1 t
customers | startdate | enddate | status_at_startdate | status_at_enddate |
---|---|---|---|---|
A | 2024-01-01 | 2024-06-30 | Active | Active |
B | 2024-01-01 | 2024-05-30 | Active | Active |
C | 2023-05-05 | 2024-01-01 | Inactive | Inactive |
D | 2023-02-28 | 2024-01-01 | Active | Inactive |
E | 2023-06-07 | 2024-02-20 | Active | Active |
Get the start dates and end dates separately and then join sub-tables together on the customer id:
SELECT t1.customer
, t1.start_date
, t2.end_date
, t1.status_at_start_date
, t2.status_at_end_date
FROM (
SELECT cs.customer
, c.start_date
, status as status_at_start_date
FROM customerstatus cs
JOIN customers c
ON cs.customer = c.customer
QUALIFY ROW_NUMBER() OVER(
PARTITION BY cs.customer
ORDER BY
CASE WHEN c.start_date = cs.date -- if match then prefer that
THEN '0000-01-01'
ELSE c.start_date -- if no match, just use the earliest date
END) = 1
) t1
JOIN (
SELECT cs.customer
, c.end_date
, status as status_at_end_date
FROM customerstatus cs
JOIN customers c
ON cs.customer = c.customer
QUALIFY ROW_NUMBER() OVER(
PARTITION BY cs.customer
ORDER BY
CASE WHEN c.end_date = cs.date
THEN '9999-12-31'
ELSE c.end_date
END DESC) = 1
) t2
ON t1.customer = t2.customer
ORDER BY t1.customer;
Output:
customer | start_date | end_date | status_at_start_date | status_at_end_date |
---|---|---|---|---|
A | 2024-01-01 | 2024-06-30 | Active | Active |
B | 2024-01-01 | 2024-05-30 | Active | Active |
C | 2023-05-05 | 2024-01-01 | Inactive | Inactive |
D | 2023-02-28 | 2024-01-01 | Active | Inactive |
E | 2023-06-07 | 2024-02-20 | Active | Active |