sql - segmented monthly snapshots of validly eligible user counts

I've been trying to figure out a SQL (in postgresql) query for a cohort-type analysis at work and can't figure this one out for the life of me.

I need a snapshot count of the number of valid users at each month since each group's partnership began with our company (anizations.service_start), for each employer (eligibility.anization_id), and eligibility service type (eligibility.service_type - there may be more than one per employer), for every month since the employer joined into a partnership. the users' validity is determined by the user's initial eligibility start date (eligibility.start_date) and end date (eligibility.end_date), if said end_date is available and later than the start date.

The eligibility table is laid out as follows:

ID	START_DATE	END_DATE	SERVICE_TYPE	ORGANIZATION_ID	ORG_START_DATE
1234	01/13/2023		TYPE 1	GROUP A	2022-06-01
5678	02/21/2023	12/31/2024	TYPE 2	GROUP A	2022-09-01
1357	10/28/2022	12/31/2023	TYPE 1	GROUP B	2023-01-01
2468	10/28/2022	12/31/2023	TYPE 2	GROUP B	2023-08-01

I've been trying to figure out a SQL (in postgresql) query for a cohort-type analysis at work and can't figure this one out for the life of me.

The eligibility table is laid out as follows:

ID	START_DATE	END_DATE	SERVICE_TYPE	ORGANIZATION_ID	ORG_START_DATE
1234	01/13/2023		TYPE 1	GROUP A	2022-06-01
5678	02/21/2023	12/31/2024	TYPE 2	GROUP A	2022-09-01
1357	10/28/2022	12/31/2023	TYPE 1	GROUP B	2023-01-01
2468	10/28/2022	12/31/2023	TYPE 2	GROUP B	2023-08-01

but the end_date column may be inaccurately saved, either null (row 1), prior to start_date, or various other issues.

What I have so far is basically counts of the validly eligible users at the start of each group-type combination:

select 
    o.name as employer, e.service_type, 
    date_trunc('month', e.start_date::date) as start_month,
    sum(count(*)) over (partition by e.anization_id, e.health_plan_id, e.service_type, date_trunc('month', e.start_date::date)) as count
from eligibility e
left join anizations o on e.anization_id = o.id
where e.end_date is null or e.end_date > e.start_date
group by 2, 3, 1
order by 1, 2, 3;

meaning its basically one row per group-type combination and the count

EMPLOYER	SERVICE_TYPE	START_MONTH	COUNT
Group A	Type 1	2022-06-01	50
Group A	Type 2	2022-09-01	140
Group B	Type 1	2023-01-01	94
Group B	Type 2	2023-08-01	12

but i am wanting this for all months for each combination between the start_month and present day.

Share Improve this question edited Mar 2 at 8:07 Zegarek 27.4k5 gold badges24 silver badges30 bronze badges asked Mar 2 at 6:21 Eleanor Brock 11 bronze badge

What do you count()? The different health_plan_id over the same employer? May you post a minimal reproducible example, for example by fiddling on a small basis that you can modify with significant examples (adding some health_plan_id for example) until it works returning something showing your problem? Then paste the URL of that resulting fiddle, that first (running in the fiddle) query, its results, and the one you'd like, so that we can easily understand how to help you. – Guillaume Outters Commented Mar 2 at 8:26
Perhaps your example data should have such columns: health_plan_id. join anizations should be after grouping data in table eligibility, or take name as max(employer). – ValNik Commented Mar 2 at 8:28
1 Avoid the use of ordinals in GROUP BY and ORDER BY clauses: their use makes queries more fragile and increases cognitive load on readers. Names remove the dependence on column order in select lists and can convey more information to the reader (assuming that useful names are chosen instead of names such as col_1, col_2). – JohnH Commented Mar 2 at 19:55

Add a comment |

1 Answer 1

Sorted by: Reset to default 0

While the whole process still seems quite unclear, it seems you will need generate_series() to instantiate every month of your "start to end" ranges.

Here is a query that returns "something" that may be close to your need (that we can only guess until you clarify it):

with
    -- Which "employer / service" tuples will we work on?
    -- When are their starts and stops?
    service_by_ as
    (
        select
            anization_id, service_type,
            min(_start_date) start_date,
            max(coalesce(end_date, current_timestamp)) end_date
        from eligibility
        group by 1, 2
    ),
    -- Now get the list of months for the full range of each tuple.
    months as
    (
        select
            anization_id, service_type,
            s.d start_date
        from service_by_ sbo,
        generate_series
        (
            date_trunc('month', start_date),
            end_date,
            interval '1 month'
        ) s(d)
    )
-- Count.
select o.name as employer, e.service_type, m.start_date as start_month, count(1) as count
from months m
join eligibility e
    on e.anization_id = m.anization_id
    and e.service_type = m.service_type
    and e.start_date < m.start_date + interval '1 month'
    and (e.end_date is null or e.end_date >= m.start_date)
    and e._start_date <= m.start_date -- Filter out entries with start_date < _start_date, for months before the _start_date.
join anizations o on o.id = e.anization_id
group by 1, 2, 3
order by 1, 2, 3
;

You can see it running live in a fiddle,
where I added another entry on GROUP A / Type 1, one year after the first one, to show a count greater than 1 (without it the example you gave only creates groups of 1 entry (from what I understood), which makes hard to demonstrate any interesting sum())

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

sql - segmented monthly snapshots of validly eligible user counts - Stack Overflow

1 Answer 1

与本文相关的文章

评论列表(0)