最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

sql - segmented monthly snapshots of validly eligible user counts - Stack Overflow

programmeradmin1浏览0评论

I've been trying to figure out a SQL (in postgresql) query for a cohort-type analysis at work and can't figure this one out for the life of me.

I need a snapshot count of the number of valid users at each month since each group's partnership began with our company (anizations.service_start), for each employer (eligibility.anization_id), and eligibility service type (eligibility.service_type - there may be more than one per employer), for every month since the employer joined into a partnership. the users' validity is determined by the user's initial eligibility start date (eligibility.start_date) and end date (eligibility.end_date), if said end_date is available and later than the start date.

The eligibility table is laid out as follows:

ID START_DATE END_DATE SERVICE_TYPE ORGANIZATION_ID ORG_START_DATE
1234 01/13/2023 TYPE 1 GROUP A 2022-06-01
5678 02/21/2023 12/31/2024 TYPE 2 GROUP A 2022-09-01
1357 10/28/2022 12/31/2023 TYPE 1 GROUP B 2023-01-01
2468 10/28/2022 12/31/2023 TYPE 2 GROUP B 2023-08-01

I've been trying to figure out a SQL (in postgresql) query for a cohort-type analysis at work and can't figure this one out for the life of me.

I need a snapshot count of the number of valid users at each month since each group's partnership began with our company (anizations.service_start), for each employer (eligibility.anization_id), and eligibility service type (eligibility.service_type - there may be more than one per employer), for every month since the employer joined into a partnership. the users' validity is determined by the user's initial eligibility start date (eligibility.start_date) and end date (eligibility.end_date), if said end_date is available and later than the start date.

The eligibility table is laid out as follows:

ID START_DATE END_DATE SERVICE_TYPE ORGANIZATION_ID ORG_START_DATE
1234 01/13/2023 TYPE 1 GROUP A 2022-06-01
5678 02/21/2023 12/31/2024 TYPE 2 GROUP A 2022-09-01
1357 10/28/2022 12/31/2023 TYPE 1 GROUP B 2023-01-01
2468 10/28/2022 12/31/2023 TYPE 2 GROUP B 2023-08-01

but the end_date column may be inaccurately saved, either null (row 1), prior to start_date, or various other issues.

What I have so far is basically counts of the validly eligible users at the start of each group-type combination:

select 
    o.name as employer, e.service_type, 
    date_trunc('month', e.start_date::date) as start_month,
    sum(count(*)) over (partition by e.anization_id, e.health_plan_id, e.service_type, date_trunc('month', e.start_date::date)) as count
from eligibility e
left join anizations o on e.anization_id = o.id
where e.end_date is null or e.end_date > e.start_date
group by 2, 3, 1
order by 1, 2, 3;

meaning its basically one row per group-type combination and the count

EMPLOYER SERVICE_TYPE START_MONTH COUNT
Group A Type 1 2022-06-01 50
Group A Type 2 2022-09-01 140
Group B Type 1 2023-01-01 94
Group B Type 2 2023-08-01 12

but i am wanting this for all months for each combination between the start_month and present day.

Share Improve this question edited Mar 2 at 8:07 Zegarek 27.4k5 gold badges24 silver badges30 bronze badges asked Mar 2 at 6:21 Eleanor BrockEleanor Brock 11 bronze badge 3
  • What do you count()? The different health_plan_id over the same employer? May you post a minimal reproducible example, for example by fiddling on a small basis that you can modify with significant examples (adding some health_plan_id for example) until it works returning something showing your problem? Then paste the URL of that resulting fiddle, that first (running in the fiddle) query, its results, and the one you'd like, so that we can easily understand how to help you. – Guillaume Outters Commented Mar 2 at 8:26
  • Perhaps your example data should have such columns: health_plan_id. join anizations should be after grouping data in table eligibility, or take name as max(employer). – ValNik Commented Mar 2 at 8:28
  • 1 Avoid the use of ordinals in GROUP BY and ORDER BY clauses: their use makes queries more fragile and increases cognitive load on readers. Names remove the dependence on column order in select lists and can convey more information to the reader (assuming that useful names are chosen instead of names such as col_1, col_2). – JohnH Commented Mar 2 at 19:55
Add a comment  | 

1 Answer 1

Reset to default 0

While the whole process still seems quite unclear, it seems you will need generate_series() to instantiate every month of your "start to end" ranges.

Here is a query that returns "something" that may be close to your need (that we can only guess until you clarify it):

with
    -- Which "employer / service" tuples will we work on?
    -- When are their starts and stops?
    service_by_ as
    (
        select
            anization_id, service_type,
            min(_start_date) start_date,
            max(coalesce(end_date, current_timestamp)) end_date
        from eligibility
        group by 1, 2
    ),
    -- Now get the list of months for the full range of each tuple.
    months as
    (
        select
            anization_id, service_type,
            s.d start_date
        from service_by_ sbo,
        generate_series
        (
            date_trunc('month', start_date),
            end_date,
            interval '1 month'
        ) s(d)
    )
-- Count.
select o.name as employer, e.service_type, m.start_date as start_month, count(1) as count
from months m
join eligibility e
    on e.anization_id = m.anization_id
    and e.service_type = m.service_type
    and e.start_date < m.start_date + interval '1 month'
    and (e.end_date is null or e.end_date >= m.start_date)
    and e._start_date <= m.start_date -- Filter out entries with start_date < _start_date, for months before the _start_date.
join anizations o on o.id = e.anization_id
group by 1, 2, 3
order by 1, 2, 3
;

You can see it running live in a fiddle,
where I added another entry on GROUP A / Type 1, one year after the first one, to show a count greater than 1 (without it the example you gave only creates groups of 1 entry (from what I understood), which makes hard to demonstrate any interesting sum())

发布评论

评论列表(0)

  1. 暂无评论