sql - Aggregate status based on presence of at least one match - Stack Overflow

Here's my Postgres table schema: db<>fiddle

create table my_table
 (id, name,     status, realm_id)as values
 (1,  'cash',   'denied',    123)
,(2,  'check',  'closed',    123)
,(3,  'payroll','denied',    123)
,(4,  'cash',   'pending',   456)
,(5,  'deposit','suspended', 456)
,(6,  'lending','expired',   456)
,(7,  'loan',   'trial',     456)
,(8,  'crypto', 'active',    456)
,(9,  'payroll','closed',    456);

The result that I'd like to get is something like this:

realm_id status
123 inactive
456 active

So two dimensions of aggregation:

  1. aggregate based on realm_id first;
  2. aggregate based on status: as long as the realm_id has at least one name whose status is neither closed nor denied, it's marked as active; otherwise, it's inactive.

I've tried to use aggregate and left outer join, but no luck thus far.

Any ideas would be greatly appreciated!

Asked Mar 29 at 16:06 by Fisher Coder; edited Mar 30 at 15:44 by Zegarek.
  • 9 | payroll | closed | 456 - so why does this appear as active in your result given the rule status closed = inactive? – P.Salmon Commented Mar 29 at 16:15
  • @P.Salmon Point 2 of their requirements is that there must be at least one row per realm_id with a value not in ('closed','denied'); then the status should be active for this realm_id. That's the case for realm_id 456, which even has 5 such rows. – Jonas Metzler Commented Mar 29 at 16:21

3 Answers

You can use EVERY combined with a GROUP BY clause to check if every row per realm_id has status closed or denied. Using a CASE expression, you can then set the status to active or inactive.

SELECT
  realm_id,
  CASE 
    WHEN EVERY(status in ('closed', 'denied')) 
      THEN 'inactive'
      ELSE 'active' END AS status
FROM my_table
GROUP BY realm_id
ORDER BY realm_id;

I would even prefer to skip the CASE expression and simply return a boolean (t or f) in a column named inactive, but that's a matter of taste:

SELECT
  realm_id,
  EVERY(status in ('closed', 'denied')) AS inactive
FROM my_table
GROUP BY realm_id
ORDER BY realm_id;

See this db<>fiddle with your sample data.

You can use the bool_or() aggregate function.

bool_or() aggregates boolean values across the rows of a group. It returns true if at least one value in the group is true, and false if all of them are false.
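To make the semantics concrete, here is a minimal sketch on an inline VALUES list rather than on my_table. The names t, grp and flag are made up for illustration. It also shows that, for the same input, bool_or(x) gives the same answer as not every(not x), which is why the EVERY and bool_or() approaches are interchangeable here.

```sql
-- Hypothetical inline data: group 'a' has one true value, group 'b' has none.
SELECT grp,
       bool_or(flag)        AS any_true,           -- true if at least one flag is true
       every(flag)          AS all_true,           -- true only if every flag is true
       NOT every(NOT flag)  AS any_true_via_every  -- same result as bool_or(flag)
FROM (VALUES ('a', true), ('a', false),
             ('b', false), ('b', false)) AS t(grp, flag)
GROUP BY grp
ORDER BY grp;

-- grp | any_true | all_true | any_true_via_every
-- a   | t        | f        | t
-- b   | f        | f        | f
```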

Query:

SELECT realm_id,
       CASE WHEN bool_or(status NOT IN ('closed', 'denied'))
            THEN 'active' ELSE 'inactive'
       END AS status
FROM my_table
GROUP BY realm_id
ORDER BY realm_id;

Output:

realm_id status
123 inactive
456 active

[fiddle](https://dbfiddle.uk/gkfZmwlr)

For speed, add an expression index:

create index on my_table(realm_id asc,(status not in('closed','denied')) desc);

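This lets you emulate an index skip scan (loose index scan) with a recursive CTE, which needs only a few index probes per distinct realm_id instead of reading every row: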

with recursive cte as(
 -- anchor: the lowest realm_id and whether it has any active row
 (select realm_id
       , exists(select from my_table as b
                where status not in('closed','denied')
                and a.realm_id=b.realm_id) as is_active
  from my_table as a
  order by 1 limit 1
 )union all
  -- step: jump straight to the next realm_id above the previous one
  select s.*
  from cte cross join lateral
  (select c.realm_id
        , exists(select from my_table as d
                 where status not in('closed','denied')
                 and c.realm_id=d.realm_id) as is_active
   from my_table as c
   where c.realm_id>cte.realm_id
   order by 1 limit 1) as s
  where cte.realm_id is not null)
select*from cte 
where realm_id is not null;

Simple aggregation with every/bool_and/bool_or is nice, short and clear:

select realm_id
     , not bool_and(status=any('{closed,denied}')) as is_active
from my_table
group by 1;

Unfortunately, it can't use an index, so on 2M rows it has to run a sequential scan and takes about 575 ms on average. The skip scan runs roughly 54x faster, under 11 ms, using index-only scans:
demo at db<>fiddle

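query_name               | avg             | min             | max             | stddev
-------------------------+-----------------+-----------------+-----------------+----------------
index skip scan          | 00:00:00.010736 | 00:00:00.010207 | 00:00:00.014184 | 00:00:00.000849
bool_and, every, bool_or | 00:00:00.575871 | 00:00:00.564045 | 00:00:00.647492 | 00:00:00.014413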
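These timings come from one demo data set, so it is worth checking the plan on your own data. A minimal sketch, assuming the my_table schema and the expression index from above: run each query under EXPLAIN and look for Seq Scan versus Index Only Scan nodes in the output.

```sql
-- Refresh planner statistics and the visibility map, which index-only scans rely on.
vacuum analyze my_table;

-- Plain aggregation: inspect the scan node and the reported execution time.
explain (analyze, buffers)
select realm_id
     , not bool_and(status=any('{closed,denied}')) as is_active
from my_table
group by 1;
```

The same explain (analyze, buffers) prefix can be put in front of the recursive CTE to compare the two. The skip-scan emulation tends to pay off when there are few distinct realm_id values relative to the number of rows; with nearly unique realm_id values the advantage should shrink.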

What would be even faster is having that information collected ahead of time. You can use a tally table to track which realm_id is active, keep count of records in it and whatever other aggregate info you regularly need to check. It requires a simple trigger to maintain it by running an increment/decrement, boolean flip, append/pop after each DML on the table, but the information is then always readily available with no need to re-calculate.
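A minimal sketch of such a tally table, assuming PostgreSQL 11 or later (for execute function). The table realm_tally, its columns, and the function and trigger name my_table_tally are hypothetical names chosen for illustration; they are not part of the original schema.

```sql
-- Hypothetical tally table: one row per realm_id.
create table realm_tally (
  realm_id     int primary key,
  total_count  bigint not null default 0,
  active_count bigint not null default 0  -- rows whose status is not closed/denied
);

-- Seed it once from the current contents of my_table.
insert into realm_tally (realm_id, total_count, active_count)
select realm_id
     , count(*)
     , count(*) filter (where status not in ('closed','denied'))
from my_table
group by realm_id;

create function my_table_tally() returns trigger language plpgsql as $$
begin
  -- DELETE and UPDATE: take the old row's contribution back out.
  if tg_op in ('DELETE', 'UPDATE') then
    update realm_tally
    set total_count  = total_count - 1
      , active_count = active_count
                       - ((old.status not in ('closed','denied')) is true)::int
    where realm_id = old.realm_id;
  end if;
  -- INSERT and UPDATE: add the new row's contribution.
  if tg_op in ('INSERT', 'UPDATE') then
    insert into realm_tally as t (realm_id, total_count, active_count)
    values ( new.realm_id, 1
           , ((new.status not in ('closed','denied')) is true)::int )
    on conflict (realm_id) do update
    set total_count  = t.total_count  + excluded.total_count
      , active_count = t.active_count + excluded.active_count;
  end if;
  return null;  -- the return value of an AFTER row trigger is ignored
end $$;

create trigger my_table_tally
after insert or update or delete on my_table
for each row execute function my_table_tally();

-- Reading the status no longer touches my_table.
select realm_id
     , case when active_count > 0 then 'active' else 'inactive' end as status
from realm_tally
order by realm_id;
```

With the sample data this returns 123 as inactive and 456 as active. The trade-off is extra work on every write: concurrent writers to the same realm_id queue up on its tally row, and this sketch does not handle TRUNCATE, which would need its own statement-level trigger.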
