最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

sql - Return an ID with at least 1 specific status using Group By Having Clause - Stack Overflow

programmeradmin3浏览0评论

This is my sample data set in a table called "status": fiddle

create table status(id, status)as values
 (1,    'in')
,(1,    'out')
,(1,    'in')
,(1,    'in')
,(2,    'in')
,(2,    'in')
,(3,    'out')
,(3,    'out')
,(3,    'in')
,(3,    'out')
,(4,    'out')
,(4,    'out');

If I want to return ID with at least 1 "in" status, how should I write it? I started with this

SELECT id, COUNT(status) 
FROM status  
GROUP BY id
HAVING count(*) > 1

but it is returning the actual number of status per ID, including 4 that has zero in statuses.

id count
3 4
4 2
2 2
1 4

This is my sample data set in a table called "status": fiddle

create table status(id, status)as values
 (1,    'in')
,(1,    'out')
,(1,    'in')
,(1,    'in')
,(2,    'in')
,(2,    'in')
,(3,    'out')
,(3,    'out')
,(3,    'in')
,(3,    'out')
,(4,    'out')
,(4,    'out');

If I want to return ID with at least 1 "in" status, how should I write it? I started with this

SELECT id, COUNT(status) 
FROM status  
GROUP BY id
HAVING count(*) > 1

but it is returning the actual number of status per ID, including 4 that has zero in statuses.

id count
3 4
4 2
2 2
1 4
Share Improve this question edited Apr 2 at 9:24 Zegarek 27.2k5 gold badges24 silver badges30 bronze badges asked Apr 2 at 7:15 rickyProgrammerrickyProgrammer 1,1754 gold badges30 silver badges67 bronze badges 4
  • 1 Please add the expected result to your question. Likely, you are after ..HAVING COUNT(*) FILTER(WHERE status = 'in') > 0, but it's unclear which outcome excatly you want. – Jonas Metzler Commented Apr 2 at 7:34
  • Can you also specify the expected result, i.e. provide a complete minimal reproducible example. – jarlh Commented Apr 2 at 9:09
  • It might be a good idea to keep status as boolean if there's just the options, or an enum if there's more - in case it's not an enum already. If you have any indexes on the table, it would be helpful to see them, if you could edit them in. Do you have some stats about the input set (rowcount, rows per id, how many ins vs outs)? – Zegarek Commented Apr 2 at 11:07
  • How did you get on? You have a bunch of answers below... – Dale K Commented 2 days ago
Add a comment  | 

3 Answers 3

Reset to default 1

To return IDs that have at least one "in" status you may find using the distinct keyword suitable:

SELECT DISTINCT id
FROM status
WHERE status = 'in';

It's enough to ask if any of the statuses in a group is in, which you can do with bool_or():
demo at db<>fiddle

select id,count(*)
from status
group by id
having bool_or(status='in');

Logically, it's the same as asking if every(status<>'in') - if not, it means at least one is 'in'. While bool_and() has the nice and readable every() alias, bool_or() doesn't, because any() is reserved for an array/subquery operator. Which happens to be another way to phrase it:

select id,count(*)
from status
group by id
having 'in'=any(array_agg(status));

It's the same but runs faster by using an index-only scan, assuming you create the index:

create index on status(id,status);

Alternatively, you can use an exists() condition:

select id,count(*)
from status as a
where exists(select from status as b
             where a.id=b.id
             and b.status='in')
group by id;

On a 700k test sample with 50k ids and 20%-80% split between outs and ins, here's how they perform (in seconds, over 25 calls):

version avg min max stddev sum
having 'in'=any(array_agg(status)) 0.139506 0.133707 0.150881 0.005853 3.487649
having bool_or(status='in') 0.225468 0.20282 0.285397 0.02107 5.636694
having count(*)filter(where status='in')>0 0.229346 0.203776 0.298583 0.025039 5.733644
having sum(case when status='in' then 1 else 0 end)>0 0.234454 0.203856 0.43467 0.046771 5.861347
where exists(..) 0.614826 0.550297 1.136524 0.120762 15.37065

Given a different distribution(e.g. more rare 'in' status, more ids) some of those don't hold and most switch to using the index:
demo at db<>fiddle

query_name avg min max stddev sum
where exists() 0.111665 0.10433 0.146798 0.009681 2.233304
having bool_or(status='in') 0.142832 0.136964 0.15687 0.006704 2.856638
having count(*)filter(where status='in')>0 0.144093 0.133775 0.212441 0.017232 2.881863
having sum(case when status='in' then 1 else 0 end)>0 0.145145 0.138662 0.160853 0.007255 2.9029
having 'in'=any(array_agg(status)) 0.216814 0.204083 0.265939 0.016245 4.336287

While not necessarily the best way to formulate the having clause, the filter clause that @Jonas Metzler has suggested, might still come handy:

select id
     , count(*)
     , count(*)filter(where status='in') as ins
     , count(*)filter(where status='out') as outs
from status
group by id
having bool_or(status='in');
id count ins outs
3 4 1 3
2 2 2 0
1 4 3 1

It lets you collect the three different counts in a single query, in a single pass.


If the only reason you used count(*) was to establish you want at least one 'in' but you only actually need the distinct ids, not counts, a compact partial index can speed it up:

create index on status(id)where(status='in');

Now all these two queries do is read it out, with no need to search:

select id 
from status 
where status='in'
group by id;
select distinct id 
from status 
where status='in';

You can conditionally sum whether an Id has a Status of 'in' and then filter on it e.g.

create table Status (Id int, Status varchar(12));

insert into Status (Id, Status)
values
(1, 'in'),
(1, 'out'),
(1, 'in'),
(1, 'in'),
(2, 'in'),
(2, 'in'),
(3, 'out'),
(3, 'out'),
(3, 'in'),
(3, 'out');
SELECT Id, COUNT(Status)
FROM Status  
GROUP BY Id
HAVING SUM(CASE WHEN Status = 'in' THEN 1 ELSE 0 END) > 0
ORDER BY Id;
id count
1 4
2 2
3 4

fiddle

As noted by others (thanks @JonasMetzler) FILTER is another, potentially better, option here e.g.

SELECT Id, COUNT(Status)
FROM Status  
GROUP BY Id
HAVING COUNT(*) FILTER (WHERE Status = 'in') > 0
ORDER BY Id;
发布评论

评论列表(0)

  1. 暂无评论