sql - Redshift GROUP BY ALL syntax not supported

Moving to Redshift from another provider, I see that GROUP BY ALL (which groups by all non-aggregate columns in the SELECT statement) syntax is not supported. This was super useful for me when I had loads of non-aggregate columns and didn't want to manually type out GROUP BY 1,2,3,4... etc.

I was wondering if there is in fact a similar command on Redshift or is it simply a limitation of PostgreSQL?

Share Improve this question asked Feb 6 at 11:08 Benjamin McDowell 3082 silver badges12 bronze badges

I strongly suggest using identifiers instead of ordinals to define grouping. Identifiers impose less cognitive load on the reader and are immune to changes to the order of the select list. – JohnH Commented Feb 6 at 21:27

Add a comment |

2 Answers 2

Sorted by: Reset to default 3

There is a group by all in both Redshift and Postgres but that's not what it does. You might be interested in group by cube, rollup or grouping sets. Postgres has them, Redshift has them too.

Unfortunately, there's no syntax for grouping by everything except aggregate expressions detected in the select list. There kind of is, but that's not entirely equivalent, and not as handy - see example at the end.

The group by all is default behaviour in both PostgreSQL and Redshift, the alternative being group by distinct which removes duplicate grouping sets. Quoting the doc:

When specifying multiple grouping items together, the final set of grouping sets might contain duplicates. For example:
GROUP BY ROLLUP (a, b), ROLLUP (a, c)
is equivalent to
GROUP BY GROUPING SETS (
   (a, b, c),
   (a, b),
   (a, b),
   (a, c),
   (a),
   (a),
   (a, c),
   (a),
   ()
)
If these duplicates are undesirable, they can be removed using the DISTINCT clause directly on the GROUP BY. Therefore:
GROUP BY DISTINCT ROLLUP (a, b), ROLLUP (a, c)
is equivalent to
GROUP BY GROUPING SETS (
   (a, b, c),
   (a, b),
   (a, c),
   (a),
   ()
)

It's similar to how select is actually select all unless you make it select distinct. Again, the doc:

If SELECT DISTINCT is specified, all duplicate rows are removed from the result set (one row is kept from each group of duplicates). SELECT ALL specifies the opposite: all rows are kept; that is the default.

If there's a join, you can use a group by tablename.* to make all columns of the table your grouping set but

You still need to list them all in the join conditions. The exception is if all their names match and you're willing to risk a natural join.
You can't display them next to the aggregate expressions because they form a single grouping set. You can use grouping() function, but it'll just show that all rows belong to the 0th grouping set.

_{demo1 at db-fiddle}

select sum(test_vals.v) 
from test
natural join test_vals
group by all test.*;

sum
774.07
2114.39
806.75
966.23
1644.51

The scenario I think @Lajos Arpad is referring to is this:
_{demo2 at db-fiddle}

select test.*,sum(v)
from test
join test_vals using(a)
group by test.a;

Error: column "test.b" must appear in the GROUP BY clause or be used in an aggregate function

Suggested addition of b to the group by list makes that go away only to be replaced by a request to do something about c and all following columns. That's no longer the case if a is a primary key - you can group by only the PK, skipping the rest:

alter table test add primary key(a);

select test.*,sum(v)
from test
join test_vals using(a)
group by test.a;

a	b	c	sum
1	7	3	4.42
2	8	1	3.50
3	1	2	5.18

You can find this, once more, in the doc:

If the products table is set up so that, say, product_id is the primary key, then it would be enough to group by product_id in the above example, since name and price would be functionally dependent on the product ID, and so there would be no ambiguity about which name and price value to return for each product ID group.

There is no such shortcut in Postgres. But there are cases when it's unnecessary. If you have a primary key and each nonaggregated field is having a single value for each such primary key, then you can get away by simply grouping by the primary key at hand.

If your situation is different, then, depending on the tool you use you may generate your group by clause.

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

sql - Redshift GROUP BY ALL syntax not supported - Stack Overflow

2 Answers 2

与本文相关的文章

评论列表(0)