Hello,
== Proposal - GROUPING SETS ==
a grouping set feature allows multiple grouping clauses in one
query. Result of grouping sets is union of results each groupby
clause's result.
create table t(a int, b int);
insert into t values(10,20);
insert into t values(30,40);
select a, b from t group by grouping sets(a, b);
is same as:
select a, NULL from t group by a
union all
select NULL, b from t group by b;
Note: all ungrouped vars are transformed to NULL
Groupby clause should contains cube and rollup lists.
These are transformed to grouping sets via transformed rules:
create table t1(a int, b int, c int)
group by rollup(a, b, c)
->
group by grouping sets((a,b,c), (a,b), (a), ())
group by cube(a,b,c)
->
group by grouping sets ((a,b,c),(a,b),(a,c), (a), (b,c), (b), (c), ())
Groupby clause or grouping sets should contains more sets.
Result is multiplication of these sets:
group by grouping sets(a), grouping sets(b, (b,c), ())
->
group by grouping sets((a,b), (a,b,c), (a))
When grouping sets are used, then we should to use grouping
and grouping_id functions. Function grouping returns 1 when
parameter is in current group set, else returns 0. Function
grouping_id returns value as
grouping_id(a,b,c) =
to_dec(to_bin(grouping(a) || grouping(b) || grouping(c)))
postgres=# select * from t;a | b | c
----+----+----10 | 20 | 3010 | 20 | 30
(2 rows)
postgres=# select a,b,c from t group by grouping sets(a,b,c);a | b | c
----+----+----10 | | | 20 | | | 30
(3 rows)
postgres=# select a,b,c, grouping(a), grouping(b), grouping(c), grouping_id(a,b,c) from t group by grouping
sets(a,b,c);a | b | c | grouping | grouping | grouping | grouping_id
----+----+----+----------+----------+----------+-------------10 | | | 1 | 0 | 0 |
4 | 20 | | 0 | 1 | 0 | 2 | | 30 | 0 | 0 | 1 | 1
(3 rows)
some real sample:
create table report( inserted date, locality varchar, name varchar, c int);
postgres=# copy report to stdout;
2008-10-10 Prague Milk 10
2008-10-11 Prague Milk 12
2008-10-10 Prague Rum 2
2008-10-11 Prague Rum 6
2008-10-10 Berlin Milk 8
2008-10-11 Berlin Milk 14
2008-10-10 Berlin Beer 20
2008-10-11 Berlin Beer 25
postgres=# select * from report; inserted | locality | name | c
------------+----------+------+----2008-10-10 | Prague | Milk | 102008-10-11 | Prague | Milk | 122008-10-10 |
Prague | Rum | 22008-10-11 | Prague | Rum | 62008-10-10 | Berlin | Milk | 82008-10-11 | Berlin | Milk |
142008-10-10| Berlin | Beer | 202008-10-11 | Berlin | Beer | 25
(8 rows)
postgres=# select inserted, locality, name, sum(c) from report group by grouping sets(inserted,
locality,name); inserted | locality | name | sum
------------+----------+------+-----2008-10-10 | | | 402008-10-11 | | | 57 |
Berlin | | 67 | Prague | | 30 | | Milk | 44 | | Rum |
8 | | Beer | 45
(7 rows)
postgres=# select inserted, locality, name, sum(c) from report group by grouping sets(inserted,
(locality,name)); inserted | locality | name | sum
------------+----------+------+-----2008-10-10 | | | 402008-10-11 | | | 57 |
Prague | Milk | 22 | Berlin | Milk | 22 | Berlin | Beer | 45 | Prague | Rum |
8
(6 rows)
postgres=# select inserted, locality, name, sum(c) from report group by name, grouping
sets(inserted,locality); inserted | locality | name | sum
------------+----------+------+-----2008-10-11 | | Rum | 62008-10-10 | | Rum | 22008-10-10 |
| Beer | 202008-10-11 | | Beer | 252008-10-10 | | Milk | 182008-10-11 | | Milk | 26
| Prague | Rum | 8 | Berlin | Beer | 45 | Berlin | Milk | 22 | Prague
| Milk | 22
(10 rows)
== Implementation ==
Grouping sets introduce a new concept into SQL. One readed
tuple is multiple used. It's similar with WITH clause. It's little
bit dificult implement it for current PostgreSQL's executor.
In my prototype I used aux node Feeder. This node should to hold
only one tuple. I add new method for Agg node, that process only
one input tuple:
for (;;){ tuple = feeder->execute(grouping_sets->lefttree); foreach(l, grouping_sets->subplans) {
lfirst(l)->process(tuple); } if (is_null(tuple)) break;}
It supports only HASH Agg nodes. For non hash agg should be used
other method (currently it isn't supported).
postgres=# explain verbose select inserted, locality, name, sum(c) from report
group by name,grouping sets(inserted, locality); QUERY PLAN
-----------------------------------------------------------------Grouping Sets (cost=12.00..35.00 rows=400 width=68)
Output:NULL::date, locality, name, sum(c) -> Seq Scan on report (cost=0.00..18.00 rows=800 width=68) Output:
inserted,locality, name, c -> HashAggregate (cost=6.00..8.50 rows=200 width=68) Output: inserted,
NULL::charactervarying, name, sum(c) -> Feeder (cost=0.00..0.00 rows=800 width=68) Output:
inserted,locality, name, c -> HashAggregate (cost=6.00..8.50 rows=200 width=68) Output: NULL::date, locality,
name,sum(c) -> Feeder (cost=0.00..0.00 rows=800 width=68) Output: inserted, locality, name, c
(12 rows)
After work on prototype I don't see any problems in executor or
parser. I expect some dificulties in planner. With grouping sets
grouping_planner procedure will be much more complex:
1. targetlist and groupclause will be list of list,
2. we should to repeat estimation of NumGroups for each group set (it's should be shared with CTE feature??).
== Parser problems ==
1. The identifier cube is used in contrib cube. Solution: CUBE '(' ... ')' generates funcCall and it is transformed
togrouping sets only in groupby clause later.
I invite any ideas, notes and help with documentation.
Regards
Pavel Stehule