Thread: proposal - GROUPING SETS

proposal - GROUPING SETS

From

"Pavel Stehule"

Date:

16 September 2008, 07:27:31

Hello,

== Proposal - GROUPING SETS ==

a grouping set feature allows multiple grouping clauses in one
query. Result of grouping sets is union of results each groupby
clause's result.

create table t(a int, b int);
insert into t values(10,20);
insert into t values(30,40);

select a, b from t group by grouping sets(a, b);

is same as:

select a, NULL from t group by a
union all
select NULL, b from t group by b;

Note: all ungrouped vars are transformed to NULL

Groupby clause should contains cube and rollup lists.
These are transformed to grouping sets via transformed rules:

create table t1(a int, b int, c int)

group by rollup(a, b, c)
->
group by grouping sets((a,b,c), (a,b), (a), ())

group by cube(a,b,c)
->
group by grouping sets ((a,b,c),(a,b),(a,c), (a), (b,c),                       (b), (c), ())

Groupby clause or grouping sets should contains more sets.
Result is multiplication of these sets:

group by grouping sets(a), grouping sets(b, (b,c), ())
->
group by grouping sets((a,b), (a,b,c), (a))

When grouping sets are used, then we should to use grouping
and grouping_id functions. Function grouping returns 1 when
parameter is in current group set, else returns 0. Function
grouping_id returns value as

grouping_id(a,b,c) =
to_dec(to_bin(grouping(a) || grouping(b) || grouping(c)))

postgres=# select * from t;a  | b  | c
----+----+----10 | 20 | 3010 | 20 | 30
(2 rows)

postgres=# select a,b,c from t group by grouping sets(a,b,c);a  | b  | c
----+----+----10 |    |   | 20 |   |    | 30
(3 rows)

postgres=# select a,b,c, grouping(a), grouping(b), grouping(c),          grouping_id(a,b,c) from t group by grouping
sets(a,b,c);a | b  | c  | grouping | grouping | grouping | grouping_id
 
----+----+----+----------+----------+----------+-------------10 |    |    |        1 |        0 |        0 |
4  | 20 |    |        0 |        1 |        0 |           2   |    | 30 |        0 |        0 |        1 |           1
 
(3 rows)

some real sample:

create table report( inserted date, locality varchar, name varchar, c int);

postgres=# copy report to stdout;
2008-10-10      Prague  Milk    10
2008-10-11      Prague  Milk    12
2008-10-10      Prague  Rum     2
2008-10-11      Prague  Rum     6
2008-10-10      Berlin  Milk    8
2008-10-11      Berlin  Milk    14
2008-10-10      Berlin  Beer    20
2008-10-11      Berlin  Beer    25

postgres=# select * from report; inserted  | locality | name | c
------------+----------+------+----2008-10-10 | Prague   | Milk | 102008-10-11 | Prague   | Milk | 122008-10-10 |
Prague  | Rum  |  22008-10-11 | Prague   | Rum  |  62008-10-10 | Berlin   | Milk |  82008-10-11 | Berlin   | Milk |
142008-10-10| Berlin   | Beer | 202008-10-11 | Berlin   | Beer | 25
 
(8 rows)

postgres=# select inserted, locality, name, sum(c)             from report            group by grouping sets(inserted,
locality,name); inserted  | locality | name | sum
 
------------+----------+------+-----2008-10-10 |          |      |  402008-10-11 |          |      |  57           |
Berlin  |      |  67           | Prague   |      |  30           |          | Milk |  44           |          | Rum  |
8           |          | Beer |  45
 
(7 rows)


postgres=# select inserted, locality, name, sum(c)             from report            group by grouping sets(inserted,
(locality,name)); inserted  | locality | name | sum
 
------------+----------+------+-----2008-10-10 |          |      |  402008-10-11 |          |      |  57           |
Prague  | Milk |  22           | Berlin   | Milk |  22           | Berlin   | Beer |  45           | Prague   | Rum  |
8
 
(6 rows)

postgres=# select inserted, locality, name, sum(c)             from report            group by name, grouping
sets(inserted,locality); inserted  | locality | name | sum
 
------------+----------+------+-----2008-10-11 |          | Rum  |   62008-10-10 |          | Rum  |   22008-10-10 |
     | Beer |  202008-10-11 |          | Beer |  252008-10-10 |          | Milk |  182008-10-11 |          | Milk |  26
         | Prague   | Rum  |   8           | Berlin   | Beer |  45           | Berlin   | Milk |  22           | Prague
 | Milk |  22
 
(10 rows)

== Implementation ==
Grouping sets introduce a new concept into SQL. One readed
tuple is multiple used. It's similar with WITH clause. It's little
bit dificult implement it for current PostgreSQL's executor.

In my prototype I used aux node Feeder. This node should to hold
only one tuple. I add new method for Agg node, that process only
one input tuple:
for (;;){    tuple = feeder->execute(grouping_sets->lefttree);    foreach(l, grouping_sets->subplans)    {
lfirst(l)->process(tuple);   }    if (is_null(tuple))            break;}
 

It supports only HASH Agg nodes. For non hash agg should be used
other method (currently it isn't supported).

postgres=# explain verbose select inserted, locality, name, sum(c)                             from report
             group by name,grouping sets(inserted, locality);                          QUERY PLAN
 
-----------------------------------------------------------------Grouping Sets  (cost=12.00..35.00 rows=400 width=68)
Output:NULL::date, locality, name, sum(c)  ->  Seq Scan on report  (cost=0.00..18.00 rows=800 width=68)        Output:
inserted,locality, name, c  ->  HashAggregate  (cost=6.00..8.50 rows=200 width=68)        Output: inserted,
NULL::charactervarying, name, sum(c)        ->  Feeder  (cost=0.00..0.00 rows=800 width=68)              Output:
inserted,locality, name, c  ->  HashAggregate  (cost=6.00..8.50 rows=200 width=68)        Output: NULL::date, locality,
name,sum(c)        ->  Feeder  (cost=0.00..0.00 rows=800 width=68)              Output: inserted, locality, name, c
 
(12 rows)

After work on prototype I don't see any problems in executor or
parser. I expect some dificulties in planner. With grouping sets
grouping_planner procedure will be much more complex:

1. targetlist and groupclause will be list of list,
2. we should to repeat estimation of NumGroups for each group  set (it's should be shared with CTE feature??).

== Parser problems ==
1. The identifier cube is used in contrib cube.  Solution: CUBE '(' ... ')' generates funcCall and it is transformed
togrouping sets only in groupby clause later.
 

I invite any ideas, notes and help with documentation.

Regards
Pavel Stehule

Re: proposal - GROUPING SETS

From

Tom Lane

Date:

16 September 2008, 11:02:54

"Pavel Stehule" <pavel.stehule@gmail.com> writes:
> select a, b from t group by grouping sets(a, b);

> is same as:

> select a, NULL from t group by a
> union all
> select NULL, b from t group by b;

Really?  That seems utterly bizarre, not to say pointless.
You sure you read the spec correctly?
        regards, tom lane

Re: proposal - GROUPING SETS

From

"Pavel Stehule"

Date:

16 September 2008, 11:23:00

2008/9/16 Tom Lane <tgl@sss.pgh.pa.us>:
> "Pavel Stehule" <pavel.stehule@gmail.com> writes:
>> select a, b from t group by grouping sets(a, b);
>
>> is same as:
>
>> select a, NULL from t group by a
>> union all
>> select NULL, b from t group by b;
>
> Really?  That seems utterly bizarre, not to say pointless.
> You sure you read the spec correctly?

If GSi is an <ordinary grouping set>, then
A) Transform SL2 to obtain SL3, and transform HC to obtain HC3, as follows:   For every PCk, if there is no j such that
PCk= GCRi,j, then make

the following   replacements in SL2 and HC:   I)      Replace each <grouping operation> in SL2 and HC that
contains a <column ref-           erence> that references PCk by the <literal> 1 (one).   II)     Replace each <column
reference>in SL2 and HC that references PCk by           CAST ( NULL AS DTPCk )

if I understend it well, there is little bit different rules than
usual GROUP BY clause.

Is there different way, how to merge more different results? NULL
value is only one possible solution.

Pavel

>
>                        regards, tom lane
>

Re: proposal - GROUPING SETS

From

"Greg Stark"

Date:

16 September 2008, 11:44:02

On Tue, Sep 16, 2008 at 3:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Pavel Stehule" <pavel.stehule@gmail.com> writes:
>> select a, b from t group by grouping sets(a, b);
>
>> is same as:
>
>> select a, NULL from t group by a
>> union all
>> select NULL, b from t group by b;
>
> Really?  That seems utterly bizarre, not to say pointless.
> You sure you read the spec correctly?

I think that's basically right but IIRC you need another set of
parentheses so it's GROUPING SETS ((a),(b))

Basically grouping sets are a generalized form of rollup and cube. If
you did GROUPING SETS ((a),(a,b),(a,b,c)) you would get the same as
ROLLUP. And if you listed every possible subset of the grouping
columns it would be the equivalent of CUBE. But it lets you specify an
arbitrary subset of the combinations that CUBE would return.

-- 
greg

Re: proposal - GROUPING SETS

From

"Pavel Stehule"

Date:

16 September 2008, 11:53:23

2008/9/16 Greg Stark <stark@enterprisedb.com>:
> On Tue, Sep 16, 2008 at 3:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> "Pavel Stehule" <pavel.stehule@gmail.com> writes:
>>> select a, b from t group by grouping sets(a, b);
>>
>>> is same as:
>>
>>> select a, NULL from t group by a
>>> union all
>>> select NULL, b from t group by b;
>>
>> Really?  That seems utterly bizarre, not to say pointless.
>> You sure you read the spec correctly?
>
> I think that's basically right but IIRC you need another set of
> parentheses so it's GROUPING SETS ((a),(b))

grouping sets ((a),(b)) is same as gs(a,b)

NOTE 165 — The result of the transform is to replace CL with a
<grouping sets specification> that contains a <grouping
set> for all possible subsets of the set of <ordinary grouping set>s
in the <ordinary grouping set list> of the <cube list>,
including <empty grouping set> as the empty subset with no <ordinary
grouping set>s.
For example, CUBE (A, B, C) is equivalent to:
GROUPING       SETS ( /* BSLi */  (A, B,      C),       /* 111 */  (A, B        ),       /* 110 */  (A,         C),
 /* 101 */  (A           ),       /* 100 */  (      B,   C),       /* 011 */  (      B     ),       /* 010 */  (
  C),       /* 001 */  (            ) 
)
As another example, CUBE ((A, B), (C, D)) is equivalent to:
GROUPING SETS ( /*            BSLi */  (A, B, C, D), /*           11 */  (A, B             ), /*    10 */  (
C,D), /*      01 */  (                 ) 
)

it's exactly defined in standard WD 9075-2:200w(E) 7.9 <group by
clause> page 354 Foundation (SQL/Foundation)

>
> Basically grouping sets are a generalized form of rollup and cube. If
> you did GROUPING SETS ((a),(a,b),(a,b,c)) you would get the same as
> ROLLUP. And if you listed every possible subset of the grouping
> columns it would be the equivalent of CUBE. But it lets you specify an
> arbitrary subset of the combinations that CUBE would return.
>
> --
> greg
>