Re: WIP Patch for GROUPING SETS phase 1 - Mailing list pgsql-hackers

From Atri Sharma
Subject Re: WIP Patch for GROUPING SETS phase 1
Date
Msg-id CAOeZVif-GbQkjGT28tcqAD-C41o_exzcSTmHG7Ppeuhxp1KYMQ@mail.gmail.com
In response to WIP Patch for GROUPING SETS phase 1  (Atri Sharma <atri.jiit@gmail.com>)
Responses Re: WIP Patch for GROUPING SETS phase 1
List pgsql-hackers



On Thu, Aug 14, 2014 at 12:07 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
This is phase 1 (of either 2 or 3) of implementation of the standard GROUPING SETS feature, done by Andrew Gierth and myself.
 
Unlike previous attempts at this feature, we make no attempt to do any serious work in the parser; we perform some minor syntactic simplifications described in the spec, such as removing excess parens, but the original query structure is preserved in views and so on.
 
So far, we have done most of the actual work in the executor, but further phases will concentrate on the planner. We have not yet tackled the hard problem of generating plans that require multiple passes over the same input data; see below regarding design issues.
 
What works so far:
 
 - all the standard syntax is accepted (but many combinations are not plannable yet)
 - while the spec only allows column references in GROUP BY, we continue to allow arbitrary expressions
 - grouping sets that can be computed in a single pass over sorted data (i.e. anything that can be reduced to simple columns plus one ROLLUP clause, regardless of how it was specified in the query) are implemented as part of the existing GroupAggregate executor node
 - all kinds of aggregate functions, including ordered set functions and user-defined aggregates, are supported in conjunction with grouping sets (no API changes, other than one caveat about fn_extra)
 - the GROUPING() operation defined in the spec is implemented, including support for multiple args, and supports arbitrary expressions as an extension to the spec
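
As a rough illustration of the single-pass case and of GROUPING(), here is a minimal Python sketch. It is not the patch's executor code; it only models the semantics for a SUM over ROLLUP(cols). The function names rollup_sets, grouping and rollup_sum are hypothetical, and the sketch skips the empty-input case, where SQL would still return a grand-total row.

```python
def rollup_sets(cols):
    """ROLLUP(a, b) expands to the grouping sets (a, b), (a), ()."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]


def grouping(args, grouping_set):
    """Model of GROUPING(args...): one bit per argument, leftmost argument
    most significant; a bit is 1 when the argument is NOT in the set."""
    mask = 0
    for arg in args:
        mask = (mask << 1) | (0 if arg in grouping_set else 1)
    return mask


def rollup_sum(rows, cols, value):
    """SUM(value) for every ROLLUP level in one pass over sorted input,
    keeping one running total per level. Empty input returns no rows here."""
    sets = rollup_sets(cols)
    rows = sorted(rows, key=lambda r: tuple(r[c] for c in cols))
    sums = [0] * len(sets)
    out = []
    prev = None

    def emit(i):
        n = len(sets[i])
        padded = prev[:n] + (None,) * (len(cols) - n)
        out.append((padded, grouping(cols, sets[i]), sums[i]))
        sums[i] = 0

    for row in rows:
        key = tuple(row[c] for c in cols)
        if prev is not None:
            # Close every level whose key prefix changed, finest level first.
            for i, gs in enumerate(sets):
                if key[:len(gs)] != prev[:len(gs)]:
                    emit(i)
        for i in range(len(sets)):
            sums[i] += row[value]
        prev = key
    if prev is not None:
        # End of input closes all levels, including the grand total ().
        for i in range(len(sets)):
            emit(i)
    return out
```

Each output row is (group key padded with None, GROUPING value, sum). Because the input is sorted on the full column list, every prefix of it is also grouped together, which is why a single sort is enough for one ROLLUP.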
 
Changes/incompatibilities:
 
 - the big compatibility issue: CUBE and ROLLUP are now partially reserved (col_name_keyword), which breaks contrib/cube. A separate patch for contrib/ is attached that renames the cube type to "cube"; a new name really needs to be chosen.
 - GROUPING is now a fully reserved word, and SETS is an unreserved keyword
 - GROUP BY (a,b)  now means  GROUP BY a,b  (as required by spec). GROUP BY ROW(a,b) still has the old meaning.
 - GROUP BY ()  is now supported too.
 - fn_extra for aggregate calls is per-call-site and NOT per-transition-value - the same fn_extra will be used for interleaved calls to the transition function with different transition values. fn_extra, if used at all, should be used only for per-call-site info such as data types, as clarified in the 9.4beta changes to the ordered set function implementation.
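
The fn_extra caveat can be pictured with a small Python model. This is only an analogy, not PostgreSQL's C API: CallSite is a hypothetical stand-in for the per-call-site function info, its extra attribute plays the role of fn_extra, and sum_transfn is an invented transition function. The point it tries to show is that one cache is shared by interleaved calls for different grouping sets, so it may hold only call-site facts such as type handling, while per-group data must live in the transition value.

```python
class CallSite:
    """Hypothetical stand-in for one aggregate call site."""

    def __init__(self, arg_type):
        self.arg_type = arg_type
        self.extra = None  # plays the role of fn_extra


def sum_transfn(call_site, state, value):
    # Safe use of the cache: it depends only on the call site
    # (here, how to convert the argument type), never on the group.
    if call_site.extra is None:
        call_site.extra = {"convert": call_site.arg_type}
    convert = call_site.extra["convert"]
    # Per-group data is carried in the transition value, not in the cache.
    return (0 if state is None else state) + convert(value)


site = CallSite(int)
# One transition value per grouping set, all fed through the same call site.
states = {"(a,b)": None, "(a)": None, "()": None}
for raw in ["1", "2", "3"]:
    for name in states:
        states[name] = sum_transfn(site, states[name], raw)
```

Had sum_transfn stored a running total in call_site.extra, the three grouping sets would have overwritten each other's totals.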
 
Future work:
 
We envisage that handling of arbitrary grouping sets will be best done by having the planner generate an Append of multiple aggregation paths, presumably with some way of moving the original input path to a CTE. We have not really explored yet how hard this will be; suggestions are welcome.
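
To make the Append idea concrete, here is a minimal Python sketch of its intended result, under the assumption that the input is materialized once (the CTE stand-in) and then aggregated separately for each grouping set, with the per-set results concatenated. It says nothing about how the planner would build such a plan; aggregate_one_set and grouping_sets_append are hypothetical names.

```python
from collections import defaultdict


def aggregate_one_set(rows, all_cols, grouping_set, value):
    """SUM(value) grouped by one grouping set; columns outside the set
    are reported as None, as NULL would be in the SQL result."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[c] if c in grouping_set else None for c in all_cols)
        totals[key] += row[value]
    return list(totals.items())


def grouping_sets_append(rows, all_cols, grouping_sets, value):
    # Materialize once so an input that can only be read once
    # (for example a generator) can be scanned once per grouping set.
    materialized = list(rows)
    out = []
    for gs in grouping_sets:
        # "Append": concatenate each aggregation path's output.
        out.extend(aggregate_one_set(materialized, all_cols, gs, value))
    return out
```

For example, GROUPING SETS ((a), (b)) cannot be reduced to a single ROLLUP, so it needs two passes over the same rows; the sketch does exactly that over the materialized copy.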
 
In the executor, it is obviously possible to extend HashAggregate to handle arbitrary collections of grouping sets, but even if the memory usage issue were solved, this would leave the question of what to do with non-hashable data types, so it seems that the planner work probably can't be avoided.
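
A toy Python model of the hash-based alternative may help show both concerns. It keeps one hash table per grouping set and fills them all in a single pass, so memory grows with the total number of groups across all sets, and it fails outright when a key cannot be hashed. Python's unhashable list is used here purely as an analogy for a non-hashable data type; hash_grouping_sets is a hypothetical name and this is not how HashAggregate is implemented.

```python
def hash_grouping_sets(rows, grouping_sets, value):
    """One pass over unsorted input, one hash table per grouping set."""
    tables = [dict() for _ in grouping_sets]
    for row in rows:
        for table, gs in zip(tables, grouping_sets):
            key = tuple(row[c] for c in gs)
            # Raises TypeError if any key component is unhashable.
            table[key] = table.get(key, 0) + row[value]
    return tables
```

A sort-based path has no such requirement, since it needs only an ordering on the grouping columns.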
 
A new name needs to be found for the "cube" data type.
 
At this point we are more interested in design review than in committing this patch in its current state. However, committing it may make future work easier; we leave that question open.




Sorry, I forgot to attach the patch that fixes contrib/cube, which breaks because we now reserve the "cube" keyword. Please find it attached.

Regards,

Atri
Attachment
