Re: Final Patch for GROUPING SETS - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Final Patch for GROUPING SETS
Date
Msg-id 87tx13t9y6.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Re: Final Patch for GROUPING SETS  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Final Patch for GROUPING SETS  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

More comment on this later, but I want to highlight these specific
points since we need clear answers here to avoid wasting unnecessary
time and effort:
Tom> I've not spent any real effort looking at gsp2.patch yet, but itTom> seems even worse off comment-wise: if there's
anyexplanation inTom> there at all of what a "chained aggregate" is, I didn't find it.
 

(Maybe "stacked" would have been a better term.)

What that code does is produce plans that look like this:
 GroupAggregate -> Sort    -> ChainAggregate       -> Sort          -> ChainAggregate

in much the same way that WindowAgg nodes are generated.

Where would you consider the best place to comment this? The WindowAgg
equivalent seems to be discussed primarily in the header comment of
nodeWindowAgg.c.
Tom> I'd also counsel you to find some other way to do it thanTom> putting bool chain_head fields in Aggref nodes;

There are no chain_head fields in Aggref nodes.

Agg.chain_head is true for the Agg node at the top of the chain (the
GroupAggregate node in the above example), while AggState.chain_head
is set on the ChainAggregate nodes to point to the AggState of the
GroupAggregate node.

What we need to know before doing any further work on this is whether
this idea of stacking up aggregate and sort nodes is a viable one.

(The feedback I've had so far suggests that the performance is
acceptable, even if there are still optimization opportunities that
can be tackled later, like adding HashAggregate support.)
Tom> I took a quick look at gsp-u.patch.  It seems like that approachTom> should work, with of course the caveat that
usingCUBE/ROLLUP asTom> function names in a GROUP BY list would be problematic.  I'm notTom> convinced by the
commentaryin ruleutils.c suggesting that extraTom> parentheses would help disambiguate: aren't extra parenthesesTom>
stillgoing to contain grouping specs according to the standard?
 

The spec is of minimal help here since it does not allow expressions in
GROUP BY at all, last I looked; only column references.

The extra parens do actually disambiguate because CUBE(x) and
(CUBE(x)) are not equivalent anywhere; while CUBE(x) can appear inside
GROUPING SETS (...), it cannot appear inside a (...) list nested inside
a GROUPING SETS list (or anywhere else).

As the comments in gram.y explain, the productions used are intended
to follow the spec with the exception of using a_expr where the spec
requires <ordinary grouping set>. So CUBE and ROLLUP are recognized as
special only as part of a group_by_item (<grouping element> in the
spec), and as soon as we see a paren that isn't part of the "GROUPING
SETS (" opener, we're forced into parsing an a_expr, in which CUBE()
would become a function call.

(The case of upgrading from an old pg version seems to require the use
of --quote-all-identifiers in pg_dump)
Tom> Forcibly schema-qualifying such function names seems like a lessTom> fragile answer on that end.

That I guess would require keeping more state, unless you applied it
everywhere to any function with a keyword for a name? I dunno.

The question that needs deciding here is less whether the approach
_could_ work but whether we _want_ it. The objection has been made
that we are in effect introducing a new category of "unreserved almost
everywhere" keyword, which I think has a point; on the other hand,
reserving CUBE is a seriously painful prospect.

-- 
Andrew (irc:RhodiumToad)



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Fix typo um vacuumdb tests
Next
From: Mark Dilger
Date:
Subject: WRITE_UINT_FIELD used where WRITE_OID_FIELD likely intended