Re: Final Patch for GROUPING SETS - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Final Patch for GROUPING SETS
Date
Msg-id 5544.1418326628@sss.pgh.pa.us
Whole thread Raw
In response to Re: Final Patch for GROUPING SETS  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses Re: Final Patch for GROUPING SETS  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-hackers
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> I've not spent any real effort looking at gsp2.patch yet, but it
>  Tom> seems even worse off comment-wise: if there's any explanation in
>  Tom> there at all of what a "chained aggregate" is, I didn't find it.

> (Maybe "stacked" would have been a better term.)

> What that code does is produce plans that look like this:

>   GroupAggregate
>   -> Sort
>      -> ChainAggregate
>         -> Sort
>            -> ChainAggregate

> in much the same way that WindowAgg nodes are generated.

That seems pretty messy, especially given your further comments that these
plan nodes are interconnected and know about each other (though you failed
to say exactly how).  The claimed analogy to WindowAgg therefore seems
bogus since stacked WindowAggs are independent, AFAIR anyway.  I'm also
wondering about performance: doesn't this imply more rows passing through
some of the plan steps than really necessary?

Also, how would this extend to preferring hashed aggregation in some of
the grouping steps?

ISTM that maybe what we should do is take a cue from the SQL spec, which
defines these things in terms of UNION ALL of plain-GROUP-BY operations
reading from a common CTE.  Abstractly, that is, we'd have
Append  -> GroupAggregate    -> Sort      -> source data  -> GroupAggregate    -> Sort      -> source data  ->
GroupAggregate   -> Sort      -> source data  ...
 

(or some of the arms could be HashAgg without a sort).  Then the question
is what exactly the aggregates are reading from.  We could do worse than
make it a straight CTE, I suppose.

>  Tom> I'd also counsel you to find some other way to do it than
>  Tom> putting bool chain_head fields in Aggref nodes;

> There are no chain_head fields in Aggref nodes.

Oh, I mistook "struct Agg" for "struct Aggref".  (That's another pretty
poorly chosen struct name, though I suppose it's far too late to change
that choice.)  Still, interconnecting plan nodes that aren't adjacent in
the plan tree doesn't sound like a great idea to me.

>  Tom> I took a quick look at gsp-u.patch.  It seems like that approach
>  Tom> should work, with of course the caveat that using CUBE/ROLLUP as
>  Tom> function names in a GROUP BY list would be problematic.  I'm not
>  Tom> convinced by the commentary in ruleutils.c suggesting that extra
>  Tom> parentheses would help disambiguate: aren't extra parentheses
>  Tom> still going to contain grouping specs according to the standard?

> The extra parens do actually disambiguate because CUBE(x) and
> (CUBE(x)) are not equivalent anywhere; while CUBE(x) can appear inside
> GROUPING SETS (...), it cannot appear inside a (...) list nested inside
> a GROUPING SETS list (or anywhere else).

Maybe, but this seems very fragile and non-future-proof.  I think
double-quoting or schema-qualifying such function names would be safer
when you think about the use-case of dumping views that may get loaded
into future Postgres versions.

> The question that needs deciding here is less whether the approach
> _could_ work but whether we _want_ it. The objection has been made
> that we are in effect introducing a new category of "unreserved almost
> everywhere" keyword, which I think has a point;

True, but I think that ship has already sailed.  We already have similar
behavior for PARTITION, RANGE, and ROWS (see the opt_existing_window_name
production), and I think PRECEDING, FOLLOWING, and UNBOUNDED are
effectively reserved-in-certain-very-specific-contexts as well.  And there
are similar behaviors in plpgsql's parser.

> on the other hand,
> reserving CUBE is a seriously painful prospect.

Precisely.  I think renaming or getting rid of contrib/cube would have
to be something done in a staged fashion over multiple release cycles.
Waiting several years to get GROUPING SETS doesn't seem appealing at all
compared to this alternative.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Commitfest problems
Next
From: Heikki Linnakangas
Date:
Subject: Re: 9.5 release scheduling (was Re: logical column ordering)