Re: Final Patch for GROUPING SETS - unrecognized node type: 347 - Mailing list pgsql-hackers
| From | Tomas Vondra |
|---|---|
| Subject | Re: Final Patch for GROUPING SETS - unrecognized node type: 347 |
| Msg-id | 540C5081.90205@fuzzy.cz |
| In response to | Re: Final Patch for GROUPING SETS - unrecognized node type: 347 (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
| Responses | Re: Final Patch for GROUPING SETS - unrecognized node type: 347 |
| List | pgsql-hackers |
On 6.9.2014 23:34, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tv@fuzzy.cz> writes:
>
> Tomas> I have significant doubts about the whole design, though.
> Tomas> Especially the decision not to use HashAggregate,
>
> There is no "decision not to use HashAggregate". There is simply no
> support for HashAggregate yet.
>
> Having it be able to work with GroupAggregate is essential, because
> there are always cases where HashAggregate is simply not permitted
> (e.g. when using distinct or sorted aggs; or unhashable types; or with
> the current code, when the estimated memory usage exceeds work_mem).
> HashAggregate may be a performance improvement, but it's something
> that can be added afterwards rather than an essential part of the
> feature.

Ah, OK. I got confused by the "final patch" subject, and so the
possibility of additional optimization somehow didn't occur to me.

> Tomas> Now, the chaining only makes this worse, because it
> Tomas> effectively forces a separate sort of the whole table for each
> Tomas> grouping set.
>
> It's not one sort per grouping set, it's the minimal number of sorts
> needed to express the result as a union of ROLLUP clauses. The planner
> code will (I believe) always find the smallest number of sorts needed.

You're probably right, although with GROUP BY CUBE (a,b,c,a) I get one
more ChainAggregate than with CUBE (a,b,c), and we seem to compute all
the aggregates twice. Not sure if we need to address this, though,
because it's mostly the user's fault.

> Each aggregate node can process any number of grouping sets as long as
> they represent a single rollup list (and therefore share a single sort
> order).
>
> Yes, this is slower than using one hashagg. But it solves the general
> problem in a way that does not interfere with future optimization.
>
> (HashAggregate can be added to the current implementation by first
> adding executor support for hashagg with multiple grouping sets, then
> in the planner, extracting as many hashable grouping sets as possible
> from the list before looking for rollup lists. The chained aggregate
> code can work just fine with a HashAggregate as the chain head.
>
> We have not actually tackled this, since I'm not going to waste any
> time adding optimizations before the basic idea is accepted.)

OK, understood.

> Tomas> What I envisioned when considering hacking on this a few
> Tomas> months back was extending the aggregate API with a "merge
> Tomas> state" function,
>
> That's not really on the cards for arbitrary non-trivial aggregate
> functions.
>
> Yes, it can be done for simple ones, and if you want to use that as a
> basis for adding optimizations, that's fine. But a solution that ONLY
> works in simple cases isn't sufficient, IMO.

I believe it can be done for most aggregates, assuming you have access
to the internal state somehow (not just the final value). Adding it for
in-core aggregates would not be difficult in most cases. But you're
right that we don't have this now at all.

regards
Tomas
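To make the sort-count point above concrete, here is a small illustrative sketch (the table and column names are made up; the CUBE / GROUPING SETS syntax is the one provided by the patch under discussion). CUBE (a, b, c) expands to eight grouping sets, and those eight sets can be covered by three rollup chains, each of which needs only a single sort order:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE t (a int, b int, c int, v int);

-- CUBE (a, b, c) asks for eight grouping sets:
--   (a,b,c) (a,b) (a,c) (b,c) (a) (b) (c) ()
SELECT a, b, c, sum(v) FROM t GROUP BY CUBE (a, b, c);

-- The same eight sets regrouped into three rollup chains; within each
-- chain every set is a prefix of the chain's sort order, so one sort
-- per chain suffices:
--
--   chain 1, sorted by (a, b, c):  (a,b,c) (a,b) (a) ()
--   chain 2, sorted by (b, c):     (b,c) (b)
--   chain 3, sorted by (c, a):     (c,a) (c)
--
-- Chain 1 on its own, written as an explicit GROUPING SETS list:
SELECT a, b, c, sum(v) FROM t
GROUP BY GROUPING SETS ((a, b, c), (a, b), (a), ());
```

If the planner does find the minimum, that is three sorts for eight grouping sets, rather than one sort per set.

As for the "merge state" idea at the end of the message: the shape being described is an extra support function that combines two partial transition states of the same aggregate. A minimal sketch of what declaring such an aggregate could look like, using the COMBINEFUNC option that later PostgreSQL releases added for combining partial states (it is not part of the patch being discussed, and the aggregate itself is a toy example):

```sql
-- Toy sum aggregate whose partial states can be merged by addition.
-- The state type is plain int, so int4pl serves both as the per-row
-- transition function and as the state-merging function.
CREATE AGGREGATE my_sum (int) (
    SFUNC       = int4pl,   -- advance the state by one input value
    STYPE       = int,
    COMBINEFUNC = int4pl,   -- merge two partial states
    INITCOND    = '0'
);
```

As the message notes, a simple sum merges trivially; the hard part is aggregates whose transition state is not exposed or not meaningfully combinable.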