Re: Final Patch for GROUPING SETS - Mailing list pgsql-hackers

From Atri Sharma
Subject Re: Final Patch for GROUPING SETS
Date
Msg-id CAOeZVidbB5v0FPG2=_QcROGvYJe+41LMiJ1Xy_ptqFZjNnp0Hw@mail.gmail.com
Whole thread Raw
In response to Re: Final Patch for GROUPING SETS  (Noah Misch <noah@leadboat.com>)
Responses Re: Final Patch for GROUPING SETS  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers


   ChainAggregate is
a bit like a node having two parents, a Sort and a GroupAggregate.  However,
the graph edge between ChainAggregate and its GroupAggregate is a tuplestore
instead of the usual, synchronous ExecProcNode().

Well, I dont buy the two parents theory. The Sort nodes are intermediately stacked amongst ChainAggregate nodes, so there is still the single edge. However, as you rightly said, there is a shared tuplestore, but note that only the head of chain ChainAggregate has the top GroupAggregate as its parent.

Suppose one node orchestrated all sorting and aggregation.  Call it a
MultiGroupAggregate for now.  It wouldn't harness Sort nodes, because it
performs aggregation between tuplesort_puttupleslot() calls.  Instead, it
would directly manage two Tuplesortstate, CUR and NEXT.  The node would have
an initial phase similar to ExecSort(), in which it drains the outer node to
populate the first CUR.  After that, it looks more like agg_retrieve_direct(),
except that CUR is the input source, and each tuple drawn is also put into
NEXT.  When done with one CUR, swap CUR with NEXT and reinitialize NEXT.  This
design does not add I/O consumption or require a nonstandard communication
channel between executor nodes.  Tom, Andrew, does that look satisfactory?


So you are essentially proposing merging ChainAggregate and its corresponding Sort node?

So the structure would be something like:

GroupAggregate
--> MultiGroupAgg (a,b)
----> MultiGroupAgg (c,d) ...

I am not sure if I understand you correctly. Only the top level GroupAggregate node projects the result of the entire operation. The key to ChainAggregate nodes is that each ChainAggregate node handles grouping sets that fit a single ROLLUP list i.e. can be done by a single sort order. There can be multiple lists of this type in a single GS operation, however, our current design has only a single top GroupAggregate node but a ChainAggregate node + Sort node per sort order. If you are proposing replacing GroupAggregate node + entire ChainAggregate + Sort nodes stack with a single MultiGroupAggregate node, I am not able to understand how it will handle all the multiple sort orders. If you are proposing replacing only ChainAggregate + Sort node with a single MultiGroupAgg node, that still shares the tuplestore with top level GroupAggregate node.

I am pretty sure I have messed up my understanding of your proposal. Please correct me if I am wrong.

Regards,

Atri


--
Regards,
 
Atri
l'apprenant

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Final Patch for GROUPING SETS
Next
From: Michael Paquier
Date:
Subject: Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)