Re: Final Patch for GROUPING SETS - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Final Patch for GROUPING SETS
Date
Msg-id 20141231085845.GA2148306@tornado.leadboat.com
Whole thread Raw
In response to Re: Final Patch for GROUPING SETS  (Noah Misch <noah@leadboat.com>)
Responses Re: Final Patch for GROUPING SETS  (Atri Sharma <atri.jiit@gmail.com>)
Re: Final Patch for GROUPING SETS  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-hackers
On Tue, Dec 23, 2014 at 02:29:58AM -0500, Noah Misch wrote:
> On Mon, Dec 22, 2014 at 10:46:16AM -0500, Tom Lane wrote:
> > I still find the ChainAggregate approach too ugly at a system structural
> > level to accept, regardless of Noah's argument about number of I/O cycles
> > consumed.  We'll be paying for that in complexity and bugs into the
> > indefinite future, and I wonder if it isn't going to foreclose some other
> > "performance opportunities" as well.
> 
> Among GROUPING SETS GroupAggregate implementations, I bet there's a nonempty
> intersection between those having maintainable design and those having optimal
> I/O usage, optimal memory usage, and optimal number of sorts.  Let's put more
> effort into finding it.  I'm hearing that the shared tuplestore is
> ChainAggregate's principal threat to system structure; is that right?

The underlying algorithm, if naively expressed in terms of our executor
concepts, would call ExecProcNode() on a SortState, then feed the resulting
slot to both a GroupAggregate and to another Sort.  That implies a non-tree
graph of executor nodes, which isn't going to fly anytime soon.  The CTE
approach bypasses that problem by eliminating cooperation between sorts,
instead reading 2N+1 copies of the source data for N sorts.  ChainAggregate is
a bit like a node having two parents, a Sort and a GroupAggregate.  However,
the graph edge between ChainAggregate and its GroupAggregate is a tuplestore
instead of the usual, synchronous ExecProcNode().

Suppose one node orchestrated all sorting and aggregation.  Call it a
MultiGroupAggregate for now.  It wouldn't harness Sort nodes, because it
performs aggregation between tuplesort_puttupleslot() calls.  Instead, it
would directly manage two Tuplesortstate, CUR and NEXT.  The node would have
an initial phase similar to ExecSort(), in which it drains the outer node to
populate the first CUR.  After that, it looks more like agg_retrieve_direct(),
except that CUR is the input source, and each tuple drawn is also put into
NEXT.  When done with one CUR, swap CUR with NEXT and reinitialize NEXT.  This
design does not add I/O consumption or require a nonstandard communication
channel between executor nodes.  Tom, Andrew, does that look satisfactory?

Thanks,
nm



pgsql-hackers by date:

Previous
From: Guillaume Lelarge
Date:
Subject: Re: Maximum number of WAL files in the pg_xlog directory
Next
From: Atri Sharma
Date:
Subject: Re: Final Patch for GROUPING SETS