Re: Final Patch for GROUPING SETS - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Final Patch for GROUPING SETS
Date
Msg-id 87fvcks1og.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Re: Final Patch for GROUPING SETS  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Final Patch for GROUPING SETS  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I'd already explained in more detail way back when we posted the>> patch. But to reiterate: the ChainAggregate nodes
passthrough>> their input data unchanged, but on group boundaries they write>> aggregated result rows to a tuplestore
sharedby the whole>> chain. The top node returns the data from the tuplestore after its>> own output is completed.
 
Tom> That seems pretty grotty from a performance+memory consumptionTom> standpoint.  At peak memory usage, each one of
theSort nodesTom> will contain every input row,
 

Has this objection ever been raised for WindowAgg, which has the same
issue?
Tom> and the shared tuplestore will contain every output row.

Every output row except those produced by the top node, and since this
is after grouping, that's expected to be smaller than the input.
Tom> That will lead to either a lot of memory eaten, or a lot ofTom> temp-file I/O, depending on how big work_mem is.

Yes. Though note that this code only kicks in when dealing with
grouping sets more complex than a simple rollup. A CUBE of two
dimensions uses only one Sort node above whatever is needed to produce
sorted input, and a CUBE of three dimensions uses only two. (It does
increase quite a lot for large cubes though.)
Tom> In principle, with the CTE+UNION approach I was suggesting, theTom> peak memory consumption would be one copy of
theinput rows inTom> the CTE's tuplestore plus one copy in the active branch's SortTom> node.  I think a bit of effort
wouldbe needed to get there (ie,Tom> shut down one branch's Sort node before starting the next,Tom> something I'm
prettysure doesn't happen today).
 

Correct, it doesn't.

However, I notice that having ChainAggregate shut down its input would
also have the effect of bounding the memory usage (to two copies,
which is as good as the append+sorts+CTE case).

Is shutting down and reinitializing parts of the plan really feasible
here? Or would it be a case of forcing a rescan?
>> Second was to generate a CTE for the input data. This didn't get>> very far because everything that already exists
tohandle CTE>> nodes assumes that they are explicit in the planner's input (that>> they have their own Query node,
etc.)and I was not able to>> determine a good solution.
 
Tom> Seems like restructuring that wouldn't be *that* hard.  WeTom> probably don't want it to be completely like a CTE
forplanningTom> purposes anyway --- that would foreclose passing down anyTom> knowledge of desired sort order, which we
don'twant.  But itTom> seems like we could stick a variant of CtePath atop the chosenTom> result path of the scan/join
planningphase.  If you like I canTom> poke into this a bit.
 

Please do.

That seems to cover the high-priority issues from our point of view.

We will continue working on the other issues, on the assumption that
when we have some idea how to do it your way, we will rip out the
ChainAggregate stuff in favour of an Append-based solution.

-- 
Andrew (irc:RhodiumToad)



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Commitfest problems
Next
From: Andrew Gierth
Date:
Subject: 9.4rc bug in percentile_cont