Re: Parallel Aggregate - Mailing list pgsql-hackers

From David Rowley
Subject Re: Parallel Aggregate
Date
Msg-id CAKJS1f9BqhMRQO0AUbVmmduoOunH0_azqT77G2BzX5azG=QPNA@mail.gmail.com
In response to Re: Parallel Aggregate  (Haribabu Kommi <kommi.haribabu@gmail.com>)
Responses Re: Parallel Aggregate  (Haribabu Kommi <kommi.haribabu@gmail.com>)
List pgsql-hackers
On 13 October 2015 at 17:09, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Also, I think the path for parallel aggregation should probably be
>> something like FinalizeAgg -> Gather -> PartialAgg -> some partial
>> path here.  I'm not clear whether that is what you are thinking or
>> not.
>
> No. I am thinking of the following way.
> Gather->partialagg->some partial path
>
> I want the Gather node to merge the results coming from all workers, otherwise
> it may be difficult to merge at parent of gather node. Because in case the
> partial group aggregate is under the Gather node, if any of two workers are
> returning same group key data, we need to compare them and combine it to make
> it a single group. If we are at Gather node, it is possible that we can wait
> till we get slots from all workers. Once all workers returns the slots we can
> compare and merge the necessary slots and return the result. Am I missing
> something?

My assumption is the same as Robert's here.
Unless I've misunderstood, it sounds like you're proposing to add logic into the Gather node to handle final aggregation? That sounds like a modularity violation of the whole node concept. 

The handling of the final aggregate stage is not all that different from the initial aggregate stage. The primary difference is just that you're calling the combine function instead of the transition function, and the values being aggregated are aggregate states rather than values of the type that was initially aggregated. The handling of GROUP BY is all the same, except that you only apply the HAVING clause during final aggregation. This is why I ended up implementing this in nodeAgg.c instead of inventing some new node type that's mostly a copy and paste of nodeAgg.c [1].
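
To make the similarity between the two stages concrete, here is a minimal sketch in Python (not PostgreSQL C code, and not how nodeAgg.c is actually written). It assumes a hypothetical avg() aggregate whose state is a (sum, count) pair; the names transition, combine, final, aggregate and parallel_avg are invented for illustration. The same grouping loop serves both stages, only the step function and the input type differ, and HAVING is applied only at the end.

```python
from collections import defaultdict

# Hypothetical avg() aggregate; the state is a (sum, count) pair.
def transition(state, value):
    """Fold one input value into an aggregate state (partial stage)."""
    total, count = state
    return (total + value, count + 1)

def combine(state_a, state_b):
    """Merge two aggregate states (final stage)."""
    return (state_a[0] + state_b[0], state_a[1] + state_b[1])

def final(state):
    """Turn a finished state into the aggregate's result."""
    return state[0] / state[1]

def aggregate(rows, step, init=(0, 0)):
    """Shared GROUP BY loop: fold each (key, item) row into its group's state."""
    groups = defaultdict(lambda: init)
    for key, item in rows:
        groups[key] = step(groups[key], item)
    return dict(groups)

def parallel_avg(worker_inputs, having=lambda avg: True):
    # PartialAgg: each worker aggregates its own rows with the transition function.
    partials = [aggregate(rows, transition) for rows in worker_inputs]
    # Gather: only concatenates worker output, with no aggregation logic.
    gathered = [(key, state) for part in partials for key, state in part.items()]
    # FinalizeAgg: same grouping loop, but combining states instead of values.
    merged = aggregate(gathered, combine)
    # HAVING is evaluated only here, once every group is complete.
    results = {key: final(state) for key, state in merged.items()}
    return {key: avg for key, avg in results.items() if having(avg)}
```

Note that the Gather step in this sketch does nothing but pass tuples along; duplicate group keys from different workers are resolved by the finalize stage above it.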

If you're performing a hash aggregate you need to wait until all the partially aggregated groups are received anyway. If you're doing a sort/agg then you'll need to sort again after the Gather node.
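
A small sketch of the sort/agg case, again in Python and under stated assumptions: each worker returns partial sums already sorted by group key, the Gather output is simply one worker's rows followed by the other's (real arrival order is arbitrary), and group_agg is a hypothetical stand-in for a sorted group aggregate that emits a group whenever the key changes.

```python
from itertools import groupby
from operator import itemgetter

def group_agg(rows):
    """Sort/agg-style aggregate: assumes rows are sorted by key and
    emits one output row each time the key changes."""
    return [(key, sum(value for _, value in grp))
            for key, grp in groupby(rows, key=itemgetter(0))]

# Each worker's partial result is sorted by key on its own.
worker_a = [("a", 4), ("b", 6)]
worker_b = [("a", 1), ("c", 2)]

# Gather output: the combined stream is no longer sorted by key.
gathered = worker_a + worker_b

# Without a re-sort, group "a" is emitted twice.
unsorted_result = group_agg(gathered)

# With a sort above the Gather, each group is emitted exactly once.
sorted_result = group_agg(sorted(gathered, key=itemgetter(0)))
```

Here unsorted_result contains two separate rows for "a", while sorted_result has one row per group, which is why the extra sort is needed before the final aggregate.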

 
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
