Re: Parallel Aggregate - Mailing list pgsql-hackers

From Haribabu Kommi
Subject Re: Parallel Aggregate
Date
Msg-id CAJrrPGePkWk+x5D0bf=cRWZFy-n0fSF-iiiGiRV_UtJBkC86hw@mail.gmail.com
In response to Re: Parallel Aggregate  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel Aggregate  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Oct 11, 2015 at 10:07 PM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
>> Parallel aggregate is a feature that performs the aggregation in
>> parallel with the help of Gather and partial seq scan nodes. The
>> following is a basic overview of the parallel aggregate changes.
>>
>> Decision phase:
>>
>> Based on the following conditions, the parallel aggregate plan is generated.
>>
>> - check whether the plan node below is Gather + partial seq scan only.
>>
>> This check determines whether the plan nodes that are present are
>> aware of parallelism.
>
> This is really not the right way of doing this.  We should do
> something more general.  Most likely, parallel aggregate should wait
> for Tom's work refactoring the upper planner to use paths.  But either
> way, it's not a good idea to limit ourselves to parallel aggregation
> only in the case where there is exactly one base table.

Ok. Thanks for the details.

> One of the things I want to do pretty early on, perhaps in time for
> 9.6, is create a general notion of partial paths.  A Partial Seq Scan
> node creates a partial path.  A Gather node turns a partial path into
> a complete path.  A join between a partial path and a complete path
> creates a new partial path.  This concept lets us consider,
> essentially, pushing joins below Gather nodes.  That's quite powerful
> and could make Partial Seq Scan applicable to a much broader variety
> of use cases.  If there are worthwhile partial paths for the final
> joinrel, and aggregation of that joinrel is needed, we can consider
> parallel aggregation using that partial path as an alternative to
> sticking a Gather node on there and then aggregating.
>
>> - Set single_copy mode to true if the node below Gather is a
>> parallel aggregate.
>
> That sounds wrong.  Single-copy mode is for when we need to be certain
> of running exactly one copy of the plan.  If you're trying to have
> several workers aggregate in parallel, that's exactly what you don't
> want.

My intention in setting the flag is to prevent the backend from executing
the child plan itself.

> Also, I think the path for parallel aggregation should probably be
> something like FinalizeAgg -> Gather -> PartialAgg -> some partial
> path here.  I'm not clear whether that is what you are thinking or
> not.

No. I am thinking of the following plan shape:
Gather -> PartialAgg -> some partial path

I want the Gather node to merge the results coming from all workers; otherwise
it may be difficult to merge them at the parent of the Gather node. When the
partial group aggregate is under the Gather node and two workers return data
for the same group key, we need to compare those rows and combine them into a
single group. At the Gather node we can wait until we have received slots from
all workers. Once all workers have returned their slots, we can compare and
merge the necessary slots and return the result. Am I missing something?
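
To make the merge step concrete, here is a minimal Python sketch of combining
per-worker partial states for avg() grouped by a key. It is not PostgreSQL
code: the function names partial_aggregate, combine and finalize, the
(sum, count) state layout and the sample rows are all assumptions made up for
illustration.

```python
def partial_aggregate(rows):
    """Per-worker step: build a partial (sum, count) state per group key."""
    states = {}
    for key, value in rows:
        total, count = states.get(key, (0, 0))
        states[key] = (total + value, count + 1)
    return states


def combine(worker_states):
    """Merge step: combine partial states that share the same group key."""
    merged = {}
    for states in worker_states:
        for key, (total, count) in states.items():
            m_total, m_count = merged.get(key, (0, 0))
            merged[key] = (m_total + total, m_count + count)
    return merged


def finalize(merged):
    """Final step: turn each merged (sum, count) state into an average."""
    return {key: total / count for key, (total, count) in merged.items()}


# Hypothetical input: two workers, both of which saw rows for group "a".
worker_rows = [
    [("a", 10), ("b", 4)],
    [("a", 20), ("c", 6)],
]
partials = [partial_aggregate(rows) for rows in worker_rows]
merged = combine(partials)
result = finalize(merged)
```

The sketch only shows what has to happen to rows with the same group key. The
same combine logic would apply whether it runs inside the Gather node or in a
separate node above it, so it does not settle which of the two plan shapes is
better.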

Regards,
Hari Babu
Fujitsu Australia


