Re: Parallel Aggregate - Mailing list pgsql-hackers

From David Rowley
Subject Re: Parallel Aggregate
Msg-id CAKJS1f-0X_0-k+ixS-yBr-TdwEnOGAQY2Udm368=MK-fsaWsLA@mail.gmail.com
In response to Parallel Aggregate  (Haribabu Kommi <kommi.haribabu@gmail.com>)
Responses Re: Parallel Aggregate  (Haribabu Kommi <kommi.haribabu@gmail.com>)
List pgsql-hackers
On 12 October 2015 at 15:07, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
Parallel aggregate is a feature that performs aggregation in parallel
with the help of Gather and partial seq scan nodes. The following is a
basic overview of the parallel aggregate changes.

Decision phase:

Based on the following conditions, the parallel aggregate plan is generated.

- check whether the plan node below is Gather + partial seq scan only.

This checks whether the plan nodes that are present are aware of
parallelism.

- check whether any projection or qual condition is present in the
Gather node.

If any quals or projection info must be evaluated in the Gather node,
because they involve a function that can only be executed in the
master backend, the parallel aggregate plan is not chosen.

- check whether the aggregate supports parallelism or not.

For the first patch, I thought of supporting only some aggregates for
parallel aggregate. The supported aggregates are mainly the aggregate
functions that have variable length data types as final and transition
types. This is to avoid changing the target list return types. Because
of the variable lengths, even the transition type can be returned to
the backend without applying the final function in the aggregate. To
identify the aggregates that support parallelism, a new member is
added to the pg_aggregate system catalog table.

- currently only Group and plain aggregates are supported, for simplicity.

This patch doesn't change anything in the aggregate plan decision. If
the planner decides that a group or plain aggregate is the best plan,
then we check whether it can be converted into a parallel aggregate.

Hi,

I've never previously proposed any implementation for parallel aggregation, but I have previously proposed infrastructure to allow aggregation to happen in multiple steps. Your plan sounds very different from what I've proposed.

I attempted to convey my idea on this to the community here http://www.postgresql.org/message-id/CAKJS1f-TmWi-4c5K6CBLRdTfGsVxOJhadefzjE7SWuVBgMSkXA@mail.gmail.com and Simon and I proposed an actual proof of concept patch here https://commitfest.postgresql.org/5/131/

I've since expanded on that work in the form of a WIP patch which implements GROUP BY before JOIN here http://www.postgresql.org/message-id/CAKJS1f9kw95K2pnCKAoPmNw==7fgjSjC-82cy1RB+-x-Jz0QHA@mail.gmail.com

It's pretty evident that we both need to align the way we plan to handle this multiple step aggregation; there's no sense at all in having two different ways of doing this. Perhaps you could look over my patch and let me know the parts which you disagree with, then we can resolve these together and come up with the best solution for each of us.

It may also be useful for you to glance at how Postgres-XL handles this partial aggregation problem. Where possible, it partially aggregates the results on each node, then passes the partially aggregated state to the master node, which performs the final aggregate stage over the individual aggregate states from each node. Note that this requires giving the aggregates with internal aggregate states an SQL level type, and it also means implementing an input and output function for these types. I've noticed that XL mostly handles this by making the output function build a string something along the lines of <count>:<sum> for aggregates such as AVG(). I believe you'll need something very similar to this to pass the partial states between the worker and master processes.
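
To make the <count>:<sum> idea concrete, here is a rough sketch of multiple step aggregation for AVG(). It is an illustration only, not code from Postgres-XL or from either patch: the names (AvgState, transition, serialize, deserialize, combine, finalize) are hypothetical, and Python stands in for what would really be C functions attached to the aggregate.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class AvgState:
    """Hypothetical partial state for AVG(): row count and running sum."""
    count: int = 0
    total: int = 0


def transition(state: AvgState, value: int) -> AvgState:
    """Fold one input row into the partial state (runs in each worker)."""
    return AvgState(state.count + 1, state.total + value)


def serialize(state: AvgState) -> str:
    """Output function: render the state as '<count>:<sum>'."""
    return f"{state.count}:{state.total}"


def deserialize(text: str) -> AvgState:
    """Input function: parse '<count>:<sum>' back into a state."""
    count, total = text.split(":")
    return AvgState(int(count), int(total))


def combine(a: AvgState, b: AvgState) -> AvgState:
    """Merge two partial states (runs in the master)."""
    return AvgState(a.count + b.count, a.total + b.total)


def finalize(state: AvgState) -> Optional[float]:
    """Final function: AVG() of no rows is NULL, modelled here as None."""
    if state.count == 0:
        return None
    return state.total / state.count


def partial_avg(rows: Iterable[int]) -> str:
    """What one worker would send back: its serialized partial state."""
    state = AvgState()
    for value in rows:
        state = transition(state, value)
    return serialize(state)


def parallel_avg(chunks: List[List[int]]) -> Optional[float]:
    """Master side: combine the workers' serialized states, then finalize."""
    merged = AvgState()
    for chunk in chunks:
        merged = combine(merged, deserialize(partial_avg(chunk)))
    return finalize(merged)
```

The count and the sum have to travel together because averaging the per-worker averages would weight every worker equally, regardless of how many rows each one scanned.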

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
