Re: Parallel Aggregate - Mailing list pgsql-hackers

From Haribabu Kommi
Subject Re: Parallel Aggregate
Date
Msg-id CAJrrPGdhpDYm8dXzegEnGYTQ4dtW+MVTU-20F4vW0QfiE=phKw@mail.gmail.com
In response to Re: Parallel Aggregate  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On Mon, Oct 12, 2015 at 2:25 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 12 October 2015 at 15:07, Haribabu Kommi <kommi.haribabu@gmail.com>
> wrote:
>>
>> - check whether the aggregate supports parallelism or not.
>>
>> For the first patch, I thought of supporting only some aggregates for
>> parallel aggregation.
>> The supported aggregates are mainly the aggregate functions that have
>> variable length data types as final and transition types. This is to
>> avoid changing the target list return types. Because of variable
>> lengths, even the transition type can be returned to backend without
>> applying the final function in aggregate. To identify the supported
>> aggregates for parallelism, a new member is added to pg_aggregate
>> system catalog table.
>>
>> - currently only Group and plain aggregates are supported, for simplicity.
>>
>> This patch doesn't change anything in the aggregate plan decision. If the
>> planner chooses a group or plain aggregate as the best plan, then we
>> check whether it can be converted into a parallel aggregate.
>
>
> Hi,
>
> I've never previously proposed any implementation for parallel aggregation,
> but I have previously proposed infrastructure to allow aggregation to happen
> in multiple steps. It seems your plan sounds very different from what I've
> proposed.
>
> I attempted to convey my idea on this to the community here
> http://www.postgresql.org/message-id/CAKJS1f-TmWi-4c5K6CBLRdTfGsVxOJhadefzjE7SWuVBgMSkXA@mail.gmail.com
> for which Simon and I proposed an actual proof of concept patch here
> https://commitfest.postgresql.org/5/131/

My plan is also to use combine_aggregate_state_v2.patch, or something
similar to what you have proposed, to merge the partial aggregate results
and combine them in the backend process. As a POC patch, I just want
to limit this functionality to aggregates that have variable length
datatypes as transition and final types.
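
To make the two-step idea concrete, here is a rough sketch in Python rather
than in the executor's C. It only illustrates the shape of the approach: each
worker folds its rows into a partial state, and the backend merges those
states. The function names (transition, combine, partial_aggregate,
parallel_aggregate) are made up for this illustration and are not taken from
either patch. It also assumes an aggregate whose transition state has the same
type as its final result, so no final function is needed.

```python
from functools import reduce


def transition(state, value):
    """Per-row step run inside a worker: fold one value into the state."""
    return value if state is None else state + value


def combine(left, right):
    """Backend step: merge two partial states produced by different workers."""
    if left is None:
        return right
    if right is None:
        return left
    return left + right


def partial_aggregate(rows):
    """What one worker returns: the transition state over its own rows."""
    return reduce(transition, rows, None)


def parallel_aggregate(chunks):
    """Hypothetical driver: one partial state per chunk, then combine them."""
    partial_states = [partial_aggregate(chunk) for chunk in chunks]
    return reduce(combine, partial_states, None)
```

Because the state type equals the result type here, the value returned by
parallel_aggregate can go to the client as is. A worker that saw no rows
contributes None, which combine skips.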

> I've since expanded on that work in the form of a WIP patch which implements
> GROUP BY before JOIN here
> http://www.postgresql.org/message-id/CAKJS1f9kw95K2pnCKAoPmNw==7fgjSjC-82cy1RB+-x-Jz0QHA@mail.gmail.com
>
> It's pretty evident that we both need to align the way we plan to handle
> this multiple step aggregation, there's no sense at all in having 2
> different ways of doing this. Perhaps you could look over my patch and let
> me know the parts which you disagree with, then we can resolve these
> together and come up with the best solution for each of us.

Thanks for the details. I will go through it. At first glance, this
patch is an enhancement of combine_aggregate_state_v2.patch.

> It may also be useful for you to glance at how Postgres-XL handles this
> partial aggregation problem, as it, where possible, will partially aggregate
> the results on each node, then pass the partially aggregated state to the
> master node to have it perform the final aggregate stage on each of the
> individual aggregate states from each node. Note that this requires giving the
> aggregates with internal aggregate states an SQL level type and it also
> means implementing an input and output function for these types. I've
> noticed that XL mostly handles this by making the output function build a
> string something along the lines of <count>:<sum> for aggregates such as
> AVG(). I believe you'll need something very similar to this to pass the
> partial states between worker and master process.

Yes, we may need something like this, or we could add support for passing
internal datatypes between the worker and backend processes, so that all
aggregate functions are supported.
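
As a rough illustration of the text-based option, here is a small Python
sketch of an AVG()-style state that is written out as <count>:<sum> by the
worker and parsed again by the backend. All of the names here (avg_partial,
state_out, state_in, avg_combine, avg_final) are hypothetical and are not
taken from Postgres-XL or from any patch. Integer inputs are assumed to keep
the parsing simple.

```python
def avg_partial(values):
    """Worker side: accumulate (count, sum) over the worker's own rows."""
    count = 0
    total = 0
    for value in values:
        count += 1
        total += value
    return (count, total)


def state_out(state):
    """Output function: render the state as '<count>:<sum>' text."""
    count, total = state
    return f"{count}:{total}"


def state_in(text):
    """Input function: parse '<count>:<sum>' text back into a state."""
    count_text, sum_text = text.split(":")
    return (int(count_text), int(sum_text))


def avg_combine(left, right):
    """Backend side: merge two partial states field by field."""
    return (left[0] + right[0], left[1] + right[1])


def avg_final(state):
    """Final function: produce the average, or None if there were no rows."""
    count, total = state
    return None if count == 0 else total / count
```

With two workers holding [1, 2, 3] and [4, 5], the states travel as "3:6" and
"2:9", combine to (5, 15), and the final function returns 3.0. Passing the
internal state directly would avoid the text round trip, at the cost of
needing a way to ship internal datatypes between processes.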

Regards,
Hari Babu
Fujitsu Australia


