Re: upper planner path-ification - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: upper planner path-ification
Msg-id 9A28C8860F777E439AA12E8AEA7694F8011093D4@BPXM15GP.gisp.nec.co.jp
In response to Re: upper planner path-ification  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: upper planner path-ification  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> -----Original Message-----
> From: David Rowley [mailto:david.rowley@2ndquadrant.com]
> Sent: Tuesday, June 23, 2015 2:06 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Robert Haas; pgsql-hackers@postgresql.org; Tom Lane
> Subject: Re: [HACKERS] upper planner path-ification
> 
> 
> On 23 June 2015 at 13:55, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> 
> 
>     Once we support adding an aggregation path during path consideration,
>     we need to pay attention to morphing the final target-list according
>     to the intermediate path combination tentatively chosen.
>     For example, if partial aggregation makes sense from a cost perspective,
>     like SUM(NRows) of partial COUNT(*) AS NRows instead of COUNT(*) on
>     a billion rows, the planner also has to adjust the final target-list
>     according to the interim paths. In this case, the final output shall be
>     SUM() instead of COUNT().
> 
> 
> 
> 
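A minimal Python sketch of the COUNT(*) case above, assuming the rows are already split into partitions (for example, one per worker). The names partial_count, two_phase_count and FINAL_PHASE_AGG are hypothetical and do not correspond to anything in the PostgreSQL planner; the point is only that the final phase must apply SUM() to the partial counts rather than COUNT().

```python
from typing import Iterable, List

# Hypothetical mapping from the aggregate in the original target-list to the
# aggregate the final phase must apply to the partial results.
FINAL_PHASE_AGG = {
    "COUNT": "SUM",  # partial counts are added up, not counted again
    "SUM": "SUM",
    "MIN": "MIN",
    "MAX": "MAX",
}


def partial_count(rows: Iterable[object]) -> int:
    # Phase 1: COUNT(*) over one partition (e.g. one worker's share of rows).
    return sum(1 for _ in rows)


def two_phase_count(partitions: List[List[object]]) -> int:
    # Phase 2: the final target-list entry becomes SUM(NRows), not COUNT(*).
    nrows = [partial_count(p) for p in partitions]
    return sum(nrows)
```

Counting the partial results again (len(nrows)) would return the number of partitions, which is exactly the target-list mismatch the planner has to avoid.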
> This sounds very much like what's been discussed here:
> 
> http://www.postgresql.org/message-id/CA+U5nMJ92azm0Yt8TT=hNxFP=VjFhDqFpaWfmj+66-4zvCGv3w@mail.gmail.com
> 
> 
> The basic concept is that we add another function set to aggregates that allow
> the combination of 2 states. For the case of MIN() and MAX() this will just be
> the same as the transfn. SUM() is similar for many types, more complex for others.
> I've quite likely just borrowed SUM(BIGINT)'s transition functions to allow
> COUNT()s to be combined.
>
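To make the combine-function idea concrete, here is a small Python sketch, assuming a hypothetical AggDef that carries a transfn and a combinefn per aggregate. It is not the API of the patch referenced above. MIN() and MAX() reuse their transfn as the combinefn, and COUNT() combines its partial states with the same integer addition that SUM() uses.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, List


@dataclass(frozen=True)
class AggDef:
    # Hypothetical aggregate definition, loosely modelled on transfn/combinefn.
    transfn: Callable[[Any, Any], Any]    # folds one input value into a state
    combinefn: Callable[[Any, Any], Any]  # merges two partial states
    init: Any                             # initial (empty) state


def int8_add(a: int, b: int) -> int:
    # Plain integer addition: SUM's transfn here, reused as COUNT's combinefn.
    return a + b


def min_trans(state, value):
    # None stands for "no rows seen yet", so empty partitions combine cleanly.
    if state is None:
        return value
    if value is None:
        return state
    return min(state, value)


def max_trans(state, value):
    if state is None:
        return value
    if value is None:
        return state
    return max(state, value)


AGGS = {
    "min": AggDef(min_trans, min_trans, None),          # combinefn == transfn
    "max": AggDef(max_trans, max_trans, None),          # combinefn == transfn
    "sum": AggDef(int8_add, int8_add, 0),
    "count": AggDef(lambda n, _v: n + 1, int8_add, 0),  # partial counts summed
}


def run_partial(agg: AggDef, values: List[int]):
    # Phase 1: ordinary aggregation over one partition.
    return reduce(agg.transfn, values, agg.init)


def run_combined(agg: AggDef, partitions: List[List[int]]):
    # Phase 2: merge the per-partition states with the combinefn.
    states = [run_partial(agg, p) for p in partitions]
    return reduce(agg.combinefn, states, agg.init)
```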
STDDEV, VARIANCE and related aggregates can be computed from nrows, sum(X) and sum(X^2).
REGR_*, COVAR_* and related aggregates can be computed from nrows, sum(X), sum(Y),
sum(X^2), sum(Y^2) and sum(X*Y).
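A Python sketch of why this works, assuming a hypothetical MomentState holding exactly the six running sums listed above. Because every field is a plain sum, two partial states combine field by field, and the final functions run only once, on the combined state. The formulas are the textbook sum-based sample variance, sample covariance and least-squares slope; this is an illustration, not PostgreSQL's float8 implementation.

```python
import math
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Tuple


@dataclass(frozen=True)
class MomentState:
    # Hypothetical partial state: nrows, sum(X), sum(Y), sum(X^2), sum(Y^2), sum(X*Y).
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0


def transfn(s: MomentState, row: Tuple[float, float]) -> MomentState:
    # Fold one (x, y) row into the state.
    x, y = row
    return MomentState(s.n + 1, s.sx + x, s.sy + y,
                       s.sxx + x * x, s.syy + y * y, s.sxy + x * y)


def combinefn(a: MomentState, b: MomentState) -> MomentState:
    # Every field is a sum, so partial states merge by field-wise addition.
    return MomentState(a.n + b.n, a.sx + b.sx, a.sy + b.sy,
                       a.sxx + b.sxx, a.syy + b.syy, a.sxy + b.sxy)


def fold(rows: Iterable[Tuple[float, float]]) -> MomentState:
    return reduce(transfn, rows, MomentState())


# Final functions, evaluated once on the (combined) state.
def var_samp(s: MomentState) -> float:
    return (s.sxx - s.sx * s.sx / s.n) / (s.n - 1)


def stddev_samp(s: MomentState) -> float:
    return math.sqrt(var_samp(s))


def covar_samp(s: MomentState) -> float:
    return (s.sxy - s.sx * s.sy / s.n) / (s.n - 1)


def regr_slope(s: MomentState) -> float:
    return (s.n * s.sxy - s.sx * s.sy) / (s.n * s.sxx - s.sx * s.sx)
```

With rows [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)], combining fold(rows[:2]) with fold(rows[2:]) yields the same state as fold(rows), so VARIANCE(X) = 2.5, COVAR_SAMP = 1.5 and REGR_SLOPE = 0.6 come out the same either way.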

Let me point out a positive side effect of this approach.
Because the final aggregate function processes values that have already been
partially aggregated, the difference in magnitude between the state value and
each transition value becomes relatively small. This reduces rounding errors in
floating-point calculation. :-)
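A quick way to see the effect is plain floating-point summation in Python, used here as a stand-in for a float8 SUM() (an illustration, not PostgreSQL code). Adding 0.1 a million times into one running total drifts further from the exact result (math.fsum) than summing 1,000-value chunks first and then adding the partial sums. The chunk size of 1,000 is an arbitrary assumption.

```python
import math
from typing import List


def naive_sum(values: List[float]) -> float:
    # Plain left-to-right accumulation, like one transition loop over every row.
    # (An explicit loop is used because built-in sum() compensates on Python 3.12+.)
    total = 0.0
    for v in values:
        total += v
    return total


def two_phase_sum(values: List[float], chunk: int) -> float:
    # Phase 1: each chunk is summed on its own, keeping running totals small.
    partials = [naive_sum(values[i:i + chunk]) for i in range(0, len(values), chunk)]
    # Phase 2: the final step only adds the partial sums together.
    return naive_sum(partials)


values = [0.1] * 1_000_000
exact = math.fsum(values)
err_naive = abs(naive_sum(values) - exact)
err_two_phase = abs(two_phase_sum(values, 1000) - exact)
```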

> More time does need to be spent inventing the new combining functions that don't
> currently exist, but that shouldn't matter as it can be done later.
> 
> Commitfest link to patch here https://commitfest.postgresql.org/5/131/
> 
> I see you've signed up to review it!
>
Yes, we are all looking in the same direction.

The problem is that we have to cross the mountain of planner enhancement to reach
all the valuable features:
- parallel aggregation
- aggregation before join
- remote aggregation via FDW

So, unless we find a solution in the planner, 2-phase aggregation is
like curry without rice...

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

