Re: Combining Aggregates - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: Combining Aggregates
Msg-id 9A28C8860F777E439AA12E8AEA7694F8010B38D9@BPXM15GP.gisp.nec.co.jp
In response to Re: Combining Aggregates  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Combining Aggregates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: Combining Aggregates  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Hi Rowley,

Let me add some comments on this patch.

This patch itself looks good as infrastructure toward the big
picture; however, we have not yet reached consensus on how combine
functions are used instead of the usual transition functions.

An aggregate function usually consumes one or more values extracted
from a tuple, then accumulates its internal state according to those
arguments. The existing transition function updates the internal state
on the assumption of one function call per record. On the other hand,
the new combine function updates the internal state with partially
aggregated values that were produced by a preprocessing node.
An aggregate function is represented by an Aggref node in the plan
tree; however, we currently have no reliable way to determine which
function should be called to update the aggregate's internal state.
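A minimal Python sketch of the two paths described above, under a simplified model. The names AggSketch, partial_agg and combine_and_finalize are hypothetical and are not PostgreSQL's executor or catalog API (the real implementation is in C). Each lower node runs the transition function once per row; an upper step then merges the resulting partial states with the combine function and runs the final function once.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable, Iterable, List


@dataclass
class AggSketch:
    """Hypothetical aggregate descriptor (not PostgreSQL's pg_aggregate)."""
    init: Any                              # initial transition state
    transfn: Callable[[Any, Any], Any]     # (state, row value) -> state
    combinefn: Callable[[Any, Any], Any]   # (state, state) -> state
    finalfn: Callable[[Any], Any]          # state -> result


def partial_agg(agg: AggSketch, rows: Iterable[Any]) -> Any:
    """Lower node: call the transition function once per row."""
    return reduce(agg.transfn, rows, agg.init)


def combine_and_finalize(agg: AggSketch, partial_states: List[Any]) -> Any:
    """Upper node: fold the (non-empty) list of partial states with the
    combine function, then apply the final function once."""
    state = reduce(agg.combinefn, partial_states)
    return agg.finalfn(state)


# sum(int): here transition and combine happen to be the same addition.
int_sum = AggSketch(init=0,
                    transfn=lambda s, x: s + x,
                    combinefn=lambda a, b: a + b,
                    finalfn=lambda s: s)
```

For example, splitting the rows 1..6 into three partitions yields partial states [6, 9, 6], and combining them gives 21, the same as aggregating all rows in one pass. The point of the sketch is that the upper step never sees raw rows, only partial states.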

For example, avg(float) has an internal state of type float[3]
holding the number of rows, the sum of X, and the sum of X^2. If a
combine function updates the internal state with partially aggregated
values, its argument has to be float[3] as well. That is clearly
incompatible with float8_accum(float), the transition function of
avg(float), which takes one float value per row.
I think we need a new flag on the Aggref node to tell the executor
which function should be called to update the aggregate's internal
state. The executor cannot decide this without such a hint.
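The avg(float) case can be sketched in Python as follows. The state layout [N, sum(X), sum(X^2)] follows the description above; the function names with a _sketch suffix, the AggMode enum and the advance()/run() helpers are all hypothetical stand-ins, not PostgreSQL code. The transition step takes one float per row, while the combine step takes a whole float[3] state, so the caller has to say which kind of input it is passing, which is the role of the proposed Aggref flag.

```python
from enum import Enum
from typing import List, Optional

State = List[float]  # [N, sum(X), sum(X^2)], like the float[3] state above


def float8_accum_sketch(state: State, x: float) -> State:
    """Transition step: one input row (a single float) updates the state."""
    n, sx, sxx = state
    return [n + 1.0, sx + x, sxx + x * x]


def float8_combine_sketch(a: State, b: State) -> State:
    """Combine step: the second argument is another partial state."""
    return [a[0] + b[0], a[1] + b[1], a[2] + b[2]]


def float8_avg_sketch(state: State) -> Optional[float]:
    """Final step for avg: sum(X) / N, or None (SQL NULL) for no rows."""
    n, sx, _ = state
    return None if n == 0 else sx / n


class AggMode(Enum):
    """Hypothetical flag a planner would set on an Aggref-like node."""
    TRANSITION = 1  # inputs are raw row values
    COMBINE = 2     # inputs are partial states from a lower node


def advance(mode: AggMode, state: State, value) -> State:
    # The flag, not the value itself, decides which function is called.
    if mode is AggMode.TRANSITION:
        return float8_accum_sketch(state, value)
    return float8_combine_sketch(state, value)


def run(mode: AggMode, inputs, state: Optional[State] = None) -> State:
    """Feed each input through advance(), starting from an empty state."""
    state = [0.0, 0.0, 0.0] if state is None else state
    for value in inputs:
        state = advance(mode, state, value)
    return state
```

With rows 1.0 to 4.0, aggregating all rows in TRANSITION mode and combining the two half-partitions in COMBINE mode both yield the state [4.0, 10.0, 30.0], so avg is 2.5 either way.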

Also, do you have any idea about pushing aggregate functions down
across joins? Although it is somewhat old research, I found a
systematic approach to pushing aggregates down below joins:
https://cs.uwaterloo.ca/research/tr/1993/46/file.pdf
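To illustrate the general "aggregate before join" idea, here is a deliberately simplified Python sketch. It is not a transcription of the algorithm in the linked report: the tables r(g, k) and s(k) and the query are hypothetical, only count(*) is handled, and NULLs and cost estimation are ignored. The s side is pre-aggregated into a partial count per join key, and the join then finishes the aggregate by summing those partial counts.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical query over hypothetical tables r(g, k) and s(k):
#   SELECT r.g, count(*) FROM r JOIN s ON r.k = s.k GROUP BY r.g


def count_after_join(r: List[Tuple[str, int]], s: List[int]) -> Dict[str, int]:
    """Plain plan: enumerate every joined pair, then aggregate."""
    result: Dict[str, int] = defaultdict(int)
    for g, k in r:
        for sk in s:
            if k == sk:
                result[g] += 1
    return dict(result)


def count_with_pushdown(r: List[Tuple[str, int]], s: List[int]) -> Dict[str, int]:
    """Pushed-down plan: pre-aggregate s by join key, join r with that
    smaller summary, and add up the partial counts per group."""
    s_counts = Counter(s)              # partial count(*) per join key
    result: Dict[str, int] = defaultdict(int)
    for g, k in r:
        c = s_counts.get(k, 0)
        if c:                          # no match: the row drops out of an inner join
            result[g] += c
    return dict(result)
```

Both functions return {"a": 4, "b": 3} for r = [("a", 1), ("a", 2), ("b", 2), ("c", 3)] and s = [1, 2, 2, 2, 4]; the pushed-down version never materializes the joined rows. Whether this pays off depends on how much the pre-aggregation shrinks s, which is why a cost-based decision matters.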

I think it would be great if the core supported this query-rewriting
feature, driven by cost estimation, without relying on external
modules.

What are your opinions?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of David Rowley
> Sent: Friday, December 19, 2014 8:39 PM
> To: Simon Riggs
> Cc: Kaigai Kouhei(海外 浩平); PostgreSQL-development; Amit Kapila
> Subject: Re: [HACKERS] Combining Aggregates
> 
> On 18 December 2014 at 02:48, Simon Riggs <simon@2ndquadrant.com> wrote:
> 
> 
>     David, if you can update your patch with some docs to explain the
>     behaviour, it looks complete enough to think about committing it in
>     early January, to allow other patches that depend upon it to stand a
>     chance of getting into 9.5. (It is not yet ready, but I see it could
>     be).
>
> Sure, I've more or less added the docs from your patch. I still need to
> trawl through and see if there's anywhere else that needs some additions.
> 
> 
>     The above example is probably the best description of the need, since
>     user defined aggregates must also understand this.
>
>     Could we please call these "combine functions" or other? MERGE is an
>     SQL Standard statement type that we will add later, so it will be
>     confusing if we use the "merge" word in this context.
>
> Ok, changed.
> 
> 
>     David, your patch avoids creating any mergefuncs for existing
>     aggregates. We would need to supply working examples for at least a
>     few of the builtin aggregates, so we can test the infrastructure. We
>     can add examples for all cases later.
>
> In addition to MIN(), MAX(), BIT_AND(), BIT_OR(), and SUM() for
> floating-point types, cash and interval, I've now added combine functions
> for count(*) and count(col). It seems that int8pl() is suitable for this.
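A small Python sketch of why count needs its own combine function while max() does not. All function names here are illustrative, not PostgreSQL's. count's transition step adds one per row regardless of the value, so reusing it to merge partial counts would count partitions instead of rows; adding the partial counts together, the way int8pl() adds two int8 values, gives the right total. For max(), the same comparison works both per row and per partial state.

```python
from functools import reduce
from typing import List, Sequence


def count_trans(state: int, _value: object) -> int:
    """count(*)-style transition: one call per input row, adding 1."""
    return state + 1


def count_combine(a: int, b: int) -> int:
    """count's combine step: partial counts simply add, which is the
    operation int8pl() performs on two int8 values."""
    return a + b


def max_step(a: int, b: int) -> int:
    """For max(), one step serves as both transition and combine."""
    return a if a >= b else b


def count_two_phase(partitions: List[Sequence[int]]) -> int:
    partials = [reduce(count_trans, p, 0) for p in partitions]
    return reduce(count_combine, partials)


def max_two_phase(partitions: List[Sequence[int]]) -> int:
    # Assumes every partition is non-empty (no NULL handling here).
    partials = [reduce(max_step, p) for p in partitions]
    return reduce(max_step, partials)
```

For partitions [[10, 20, 30], [40], [50, 60]], the partial counts are [3, 1, 2] and count_two_phase returns 6, while folding those partials with count_trans would wrongly return 3. max_two_phase returns 60.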
> 
> 
> Do you think it's worth adding any new functions at this stage, or is it
> best to wait until there's a patch which can use these?
> 
> Regards
> 
> David Rowley
