Re: Combining Aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Combining Aggregates
Date
Msg-id CAApHDvpgXhghtpmuKPhnBj9ZDeEPy-8C0StXgG-GuTPAMdYp6A@mail.gmail.com
In response to Re: Combining Aggregates  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List pgsql-hackers
On 18 February 2015 at 21:13, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
This patch itself looks good as infrastructure toward
the big picture; however, we still have not reached consensus
on how combine functions are to be used instead of the usual
transition functions.

Thank you for taking the time to look at the patch.

An aggregate function usually consumes one or more values extracted
from a tuple, then updates its internal state according to those
arguments. The existing transition function updates the internal
state on the assumption of one function call per record.
On the other hand, the new combine function allows the internal
state to be updated with partially aggregated values produced
by a preprocessing node.
An aggregate function is represented by an Aggref node in the plan
tree; however, we have no definite way to determine which function
should be called to update the aggregate's internal state.


This is true: nothing is done in the planner to set any sort of state in the aggregation nodes to tell them whether to call the final function or not.  It's quite hard to know how far to go with this patch. It's really only intended to provide the necessary infrastructure for things like parallel query and various other possible uses of aggregate combine functions. I don't think it's appropriate for this patch to add such a property to any nodes, as there would still be nothing in the planner to actually set those properties...  The only way around this that I can think of is to implement the simplest use of combine aggregate functions; the problem is that the simplest case is not at all simple.
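
To make the missing piece concrete, here is a toy sketch in Python (not PostgreSQL code) of the kind of per-node property being discussed. The AggNode class and its combine_states and finalize fields are hypothetical names used only for illustration; nothing in the patch defines them, and the (count, sum) state is a simplified stand-in for a real aggregate state.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional, Tuple

State = Tuple[int, float]  # (row count, running sum) for a toy avg()


def transfn(state: State, value: float) -> State:
    """Transition step: fold one input row into the state."""
    return (state[0] + 1, state[1] + value)


def combinefn(a: State, b: State) -> State:
    """Combine step: merge two partial states."""
    return (a[0] + b[0], a[1] + b[1])


def finalfn(state: State) -> Optional[float]:
    """Final step: turn the state into the aggregate result."""
    return state[1] / state[0] if state[0] else None


@dataclass
class AggNode:
    # Hypothetical properties that something (e.g. the planner) would set.
    combine_states: bool  # inputs are partial states, not raw rows
    finalize: bool        # apply finalfn, or emit the raw state

    def run(self, inputs: Iterable[Any]) -> Any:
        step = combinefn if self.combine_states else transfn
        state: State = (0, 0.0)
        for item in inputs:
            state = step(state, item)
        return finalfn(state) if self.finalize else state


# Two "partial" nodes aggregate separate chunks and skip the final function;
# a "final" node combines their states and only then finalizes.
chunks = [[1.0, 2.0], [3.0, 4.0, 5.0]]
partial = AggNode(combine_states=False, finalize=False)
final = AggNode(combine_states=True, finalize=True)
partial_states = [partial.run(chunk) for chunk in chunks]
result = final.run(partial_states)
```

In this sketch the same node code serves all three roles (plain, partial, final) and only the two flags differ, which is why something outside the node has to decide how they are set.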
 
 
For example, avg(float) has an internal state of type float[3]
holding the number of rows, the sum of X, and the sum of X^2. If a
combine function is to update this internal state with partially
aggregated values, its argument must be float[3]. That is obviously
incompatible with float8_accum(float), the transition function of
avg(float). I think we need a new flag on the Aggref node to tell
the executor which function should be called to update the
aggregate's internal state. The executor cannot decide this
without such a hint.
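
The signature mismatch described above can be sketched in a few lines of Python. This is an illustration only, not PostgreSQL source: float8_accum_sketch mimics the shape of float8_accum as described in the quoted text, and float8_combine_sketch is a hypothetical name for a combine function over the same float[3] state.

```python
from functools import reduce
from typing import List


def float8_accum_sketch(state: List[float], x: float) -> List[float]:
    """Transition step: fold ONE input value into [N, sum(X), sum(X^2)]."""
    n, sx, sxx = state
    return [n + 1.0, sx + x, sxx + x * x]


def float8_combine_sketch(state: List[float], partial: List[float]) -> List[float]:
    """Combine step: fold ANOTHER float[3] state into this one, element-wise."""
    return [a + b for a, b in zip(state, partial)]


EMPTY = [0.0, 0.0, 0.0]
rows = [1.0, 2.0, 3.0, 4.0]

# One pass over all rows using only the transition function.
one_pass = reduce(float8_accum_sketch, rows, EMPTY)

# Two partial aggregations over halves of the input, merged by the
# combine function.
left = reduce(float8_accum_sketch, rows[:2], EMPTY)
right = reduce(float8_accum_sketch, rows[2:], EMPTY)
two_stage = float8_combine_sketch(left, right)
```

Both routes reach the same state, but the second argument is a single float in one function and a float[3] in the other, so the caller has to know which kind of input it is feeding.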

Also, do you have any ideas for pushing aggregate functions down
across joins? Although the research is a bit old, I found
a systematic approach to pushing aggregates down across joins.
https://cs.uwaterloo.ca/research/tr/1993/46/file.pdf


I've not read the paper yet, but I do have a very incomplete WIP patch to do this. I've just not had much time to work on it.
 
I think it would be great if core supported this query
rewriting based on cost estimation, without external
modules.
 
Regards

David Rowley
