Re: Combining Aggregates - Mailing list pgsql-hackers

From Atri Sharma
Subject Re: Combining Aggregates
Date
Msg-id CAOeZVifw7pkkXvpzrEZ9waqFPLOf_xz+bNRzLBHnZ1siSz=+mg@mail.gmail.com
Whole thread Raw
In response to Re: Combining Aggregates  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers


On Wed, Dec 17, 2014 at 7:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 17 December 2014 at 12:35, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> Its concept is good to me. I think, the new combined function should be
> responsible to take a state data type as argument and update state object
> of the aggregate function. In other words, combined function performs like
> transition function but can update state object according to the summary
> of multiple rows. Right?

That wasn't how I defined it, but I think your definition is better.

> It also needs some enhancement around Aggref/AggrefExprState structure to
> inform which function shall be called on execution time.
> Combined functions are usually no-thank-you. AggrefExprState updates its
> internal state using transition function row-by-row. However, once someone
> push-down aggregate function across table joins, combined functions have
> to be called instead of transition functions.
> I'd like to suggest Aggref has a new flag to introduce this aggregate expects
> state object instead of scalar value.
>
> Also, I'd like to suggest one other flag in Aggref not to generate final
> result, and returns state object instead.
>
> Let me use David's example but little bit adjusted.
>
> original)
> SELECT p.name, AVG(s.qty)
>   FROM sales s INNER JOIN product p ON s.product_id = s.product_id
>   GROUP BY p.name;
>
> modified)
> SELECT p.name, AVG(qty)
>   FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BY product_id) s
>        INNER JOIN product p
>   ON p.product_id = s.product_id GROUP BY p_name;
>
> Let's assume the modifier set a flag of use_combine_func on the AVG(qty) of
> the main query, and also set a flag of not_generate_final on the AVG(qty) of
> the sub-query.
> It shall work as we expected.

That matches my thinking exactly.

David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it in
early January, to allow other patches that depend upon it to stand a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).

The above example is probably the best description of the need, since
user defined aggregates must also understand this.

Could we please call these "combine functions" or other? MERGE is an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.

David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least a
few of the builtin aggregates, so we can test the infrastructure. We
can add examples for all cases later.



I am still missing how and why we skip transfns. What am I missing,please? 

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: GiST kNN search queue (Re: KNN-GiST with recheck)
Next
From: Simon Riggs
Date:
Subject: Re: parallel mode and parallel contexts