Re: Combining Aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Combining Aggregates
Date
Msg-id CAApHDvrSYKBUkBZMG6e8SfaPFsO7bXK+tSWkbo_t_uO26MRtkQ@mail.gmail.com
Whole thread Raw
In response to Re: Combining Aggregates  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Combining Aggregates  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 18 December 2014 at 02:48, Simon Riggs <simon@2ndquadrant.com> wrote:
On 17 December 2014 at 12:35, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> Its concept is good to me. I think, the new combined function should be
> responsible to take a state data type as argument and update state object
> of the aggregate function. In other words, combined function performs like
> transition function but can update state object according to the summary
> of multiple rows. Right?

That wasn't how I defined it, but I think your definition is better.

> It also needs some enhancement around Aggref/AggrefExprState structure to
> inform which function shall be called on execution time.
> Combined functions are usually no-thank-you. AggrefExprState updates its
> internal state using transition function row-by-row. However, once someone
> push-down aggregate function across table joins, combined functions have
> to be called instead of transition functions.
> I'd like to suggest Aggref has a new flag to introduce this aggregate expects
> state object instead of scalar value.
>
> Also, I'd like to suggest one other flag in Aggref not to generate final
> result, and returns state object instead.
>
> Let me use David's example but little bit adjusted.
>
> original)
> SELECT p.name, AVG(s.qty)
>   FROM sales s INNER JOIN product p ON s.product_id = s.product_id
>   GROUP BY p.name;
>
> modified)
> SELECT p.name, AVG(qty)
>   FROM (SELECT product_id, AVG(qty) AS qty FROM sales GROUP BY product_id) s
>        INNER JOIN product p
>   ON p.product_id = s.product_id GROUP BY p_name;
>
> Let's assume the modifier set a flag of use_combine_func on the AVG(qty) of
> the main query, and also set a flag of not_generate_final on the AVG(qty) of
> the sub-query.
> It shall work as we expected.

That matches my thinking exactly.

David, if you can update your patch with some docs to explain the
behaviour, it looks complete enough to think about committing it in
early January, to allow other patches that depend upon it to stand a
chance of getting into 9.5. (It is not yet ready, but I see it could
be).


Yes I'll add something to it and post here. 
 
The above example is probably the best description of the need, since
user defined aggregates must also understand this.

Could we please call these "combine functions" or other? MERGE is an
SQL Standard statement type that we will add later, so it will be
confusing if we use the "merge" word in this context.


Yeah I think you're right, combine may help remove some confusion when we get MERGE.
 
David, your patch avoids creating any mergefuncs for existing
aggregates. We would need to supply working examples for at least a
few of the builtin aggregates, so we can test the infrastructure. We
can add examples for all cases later.

 
I added merge/combine functions for all the aggregates I could do by making use of existing functions. I did all the MAX() and MIN() ones and bit_and(), bit_or(), and a few sum() ones that didn't have a final function.

It felt a bit premature to add new functions to support avg and stddev, since that's probably the bulk of the work.
 
Is there a way of testing this in existing code? Or do we need to add
something to test it?


I can't think of anyway to test it. Apart from the create aggregate syntax test I also added.

Standalone calls to the combine/merge functions I don't think would be testing anything new.

That's the reason I thought it wasn't really acceptable until we have a use for this. That's why I posted on the thread about parallel seqscan. I hoped that Amit could add something which needed it and I could get it committed that way.

Regards

David Rowley

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: [alvherre@2ndquadrant.com: Re: no test programs in contrib]
Next
From: Heikki Linnakangas
Date:
Subject: Re: btree_gin and ranges