Re: Combining Aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Combining Aggregates
Date
Msg-id CAKJS1f8WOG4UKztMjuK-JkaFDaWRWCde4L=QGM7rS-r19egthw@mail.gmail.com
Whole thread Raw
In response to Re: Combining Aggregates  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On 23 December 2015 at 21:50, David Rowley <david.rowley@2ndquadrant.com> wrote:
Apart from the problems I listed above, I'm reasonably happy with the patch now, and I'm ready for someone else to take a serious look at it.

I forgot to mention another situation where this patch might need a bit more thought.  This does not affect any of the built in aggregate functions, but if someone created a user defined aggregate such as:

CREATE AGGREGATE sumplusten (int)
(
stype = int,
sfunc = int4pl,
combinefunc = int4pl,
initcond = '10'
);

Note the initcond which initialises the state to 10. Then let's say we later add the ability to push aggregates down below a Append.

create table t1 as select 10 as value from generate_series(1,10);
create table t2 as select 10 as value from generate_series(1,10);

With the following we get the correct result:

# select sumplusten(value) from (select * from t1 union all select * from t2) t;
 sumplusten
------------
        210
(1 row)

But if we simulate how it would work in the aggregate push down situation:

# select sum(value) from (select sumplusten(value) as value from t1 union all select sumplusten(value) as value from t2) t;
 sum
-----
 220
(1 row)

Here I'm using sum() as a mock up of the combine function, but you get the idea... Since we initialize the state twice, we get the wrong result.

Now I'm not too sure if anyone would have an aggregate defined like this in the real world, but I don't think that'll give us a good enough excuse to go returning wrong results.

In the patch I could fix this by changing partial_aggregate_walker() to disable partial aggregation if the aggregate has an initvalue, but that would exclude a number of built-in aggregates unnecessarily. Alternatively if there was some way to analyze the initvalue to see if it was "zero"... I'm just not quite sure how we might do that at the moment.

Any thoughts on a good way to fix this that does not exclude built-in aggregates with an initvalue are welcome.
 
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Next
From: David Rowley
Date:
Subject: Re: Combining Aggregates