Home > mailing lists

Re: Combining Aggregates - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Combining Aggregates
Date	February 20, 2015 18:16:38
Msg-id	54E779EE.7090701@2ndquadrant.com Whole thread Raw
In response to	Re: Combining Aggregates (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses	Re: Combining Aggregates
List	pgsql-hackers

Tree view

Hi,

On 18.2.2015 09:13, Kouhei Kaigai wrote:
>>
>> In addition to MIN(), MAX(), BIT_AND(), BIT_OR, SUM() for floating
>> point types, cash and interval. I've now added combine functions
>> for count(*) and count(col). It seems that int8pl() is suitable for
>> this.
>>
>>
>> Do you think it's worth adding any new functions at this stage, or 
>> is it best to wait until there's a patch which can use these?

A few comments about this patch:

1) another use case / grouping sets
-----------------------------------

I agree this would be nice to have in-core.

I remember discussing this functionality (combining partial aggregate
results) as an alternative implementation to grouping sets. IIRC the
grouping sets patch implements this by repeatedly sorting the tuples,
but in some cases we could compute the aggregates at the most detailed
level, and then build the results by combining the partial results. Just
an idea, at this moment, though.

2) serialize/deserialize functions
----------------------------------

This thread mentions "parallel queries" as a use case, but that means
passing data between processes, and that requires being able to
serialize and deserialize the aggregate state somehow. For actual data
types that's not overly difficult I guess (we can use in/out functions),
but what about aggretates using 'internal' state? I.e. aggregates
passing pointers that we can't serialize?

And we do have plenty of those, including things like
   array_agg   avg   cume_dist   dense_rank   json_agg   jsonb_agg   jsonb_object_agg   json_object_agg   mode
percentile_cont  percentile_disc   percent_rank   rank   stddev   stddev_pop   stddev_samp   string_agg   sum
variance  var_pop   var_samp

Those are pretty important aggregates IMHO, and ISTM we won't be able to
use them with this patch. Or am I missing something?

So maybe this patch should include support for serialize/deserialize
functions too? Or maybe a follow-up patch ... I'm not entirely
comfortable with a patch without an actual use case, except for a simple
example. But maybe that's OK.

FWIW the serialize/deserialize functions would be useful also for
implementing a truly 'memory-bounded hash aggregate' (discussed
elsewhere, currently stuck because of difficulty with implementing
memory accounting). So that's yet another use case for this (both the
'combine' function and the 'serialize/deserialize').

regards

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Kevin Grittner
Date: 20 February 2015, 17:18:36
Subject: Re: Precedence of standard comparison operators

From: Tom Lane
Date: 20 February 2015, 18:51:31
Subject: Idea: closing the loop for "pg_ctl reload"

Re: Combining Aggregates - Mailing list pgsql-hackers

Previous

Next