Re: Combining Aggregates - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Combining Aggregates
Date
Msg-id CA+TgmoaDELi70g1o7w8PLoL-hKD=i6UK22RgNOGewvr6Sn7S-g@mail.gmail.com
Whole thread Raw
In response to Re: Combining Aggregates  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Combining Aggregates  (Robert Haas <robertmhaas@gmail.com>)
Re: Combining Aggregates  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
[ rewinding to here from the detour I led us on ]

On Mon, Dec 21, 2015 at 4:02 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> Now, there has been talk of this previously, on various threads, but I don't
> believe any final decisions were made on how exactly it should be done. At
> the moment I plan to make changes as follows:
>
>  Add 3 new columns to pg_aggregate, aggserialfn, aggdeserialfn and
> aggserialtype These will only be required when aggtranstype is INTERNAL.

Check.

> Perhaps we should disallow CREATE AGGREAGET from accepting them for any
> other type...

Well, we should definitely not accept them and have them be
meaningless.  We should either reject them or accept them and then use
them.  I can't immediately think of a reason to serialize one
non-internal type as another, but maybe there is one.

> The return type of aggserialfn should be aggserialtype, and it
> should take a single parameter of aggtranstype. aggdeserialfn will be the
> reverse of that.

Check.

> Add a new bool field to nodeAgg's state named serialStates. If this is field
> is set to true then when we're in finalizeAgg = false mode, we'll call the
> serialfn on the agg state instead of the finalfn. This will allow the
> serialized state to be stored in the tuple and sent off to the main backend.
> The combine agg node should also be set to serialStates = true, so that it
> knows to deserialize instead of just assuming that the agg state is of type
> aggtranstype.

I'm not quite sure, but it sounds like you might be overloading
serialStates with two different meanings here.

> I believe this should allow us to not cause any performance regressions by
> moving away from INTERNAL agg states. It should also be very efficient for
> internal states such as Int8TransTypeData, as this struct merely has 2 int64
> fields which should be very simple to stuff into a bytea varlena type. We
> don't need to mess around converting the ->count and ->sum into a strings or
> anything.

I think it would be more user-friendly to emit these as 2-element
integer arrays rather than bytea.  Sure, bytea is fine when PostgreSQL
is talking to itself, but if you are communicating with an external
system, it's a different situation.  If the remote system is
PostgreSQL then you are again OK, except for the data going over the
wire being incomprehensible and maybe byte-order-dependent, but what
if you want some other database system to do partial aggregation and
then further aggregate the result?  You don't want the intermediate
state to be some kooky thing that only another PostgreSQL database has
a chance of generating correctly.

> Then in order for the planner to allow parallel aggregation all aggregates
> must:
>
> Not have a DISTINCT or ORDER BY clause
> Have a combinefn
> If aggtranstype = INTERNAL, must have a aggserialfn and aggdeserialfn.
>
> We can relax the requirement on 3 if we're using 2-stage aggregation, but
> not parallel aggregation.

When would we do that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Combining Aggregates
Next
From: Robert Haas
Date:
Subject: Re: Combining Aggregates