Re: Combining Aggregates - Mailing list pgsql-hackers

From David Rowley
Subject Re: Combining Aggregates
Msg-id CAKJS1f9rmPrsXdnF14nxg6N7PcO+pZmtQGH3GmXuy0q-Vz4kXQ@mail.gmail.com
In response to Re: Combining Aggregates  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Combining Aggregates  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 22 December 2015 at 01:30, Robert Haas <robertmhaas@gmail.com> wrote:
Can we use Tom's expanded-object stuff instead of introducing
aggserialfn and aggdeserialfn?  In other words, if you have an
aggtranstype = INTERNAL, then what we do is:

1. Create a new data type that represents the transition state.
2. Use expanded-object notation for that data type when we're just
within a single process, and flatten it when we need to send it
between processes.


I'd not seen this before, but having looked at it, I'm not sure it would be practical here. I may have missed something, but it seems that after each call of the transition function I'd need to ensure that the INTERNAL state was in varlena format. That might be OK for a state like Int8TransTypeData, since that struct has no pointers, but I don't see how it could be done efficiently for NumericAggState, which contains two NumericVars, each of which points to other memory. The transition function also has no idea whether it will be called again for this state, so it does not seem possible to delay the conversion until the final call of the transition function.
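
To make the cost concern concrete, here is a rough sketch in Python rather than PostgreSQL's C. It is only a model: `flatten_fixed`, `PointerState`, and `bytes_copied` are hypothetical names, and the byte layouts are invented for illustration. The fixed-width state is already flat, while the state that owns a separately allocated digit buffer has to be copied into one contiguous blob every time it is flattened.

```python
import struct

# Hypothetical stand-in for a pointer-free state such as Int8TransTypeData:
# two fixed-width integers, so the flat form is just the fields themselves.
def flatten_fixed(count, total):
    return struct.pack("<qq", count, total)

# Hypothetical stand-in for a state like NumericAggState: the running sum
# lives in a separately allocated, variable-length digit buffer.
class PointerState:
    def __init__(self):
        self.count = 0
        self.digits = [0]  # base-10000 digits, least significant first

    def add(self, value):
        # Assumes value is a non-negative integer.
        self.count += 1
        carry = value
        i = 0
        while carry:
            if i == len(self.digits):
                self.digits.append(0)
            carry += self.digits[i]
            self.digits[i] = carry % 10000
            carry //= 10000
            i += 1

    def flatten(self):
        # Copy the header and the whole digit buffer into one blob.
        header = struct.pack("<qi", self.count, len(self.digits))
        body = struct.pack("<%dh" % len(self.digits), *self.digits)
        return header + body

def bytes_copied(values, flatten_every_call):
    """Count bytes copied when flattening per call versus once at the end."""
    state = PointerState()
    copied = 0
    for v in values:
        state.add(v)
        if flatten_every_call:
            copied += len(state.flatten())
    if not flatten_every_call:
        copied += len(state.flatten())
    return copied
```

In this model, flattening after every transition call copies the whole state once per input row, whereas flattening only when the state leaves the process copies it once.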
 
One thing to keep in mind is that we also want to be able to support a
plan that involves having one or more remote servers do partial
aggregation, send us the partial values, combine them across servers
and possibly also with locally computed values, and then finalize the
aggregation.  So it would be nice if there were a way to invoke the
aggregate function from SQL and get a transition value back rather
than a final value.

This will be possible with what I proposed. The Agg node will just need to be set up with finalizeAggs=false and serialState=true. That way the returned aggregate values will be the states converted into the serial type, on which we can call the output function and send the result wherever we like.
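
As a rough illustration of that flow, here is a small Python model of an avg-style aggregate. It is not the executor code: the function names and the 16-byte serial format are assumptions made up for the sketch, and only the two flags mirror the finalizeAggs and serialState settings described above.

```python
import struct
from functools import reduce

def transition(state, value):
    # State is (count, sum); one call per input row.
    count, total = state
    return (count + 1, total + value)

def serialize(state):
    # Assumed serial type: two little-endian 64-bit integers.
    return struct.pack("<qq", *state)

def deserialize(blob):
    return struct.unpack("<qq", blob)

def combine(a, b):
    # Merge two partial states into one.
    return (a[0] + b[0], a[1] + b[1])

def final(state):
    count, total = state
    return total / count if count else None

def agg(rows, finalize_aggs, serial_state):
    """Model of an Agg node that can stop before finalization."""
    state = (0, 0)
    for v in rows:
        state = transition(state, v)
    if finalize_aggs:
        return final(state)
    return serialize(state) if serial_state else state

# Two "remote" nodes return serialized partial states; the local node
# deserializes them, combines them, and only then finalizes.
partials = [agg([1, 2, 3], False, True), agg([4, 5], False, True)]
merged = reduce(combine, (deserialize(p) for p in partials))
result = final(merged)
```

The point of the sketch is that the value crossing the process or server boundary is the serialized state, not the final result, so the receiving side can still combine it with other partial states.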


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
