On 06/06/2018 04:11 PM, Andres Freund wrote:
> On 2018-06-06 16:06:18 +0200, Tomas Vondra wrote:
>> On 06/06/2018 04:01 PM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2018-06-06 15:58:16 +0200, Tomas Vondra wrote:
>>>> The other issue is that serialize/deserialize is only part of the
>>>> problem - you also need to know how to do "combine", and not all
>>>> aggregates can do that ... (certainly not in a universal way).
>>>
>>> There are several schemes where only serialize/deserialize are needed,
>>> no? There are a number of fairly sensible schemes where there won't be
>>> multiple transition values for the same group, no?
>>>
>>
>> Possibly, not sure what schemes you have in mind exactly ...
>>
>> But if you know there's only a single transition value, why would you need
>> serialize/deserialize at all. Why couldn't you just finalize the value and
>> serialize that?
>
> Because you don't necessarily have all the necessary input rows
> yet.
>
> Consider e.g. a scheme where we'd switch from hashed aggregation to
> sorted aggregation due to memory limits, but already have a number of
> transition values in the hash table. Whenever the size of the transition
> values in the hashtable exceeds memory size, we write one of them to the
> tuplesort (with serialized transition value). From then on further input
> rows for that group would only be written to the tuplesort, as the group
> isn't present in the hashtable anymore.
>
Ah, so you're suggesting that during the second pass we'd deserialize
the spilled transition value and continue adding tuples to it, instead
of building a new transition value from scratch. Got it.
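
To be concrete, here's roughly how I understand the scheme - a toy
sketch in Python, nothing like the actual executor code, with a made-up
avg() aggregate whose transition state is a (sum, count) pair, pickle
standing in for the aggregate's serialize/deserialize functions, and a
plain list standing in for the tuplesort:

import pickle

WORK_MEM_GROUPS = 2     # stand-in for the work_mem limit

hashtable = {}          # group key -> (sum, count) transition state
spilled = set()         # groups already evicted from the hashtable
tuplesort = []          # stand-in for the tuplesort: (key, tag, payload)

def advance(state, value):
    # transition function of the toy avg() aggregate
    s, n = state
    return (s + value, n + 1)

def finalize(state):
    s, n = state
    return s / n

def process_row(key, value):
    if key in spilled:
        # the group isn't in the hashtable anymore, so its input rows
        # are only written to the tuplesort from now on
        tuplesort.append((key, 'row', value))
        return
    if key not in hashtable and len(hashtable) >= WORK_MEM_GROUPS:
        # out of memory: evict some group, writing its serialized
        # transition value to the tuplesort
        victim, state = hashtable.popitem()
        tuplesort.append((victim, 'state', pickle.dumps(state)))
        spilled.add(victim)
    hashtable[key] = advance(hashtable.get(key, (0, 0)), value)

def second_pass():
    # groups still in the hashtable are complete
    results = {k: finalize(s) for k, s in hashtable.items()}
    # sorting groups all spilled entries by key; the sort is stable and
    # the serialized state was written before any raw rows of its group,
    # so it always comes first within the group
    tuplesort.sort(key=lambda e: e[0])
    cur, state = None, None
    for key, tag, payload in tuplesort:
        if key != cur:
            if cur is not None:
                results[cur] = finalize(state)
            cur, state = key, (0, 0)
        if tag == 'state':
            # resume from the deserialized transition value instead of
            # building a new one; no combine function needed
            state = pickle.loads(payload)
        else:
            state = advance(state, payload)
    if cur is not None:
        results[cur] = finalize(state)
    return results

for key, value in [('a', 1), ('b', 2), ('c', 3), ('a', 4),
                   ('c', 5), ('d', 6), ('b', 10)]:
    process_row(key, value)
print(second_pass())    # a=2.5, b=6.0, c=4.0, d=6.0

Note this only works because a group spills at most once and never
returns to the hashtable, so the sort sees exactly one serialized
transition value per group and the second pass simply resumes from it -
no combine function needed. I suppose that's the kind of scheme you had
in mind.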

That being said, I'm not sure a truly generic serialize/deserialize
(one that doesn't know the structure of the transition value) can work.
I'd guess not, otherwise we'd probably have used it when implementing
parallel aggregation instead of requiring per-aggregate functions.
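
To illustrate what I mean by "generic": even pickle, which is about as
generic as serialization gets, gives up once the state holds something
that's only meaningful within the current process - which is more or
less what an "internal" transition value is:

import pickle

# a transition state holding a process-local resource, loosely
# analogous to an "internal" aggregate state that is just a pointer
# into backend-local memory
state = {'count': 3, 'stream': open(__file__, 'rb')}

try:
    pickle.dumps(state)
except TypeError as err:
    print(err)    # file objects cannot be pickled
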
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services