Re: Spilling hashed SetOps and aggregates to disk - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Spilling hashed SetOps and aggregates to disk
Date
Msg-id 5a5ea550-847d-4976-fd22-dc47a8808757@2ndquadrant.com
Whole thread Raw
In response to Re: Spilling hashed SetOps and aggregates to disk  (Andres Freund <andres@anarazel.de>)
Responses Re: Spilling hashed SetOps and aggregates to disk
List pgsql-hackers
On 06/06/2018 04:11 PM, Andres Freund wrote:
> On 2018-06-06 16:06:18 +0200, Tomas Vondra wrote:
>> On 06/06/2018 04:01 PM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2018-06-06 15:58:16 +0200, Tomas Vondra wrote:
>>>> The other issue is that serialize/deserialize is only a part of a problem -
>>>> you also need to know how to do "combine", and not all aggregates can do
>>>> that ... (certainly not in universal way).
>>>
>>> There are several schemes where only serialize/deserialize are needed,
>>> no?  There are a number of fairly sensible schemes where there won't be
>>> multiple transition values for the same group, no?
>>>
>>
>> Possibly, not sure what schemes you have in mind exactly ...
>>
>> But if you know there's only a single transition value, why would you need
>> serialize/deserialize at all. Why couldn't you just finalize the value and
>> serialize that?
> 
> Because you don't necessarily have all the necessary input rows
> yet.
> 
> Consider e.g. a scheme where we'd switch from hashed aggregation to
> sorted aggregation due to memory limits, but already have a number of
> transition values in the hash table. Whenever the size of the transition
> values in the hashtable exceeds memory size, we write one of them to the
> tuplesort (with serialized transition value). From then on further input
> rows for that group would only be written to the tuplesort, as the group
> isn't present in the hashtable anymore.
> 

Ah, so you're suggesting that during the second pass we'd deserialize
the transition value and then add the tuples to it, instead of building
a new transition value. Got it.

That being said, I'm not sure if such generic serialize/deserialize can
work, but I'd guess no, otherwise we'd probably use it when implementing
the parallel aggregate.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: libpq compression
Next
From: Steven Fackler
Date:
Subject: Re: Supporting tls-server-end-point as SCRAM channel binding forOpenSSL 1.0.0 and 1.0.1