Re: Spilling hashed SetOps and aggregates to disk - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Spilling hashed SetOps and aggregates to disk
Date	June 6, 2018 23:11:36
Msg-id	5a5ea550-847d-4976-fd22-dc47a8808757@2ndquadrant.com
In response to	Re: Spilling hashed SetOps and aggregates to disk (Andres Freund <andres@anarazel.de>)
Responses	Re: Spilling hashed SetOps and aggregates to disk
List	pgsql-hackers

On 06/06/2018 04:11 PM, Andres Freund wrote:
> On 2018-06-06 16:06:18 +0200, Tomas Vondra wrote:
>> On 06/06/2018 04:01 PM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2018-06-06 15:58:16 +0200, Tomas Vondra wrote:
>>>> The other issue is that serialize/deserialize is only a part of a problem -
>>>> you also need to know how to do "combine", and not all aggregates can do
>>>> that ... (certainly not in universal way).
>>>
>>> There are several schemes where only serialize/deserialize are needed,
>>> no?  There are a number of fairly sensible schemes where there won't be
>>> multiple transition values for the same group, no?
>>>
>>
>> Possibly, not sure what schemes you have in mind exactly ...
>>
>> But if you know there's only a single transition value, why would you need
>> serialize/deserialize at all. Why couldn't you just finalize the value and
>> serialize that?
> 
> Because you don't necessarily have all the necessary input rows
> yet.
> 
> Consider e.g. a scheme where we'd switch from hashed aggregation to
> sorted aggregation due to memory limits, but already have a number of
> transition values in the hash table. Whenever the size of the transition
> values in the hashtable exceeds memory size, we write one of them to the
> tuplesort (with serialized transition value). From then on further input
> rows for that group would only be written to the tuplesort, as the group
> isn't present in the hashtable anymore.
> 

Ah, so you're suggesting that during the second pass we'd deserialize
the transition value and then add the tuples to it, instead of building
a new transition value. Got it.
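
To make the scheme above concrete, here is a minimal Python sketch of it. This is not PostgreSQL code. The function name hybrid_aggregate, the max_groups limit (standing in for the memory limit), the use of pickle as serialize/deserialize, and the plain list standing in for the tuplesort are all assumptions made for illustration. The point it tries to show is that each spilled group has exactly one serialized transition value, so the second pass only needs deserialize plus the ordinary transition function, and no "combine".

```python
import pickle
from itertools import groupby

STATE, ROW = 0, 1  # sort order within a group: serialized state first, then rows


def hybrid_aggregate(rows, init, trans, final, max_groups):
    """Hash aggregation that falls back to a sorted second pass.

    rows is an iterable of (key, value) pairs. max_groups (assumed >= 1) is a
    hypothetical stand-in for the memory limit on the hash table.
    """
    table = {}       # in-memory hash table: key -> transition value
    spilled = set()  # groups that have been evicted from the hash table
    spill = []       # stand-in for the tuplesort: (key, kind, payload)

    for key, value in rows:
        if key in spilled:
            # The group is no longer in the hash table, so the raw input
            # row goes to the spill instead.
            spill.append((key, ROW, value))
            continue
        if key not in table:
            if len(table) >= max_groups:
                # Over the limit: evict one group and write its serialized
                # transition value to the spill.
                victim, state = table.popitem()
                spill.append((victim, STATE, pickle.dumps(state)))
                spilled.add(victim)
            table[key] = init()
        table[key] = trans(table[key], value)

    # Groups still in the hash table have seen all their rows.
    results = {key: final(state) for key, state in table.items()}

    # Second pass: sort by (key, kind) so each group's serialized state
    # comes first, followed by its spilled rows in arrival order.
    spill.sort(key=lambda entry: (entry[0], entry[1]))
    for key, entries in groupby(spill, key=lambda entry: entry[0]):
        state = None
        for _, kind, payload in entries:
            if kind == STATE:
                state = pickle.loads(payload)  # deserialize, do not rebuild
            else:
                state = trans(state, payload)  # keep adding tuples to it
        results[key] = final(state)
    return results
```

For example, an avg-like aggregate with a (sum, count) transition value can be passed as init=lambda: (0, 0), trans=lambda s, v: (s[0] + v, s[1] + 1) and final=lambda s: s[0] / s[1]; the result should not depend on which group gets evicted.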

That being said, I'm not sure such a generic serialize/deserialize can
work. I'd guess not, otherwise we'd probably have used it when
implementing parallel aggregate.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
