Re: Spilling hashed SetOps and aggregates to disk - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Spilling hashed SetOps and aggregates to disk
Date
Msg-id 20180606141111.hszzgxmgjqcvskwh@alap3.anarazel.de
In response to Re: Spilling hashed SetOps and aggregates to disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Spilling hashed SetOps and aggregates to disk
List pgsql-hackers
On 2018-06-06 16:06:18 +0200, Tomas Vondra wrote:
> On 06/06/2018 04:01 PM, Andres Freund wrote:
> > Hi,
> > 
> > On 2018-06-06 15:58:16 +0200, Tomas Vondra wrote:
> > > The other issue is that serialize/deserialize is only a part of a problem -
> > > you also need to know how to do "combine", and not all aggregates can do
> > > that ... (certainly not in universal way).
> > 
> > There are several schemes where only serialize/deserialize are needed,
> > no?  There are a number of fairly sensible schemes where there won't be
> > multiple transition values for the same group, no?
> > 
> 
> Possibly, not sure what schemes you have in mind exactly ...
> 
> But if you know there's only a single transition value, why would you need
> serialize/deserialize at all. Why couldn't you just finalize the value and
> serialize that?

Because you don't necessarily have all of the group's input rows
yet.

Consider e.g. a scheme where we'd switch from hashed aggregation to
sorted aggregation due to memory limits, but already have a number of
transition values in the hash table. Whenever the size of the transition
values in the hashtable exceeds the memory limit, we write one of them to
the tuplesort (with its serialized transition value). From then on,
further input rows for that group are only written to the tuplesort, as
the group isn't present in the hashtable anymore.
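As a toy illustration of that scheme (not PostgreSQL code -- a Python
sketch where pickle stands in for the aggregate's serial/deserial
functions, sum() is the transition function, and the made-up MAX_GROUPS
constant plays the role of the memory limit):

import pickle
from itertools import groupby
from operator import itemgetter

MAX_GROUPS = 2  # stand-in for the memory limit (made up for illustration)

def hash_then_sort_agg(rows):
    table = {}      # in-memory hash table: group key -> transition state
    spilled = set() # groups that were evicted from the hash table
    spill = []      # simulated tuplesort: (key, tag, payload); tag 0 = state, 1 = input row

    for key, value in rows:
        if key in spilled:
            # Group already evicted: just push the raw input row to the sort.
            spill.append((key, 1, value))
            continue
        if key not in table and len(table) >= MAX_GROUPS:
            # "Out of memory": evict some group, writing its serialized
            # transition value to the sort.  A group is evicted at most
            # once, so only one state per group ever exists.
            victim = next(iter(table))
            spill.append((victim, 0, pickle.dumps(table.pop(victim))))
            spilled.add(victim)
        table[key] = table.get(key, 0) + value  # transition function: sum

    results = dict(table)  # groups that never spilled finish in memory

    # Sorted phase: the state entry (tag 0) sorts ahead of the group's rows.
    spill.sort()
    for key, entries in groupby(spill, key=itemgetter(0)):
        state = 0
        for _, tag, payload in entries:
            if tag == 0:
                state = pickle.loads(payload)  # deserialize, then keep transitioning
            else:
                state += payload               # ordinary transition on remaining rows
        results[key] = state
    return results

Because a group is evicted at most once, the sorted phase sees at most
one serialized state per group, followed by that group's remaining input
rows; deserializing the state and continuing with the ordinary transition
function is enough, so no combine function is required.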

Greetings,

Andres Freund

