Re: Spilling hashed SetOps and aggregates to disk - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Spilling hashed SetOps and aggregates to disk
Date
Msg-id 20180605124958.xe4by2yinnove6e7@alap3.anarazel.de
Whole thread Raw
In response to Re: Spilling hashed SetOps and aggregates to disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Spilling hashed SetOps and aggregates to disk
List pgsql-hackers
Hi,

On 2018-06-05 10:05:35 +0200, Tomas Vondra wrote:
> My concern is more about what happens when the input tuple ordering is
> inherently incompatible with the eviction strategy, greatly increasing the
> amount of data written to disk during evictions.
> 
> Say for example that we can fit 1000 groups into work_mem, and that there
> are 2000 groups in the input data set. If the input is correlated with the
> groups, everything is peachy because we'll evict the first batch, and then
> group the remaining 1000 groups (or something like that). But if the input
> data is random (which can easily happen, e.g. for IP addresses, UUIDs and
> such) we'll hit the limit repeatedly and will evict much sooner.

> I know you suggested simply dumping the whole hash table and starting from
> scratch while we talked about this at pgcon, but ISTM it has exactly this
> issue.

Yea, that's the case I was thinking of where going to sorting will very
likely have better performance.

I think it'd even be sensible to have a "skew tuple" like
optimization. When detecting getting closer to memory exhaustion, move
new groups to the tuplesort, but already hashed tuples stay in the
hashtable.  That'd still need tuples being moved to the sort in the
cases where the transition values get to big (say array_agg), but that
should be comparatively rare.  I'm sure we could do better in selecting
the hash-tabled values than just taking the first incoming ones, but
that shouldn't be too bad.

Greetings,

Andres Freund


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Spilling hashed SetOps and aggregates to disk
Next
From: David Rowley
Date:
Subject: Re: Spilling hashed SetOps and aggregates to disk