Re: Spilling hashed SetOps and aggregates to disk - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Spilling hashed SetOps and aggregates to disk
Msg-id 20180605175209.vavuqe4idovcpeie@alap3.anarazel.de
In response to Re: Spilling hashed SetOps and aggregates to disk  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Hi,

On 2018-06-05 10:47:49 -0700, Jeff Davis wrote:
> The thing I don't like about it is that it requires running two memory-
> hungry operations at once. How much of work_mem do we use for sorted
> runs, and how much do we use for the hash table?

Is that necessarily true? I'd assume that we'd use a small amount of
memory for the tuplesort, enough to avoid unnecessary disk spills for
each tuple. But a few kB should be enough - I think it's fine to
aggressively spill to disk, since we have already handled the case of
a smaller number of input rows. Then at the end of the run, we empty
out the hashtable and free it. Only then do we do the sort.

One thing this wouldn't handle is datatypes that support hashing but
not sorting. Not exactly common.
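
To make the scheme above concrete, here is a rough sketch in Python
rather than executor code. It is only an illustration under stated
assumptions: the function name hybrid_count, the max_groups limit
(standing in for work_mem), the sort_buffer_rows size (standing in for
the "few kB" tuplesort buffer) and the in-memory lists used as "runs"
are all made up for the example, and the aggregate is a plain count.

```python
import heapq
from itertools import groupby


def hybrid_count(keys, max_groups, sort_buffer_rows=4):
    """Count rows per key: hash aggregation first, sort for the overflow.

    Hypothetical sketch: max_groups stands in for the work_mem limit on
    the hash table, sort_buffer_rows for the small tuplesort buffer.
    """
    table = {}    # the hash table, capped at max_groups entries
    buffer = []   # small buffer so we don't write one tuple at a time
    runs = []     # sorted runs; a real system would write these to disk

    for key in keys:
        if key in table:
            table[key] += 1              # group already in the hash table
        elif len(table) < max_groups:
            table[key] = 1               # still room for a new group
        else:
            buffer.append(key)           # table full: hand tuple to the sort
            if len(buffer) >= sort_buffer_rows:
                runs.append(sorted(buffer))   # aggressively spill a run
                buffer = []

    if buffer:
        runs.append(sorted(buffer))

    # End of input: empty out the hash table and free it ...
    results = list(table.items())
    table.clear()

    # ... and only then do the sort (here: merge the runs) and aggregate
    # the spilled tuples group by group.
    for key, group in groupby(heapq.merge(*runs)):
        results.append((key, sum(1 for _ in group)))
    return results
```

Once the table is full, a key either is already in it or is always
spilled, so no group ends up split between the hash output and the
sorted output. The keys must be sortable, which is exactly the
limitation mentioned above for hash-only datatypes.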

Greetings,

Andres Freund

