Re: Default setting for enable_hashagg_disk

From Peter Geoghegan
Subject Re: Default setting for enable_hashagg_disk
Msg-id CAH2-WznP_v5dO-vC=GKXoDSDF6KyVR_La4dJVdr=1KxR_TbpMg@mail.gmail.com
In response to Re: Default setting for enable_hashagg_disk  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Fri, Jul 24, 2020 at 8:19 AM Robert Haas <robertmhaas@gmail.com> wrote:
> This is all really good analysis, I think, but this seems like the key
> finding. It seems like we don't really understand what's actually
> getting written. Whether we use hash or sort doesn't seem like it
> should have this kind of impact on how much data gets written, and
> whether we use CP_SMALL_TLIST or project when needed doesn't seem like
> it should matter like this either.

Isn't this more or less the expected behavior when partitions are
spilled recursively? The cases that Tomas tested were mostly ones
where work_mem was tiny relative to the data being aggregated.
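
To put rough numbers on that: if each recursion level has to rewrite
the tuples that didn't fit, total spill I/O grows with the number of
levels. Here is a back-of-the-envelope sketch (my own illustration,
not code from the tree) of how many levels a tiny work_mem can
produce:

#include <math.h>
#include <stdio.h>

/*
 * Hypothetical illustration (not PostgreSQL source): estimate how
 * many recursion levels hash aggregation needs when it spills.  Each
 * level repartitions the tuples that did not fit, so total spill I/O
 * is roughly (levels) x (input size).
 */
static int
estimated_spill_levels(double ngroups, double groups_per_pass,
                       int npartitions)
{
    /* Each level divides the problem by the number of partitions. */
    double levels = ceil(log(ngroups / groups_per_pass) /
                         log((double) npartitions));

    return levels > 0 ? (int) levels : 0;
}

int
main(void)
{
    /* Tiny work_mem relative to the input: many recursion levels. */
    printf("levels = %d\n", estimated_spill_levels(1e8, 1e4, 4));
    return 0;
}

With, say, 1e8 groups, room for 1e4 of them at a time, and only 4
partitions per spill, that works out to 7 levels, so spilled tuples
get rewritten many times over.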

The following is an extract from commit 1f39bce0215, showing comments
and constants added near the top of nodeAgg.c:

+ * We also specify a min and max number of partitions per spill. Too few might
+ * mean a lot of wasted I/O from repeated spilling of the same tuples. Too
+ * many will result in lots of memory wasted buffering the spill files (which
+ * could instead be spent on a larger hash table).
+ */
+#define HASHAGG_PARTITION_FACTOR 1.50
+#define HASHAGG_MIN_PARTITIONS 4
+#define HASHAGG_MAX_PARTITIONS 1024
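
The way I read those constants (a sketch only, paraphrasing rather
than quoting the nodeAgg.c logic): the partition count gets scaled up
by HASHAGG_PARTITION_FACTOR and then clamped to the min/max, along
these lines:

#include <math.h>

#define HASHAGG_PARTITION_FACTOR 1.50
#define HASHAGG_MIN_PARTITIONS 4
#define HASHAGG_MAX_PARTITIONS 1024

/*
 * Sketch only: pick a spill partition count from an estimate of how
 * much the input exceeds the memory limit.  The real logic lives in
 * nodeAgg.c; this just shows how the quoted constants bound the
 * choice.
 */
static int
choose_num_partitions_sketch(double input_bytes, double mem_limit)
{
    double  target = (input_bytes / mem_limit) * HASHAGG_PARTITION_FACTOR;
    int     npartitions = (int) ceil(target);

    if (npartitions < HASHAGG_MIN_PARTITIONS)
        npartitions = HASHAGG_MIN_PARTITIONS;
    if (npartitions > HASHAGG_MAX_PARTITIONS)
        npartitions = HASHAGG_MAX_PARTITIONS;

    return npartitions;
}

The point being that with the partition count capped at 1024, a
sufficiently small work_mem forces recursion, and each extra level of
recursion rewrites the spilled tuples again.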

-- 
Peter Geoghegan


