Re: Default setting for enable_hashagg_disk

From Peter Geoghegan
Subject Re: Default setting for enable_hashagg_disk
Msg-id CAH2-WznP_v5dO-vC=GKXoDSDF6KyVR_La4dJVdr=1KxR_TbpMg@mail.gmail.com
In response to Re: Default setting for enable_hashagg_disk  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Fri, Jul 24, 2020 at 8:19 AM Robert Haas <robertmhaas@gmail.com> wrote:
> This is all really good analysis, I think, but this seems like the key
> finding. It seems like we don't really understand what's actually
> getting written. Whether we use hash or sort doesn't seem like it
> should have this kind of impact on how much data gets written, and
> whether we use CP_SMALL_TLIST or project when needed doesn't seem like
> it should matter like this either.

Isn't this more or less the expected behavior when partitions are
spilled recursively? The cases that Tomas tested were mostly ones
where work_mem was tiny relative to the data being aggregated.
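
To put rough numbers on that: if each recursion level has to rewrite
the tuples that didn't fit, total spill I/O grows with the number of
levels. Here is a back-of-the-envelope sketch (my own illustration,
not code from the tree) of how many levels a tiny work_mem can
produce:

#include <math.h>
#include <stdio.h>

/*
 * Hypothetical illustration (not PostgreSQL source): estimate how
 * many recursion levels hash aggregation needs when it spills.  Each
 * level repartitions the tuples that did not fit, so total spill I/O
 * is roughly (levels) x (input size).
 */
static int
estimated_spill_levels(double ngroups, double groups_per_pass,
                       int npartitions)
{
    /* Each level divides the problem by the number of partitions. */
    double levels = ceil(log(ngroups / groups_per_pass) /
                         log((double) npartitions));

    return levels > 0 ? (int) levels : 0;
}

int
main(void)
{
    /* Tiny work_mem relative to the input: many recursion levels. */
    printf("levels = %d\n", estimated_spill_levels(1e8, 1e4, 4));
    return 0;
}

With, say, 1e8 groups, room for 1e4 of them at a time, and only 4
partitions per spill, that works out to 7 levels, so spilled tuples
get rewritten many times over.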

The following is an extract from commit 1f39bce0215, showing comments
and constants added near the top of nodeAgg.c:

+ * We also specify a min and max number of partitions per spill. Too few might
+ * mean a lot of wasted I/O from repeated spilling of the same tuples. Too
+ * many will result in lots of memory wasted buffering the spill files (which
+ * could instead be spent on a larger hash table).
+ */
+#define HASHAGG_PARTITION_FACTOR 1.50
+#define HASHAGG_MIN_PARTITIONS 4
+#define HASHAGG_MAX_PARTITIONS 1024
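
The way I read those constants (a sketch only, paraphrasing rather
than quoting the nodeAgg.c logic): the partition count gets scaled up
by HASHAGG_PARTITION_FACTOR and then clamped to the min/max, along
these lines:

#include <math.h>

#define HASHAGG_PARTITION_FACTOR 1.50
#define HASHAGG_MIN_PARTITIONS 4
#define HASHAGG_MAX_PARTITIONS 1024

/*
 * Sketch only: pick a spill partition count from an estimate of how
 * much the input exceeds the memory limit.  The real logic lives in
 * nodeAgg.c; this just shows how the quoted constants bound the
 * choice.
 */
static int
choose_num_partitions_sketch(double input_bytes, double mem_limit)
{
    double  target = (input_bytes / mem_limit) * HASHAGG_PARTITION_FACTOR;
    int     npartitions = (int) ceil(target);

    if (npartitions < HASHAGG_MIN_PARTITIONS)
        npartitions = HASHAGG_MIN_PARTITIONS;
    if (npartitions > HASHAGG_MAX_PARTITIONS)
        npartitions = HASHAGG_MAX_PARTITIONS;

    return npartitions;
}

The point being that with the partition count capped at 1024, a
sufficiently small work_mem forces recursion, and each extra level of
recursion rewrites the spilled tuples again.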

-- 
Peter Geoghegan


