Home > mailing lists

Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From	David G. Johnston
Subject	Re: Default setting for enable_hashagg_disk
Date	July 11, 2020 04:35:43
Msg-id	CAKFQuwYjJMvXST2nbVQZwdAiqCZMFRFP=Bxx0kbkBbvsdh4EXA@mail.gmail.com Whole thread Raw
In response to	Re: Default setting for enable_hashagg_disk (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: Default setting for enable_hashagg_disk (David Rowley <dgrowleyml@gmail.com>)
List	pgsql-hackers

Tree view

On Fri, Jul 10, 2020 at 6:19 PM David Rowley <dgrowleyml@gmail.com> wrote:

If we have to have a new GUC, my preference would be hashagg_mem,
where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
would mean use that value. We'd need some sort of check hook to
disallow 0-63. I really am just failing to comprehend why we're
contemplating changing the behaviour of Hash Join here.

If we add a setting that defaults to work_mem then the benefit is severely reduced. You still have to modify individual queries, but the change can simply be more targeted than changing work_mem alone. I truly desire to have whatever we do provide that ability as well as a default value that is greater than the current work_mem value - which in v12 was being ignored and thus production usages saw memory consumption greater than work_mem. Only a multiplier does this. A multiplier-only solution fixes the problem at hand. A multiplier-or-memory solution adds complexity but provides flexibility. If adding that flexibility is straight-forward I don't see any serious downside other than the complexity of having the meaning of a single GUC's value dependent upon its magnitude.

Of course, I
understand that that node type also uses a hash table, but why does
that give it the right to be involved in a change that we're making to
try and give users the ability to avoid possible regressions with Hash
Agg?

If Hash Join isn't affected by the "was allowed to use unlimited amounts of execution memory but now isn't" change then it probably should continue to consult work_mem instead of being changed to use the calculated value (work_mem x multiplier).

David J.

pgsql-hackers by date:

From: Michael Paquier
Date: 11 July 2020, 04:35:07
Subject: Re: pg_regress cleans up tablespace twice.

From: Soumyadeep Chakraborty
Date: 11 July 2020, 04:36:31
Subject: Re: Does TupleQueueReaderNext() really need to copy its result?

Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

Previous

Next