Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Default setting for enable_hashagg_disk
Date
Msg-id CAKFQuwYjJMvXST2nbVQZwdAiqCZMFRFP=Bxx0kbkBbvsdh4EXA@mail.gmail.com
Whole thread Raw
In response to Re: Default setting for enable_hashagg_disk  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Default setting for enable_hashagg_disk  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
On Fri, Jul 10, 2020 at 6:19 PM David Rowley <dgrowleyml@gmail.com> wrote:
If we have to have a new GUC, my preference would be hashagg_mem,
where -1 means use work_mem and a value between 64 and MAX_KILOBYTES
would mean use that value.  We'd need some sort of check hook to
disallow 0-63. I really am just failing to comprehend why we're
contemplating changing the behaviour of Hash Join here.

If we add a setting that defaults to work_mem then the benefit is severely reduced.  You still have to modify individual queries, but the change can simply be more targeted than changing work_mem alone.  I truly desire to have whatever we do provide that ability as well as a default value that is greater than the current work_mem value - which in v12 was being ignored and thus production usages saw memory consumption greater than work_mem.  Only a multiplier does this.  A multiplier-only solution fixes the problem at hand.  A multiplier-or-memory solution adds complexity but provides flexibility.  If adding that flexibility is straight-forward I don't see any serious downside other than the complexity of having the meaning of a single GUC's value dependent upon its magnitude.

Of course, I
understand that that node type also uses a hash table, but why does
that give it the right to be involved in a change that we're making to
try and give users the ability to avoid possible regressions with Hash
Agg?

If Hash Join isn't affected by the "was allowed to use unlimited amounts of execution memory but now isn't" change then it probably should continue to consult work_mem instead of being changed to use the calculated value (work_mem x multiplier).

David J.

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: pg_regress cleans up tablespace twice.
Next
From: Soumyadeep Chakraborty
Date:
Subject: Re: Does TupleQueueReaderNext() really need to copy its result?