Re: Default setting for enable_hashagg_disk (hash_mem) - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: Default setting for enable_hashagg_disk (hash_mem)
Date
Msg-id CAFj8pRBf1w4ndz-ynd+mUpTfiZfbs7+CPjc4ob8v9d3X0MscCg@mail.gmail.com
Whole thread Raw
In response to Re: Default setting for enable_hashagg_disk (hash_mem)  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers


út 7. 7. 2020 v 14:55 odesílatel David Rowley <dgrowleyml@gmail.com> napsal:
On Tue, 7 Jul 2020 at 16:57, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Sun, 2020-07-05 at 16:47 -0700, Peter Geoghegan wrote:
> > Where does that leave the hash_mem idea (or some other similar
> > proposal)?
>
> hash_mem is acceptable to me if the consensus is moving toward that,
> but I'm not excited about it.

FWIW, I'm not a fan of the hash_mem idea. It was my impression that we
aimed to provide an escape hatch for people we have become accustomed
to <= PG12 behaviour and hash_mem sounds like it's not that. Surely a
GUC by that name would control what Hash Join does too?  Otherwise, it
would be called hashagg_mem. I'd say changing the behaviour of Hash
join is not well aligned to the goal of allowing users to get
something closer to what PG12 did.

I know there has been talk over the years to improve how work_mem
works. I see Tomas mentioned memory grants on the other thread [1]. I
do imagine this is the long term solution to the problem where users
must choose very conservative values for work_mem. We're certainly not
going to get that for PG13, so I do think what we need here is just a
simple escape hatch. I mentioned my thoughts in [2], so won't go over
it again here. Once we've improved the situation in some future
version of postgres, perhaps along the lines of what Tomas mentioned,
then we can get rid of the escape hatch.

Here are my reasons for not liking the hash_mem idea:

1. if it also increases the amount of memory that Hash Join can use
then that makes the partition-wise hash join problem of hash_mem *
npartitions even bigger when users choose to set hash_mem higher than
work_mem to get Hash Agg doing what they're used to.
2. Someone will one day ask for sort_mem and then materialize_mem.
Maybe then cte_mem. Once those are done we might as well just add a
GUC to control every executor node that uses work_mem.
3. I'm working on a Result cache node [3]. It uses a hash table
internally. Should it constraint its memory consumption according to
hash_mem or work_mem? It's not really that obvious to people that it
internally uses a hash table. "Hash" does not appear in the node name.
Do people need to look that up in the documents?

+1

I share your opinion.



David

[1] https://www.postgresql.org/message-id/20200626235850.gvl3lpfyeobu4evi@development
[2] https://www.postgresql.org/message-id/CAApHDvqFZikXhAGW=UKZKq1_FzHy+XzmUzAJiNj6RWyTHH4UfA@mail.gmail.com
[3] https://www.postgresql.org/message-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com


pgsql-hackers by date:

Previous
From: ahsan hadi
Date:
Subject: Re: Added tab completion for the missing options in copy statement
Next
From: Tom Lane
Date:
Subject: Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks