Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From David G. Johnston
Subject Re: Default setting for enable_hashagg_disk
Date
Msg-id CAKFQuwbnEP0cqYOyWiigyRpkcjArzR6O7YFXpra9VLAXtRkbQg@mail.gmail.com
Whole thread Raw
In response to Re: Default setting for enable_hashagg_disk  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Thu, Jul 9, 2020 at 3:58 PM Stephen Frost <sfrost@snowman.net> wrote:
> > If folks
> > want to let HashAgg use more memory then they can set work_mem higher,
> > just the same as if they want a Sort node to use more memory or a
> > HashJoin.  Yes, that comes with potential knock-on effects about other
> > nodes (possibly) using more memory but that's pretty well understood for
> > all the other cases and I don't think that it makes sense to have a
> > special case for HashAgg when the only justification is that "well, you
> > see, it used to have this bug, so...".
>
> That's not the only justification. The other justification is that
> it's generally reasonable to prefer giving hash aggregate more memory.

Sure, and it's generally reasonably to prefer giving Sorts more memory
too... as long as you've got it available.

Looking at the docs for work_mem it was decided to put "such as" before "sort" and "hash table" even though the rest of the paragraph then only talks about those two.  Are there other things possible that warrant the "such as" qualifier or can we write "specifically, a sort, or a hash table"?

For me, as a user that doesn't presently need to deal with all this, I'd rather have a multiplier GUC for max_hash_work_mem_units defaulting to something like 4.  The planner would then use that multiple.  We've closed the "bug" while still giving me a region of utility that emulates the v12 reality without me touching anything, or even being aware of the bug that is being fixed.

I cannot see myself wanting to globally revert to v12 behavior on the execution side as the OOM side-effect is definitely more unpleasant than slowed queries.  If I have to go into a specific query anyway I'd go for a measured change on the work_mem or multiplier rather than choosing to consume as much memory as needed.

David J.

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Default setting for enable_hashagg_disk
Next
From: Cary Huang
Date:
Subject: Re: pg_dump --where option