Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: Default setting for enable_hashagg_disk
Msg-id: CA+TgmoagMLfruY4y_40oKLmFFdBRVSGraZsTa5ss9wH546EYhQ@mail.gmail.com
In response to: Re: Default setting for enable_hashagg_disk (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Wed, Jun 10, 2020 at 2:39 PM Jeff Davis <pgsql@j-davis.com> wrote:
> The behavior in v13 master is, by default, analogous to Sort or
> anything else that adapts at runtime to spill. If we had spillable
> HashAgg the whole time, we wouldn't be worried about #2 at all. But,
> out of conservatism, I am trying to accommodate users who want an
> escape hatch, at least for a release or two until users feel more
> comfortable with disk-based HashAgg.
>
> Setting enable_hashagg_disk=false implements 2(a). This name apparently
> causes confusion, but it's hard to come up with a better one because
> the v12 behavior has nuance that's hard to express succinctly. I don't
> think the names you suggested quite fit, but the idea to use a more
> interesting GUC value might help express the behavior. Perhaps making
> enable_hashagg a ternary "enable_hashagg=on|off|avoid_disk"? The word
> "reject" is too definite for the planner, which is working with
> imperfect information.
>
> In master, there is no explicit way to get 2(b), but you can just set
> work_mem higher in a lot of cases. If enough people want 2(b), I can
> add it easily. Perhaps hashagg_overflow=on|off, which would control
> execution time behavior?
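
[For concreteness, the knobs discussed above would look roughly like this
in a session. enable_hashagg_disk is the escape hatch from the v13 patch
and work_mem is an existing setting; the ternary enable_hashagg value and
hashagg_overflow are only proposals in the quoted message, shown here
commented out:]

    -- Escape hatch in the v13 patch: the planner avoids choosing
    -- HashAgg when it expects the hash table to spill (behavior 2(a)).
    SET enable_hashagg_disk = off;

    -- The workaround for 2(b) available today: raise work_mem (value
    -- arbitrary) so the hash table fits and no spill occurs at runtime.
    SET work_mem = '256MB';

    -- Proposed alternatives, not GUCs that exist:
    -- SET enable_hashagg = avoid_disk;   -- planner-side ternary
    -- SET hashagg_overflow = off;        -- execution-time switch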

Planner GUCs are a pretty blunt instrument for solving problems that
users may have with planner features. There's no guarantee that the
experience a user has with one query will be the same as the
experience they have with another query, or even that you couldn't
have a single query which contains two different nodes where the
optimal behavior is different for one than it is for the other. In the
first case, changing the value of the GUC on a per-query basis is
pretty painful; in the second case, even that is not good enough. So,
as Tom has said before, the only really good choice in a case like
this is for the planner to figure out the right things automatically;
anything that boils down to a user-provided hint pretty well sucks.
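
[The per-query adjustment described here would look something like the
following; the table and column names are made up for illustration:]

    BEGIN;
    -- Scope the override to this transaction only.
    SET LOCAL enable_hashagg = off;
    EXPLAIN ANALYZE SELECT x, count(*) FROM t GROUP BY x;
    COMMIT;

    -- Even per-query scoping cannot help when a single plan contains
    -- two hash aggregates that each want a different setting.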

So I feel like the really important thing here is to fix the cases
that don't come out well with default settings. If we can't do that,
then the feature is half-baked and maybe should not have been
committed in the first place. If we can, then we don't really need the
GUC, let alone multiple GUCs. I understand that some of the reason you
added these was out of paranoia, and I get that: it's hard to be sure
that any feature of this complexity isn't going to have some rough
patches, especially given how defective work_mem is as a model in
general. Still, we don't want to end up with 50 planner GUCs enabling
and disabling individual bits of various planner nodes, or at least I
don't think we do, so I'm very skeptical of the idea that we need 2
just for this feature. That doesn't feel scalable. I think the right
number is 0 or 1, and if it's 1, very few people should be changing
the default. If anything else is the case, then IMHO the feature isn't
ready to ship.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


