Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Default setting for enable_hashagg_disk |
Date | |
Msg-id | CA+TgmoagMLfruY4y_40oKLmFFdBRVSGraZsTa5ss9wH546EYhQ@mail.gmail.com Whole thread Raw |
In response to | Re: Default setting for enable_hashagg_disk (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Default setting for enable_hashagg_disk
Re: Default setting for enable_hashagg_disk |
List | pgsql-hackers |
On Wed, Jun 10, 2020 at 2:39 PM Jeff Davis <pgsql@j-davis.com> wrote: > The behavior in v13 master is, by default, analagous to Sort or > anything else that adapts at runtime to spill. If we had spillable > HashAgg the whole time, we wouldn't be worried about #2 at all. But, > out of conservatism, I am trying to accommodate users who want an > escape hatch, at least for a release or two until users feel more > comfortable with disk-based HashAgg. > > Setting enable_hash_disk=false implements 2(a). This name apparently > causes confusion, but it's hard to come up with a better one because > the v12 behavior has nuance that's hard to express succinctly. I don't > think the names you suggested quite fit, but the idea to use a more > interesting GUC value might help express the behavior. Perhaps making > enable_hashagg a ternary "enable_hashagg=on|off|avoid_disk"? The word > "reject" is too definite for the planner, which is working with > imperfect information. > > In master, there is no explicit way to get 2(b), but you can just set > work_mem higher in a lot of cases. If enough people want 2(b), I can > add it easily. Perhaps hashagg_overflow=on|off, which would control > execution time behavior? Planner GUCs are a pretty blunt instrument for solving problems that users may have with planner features. There's no guarantee that the experience a user has with one query will be the same as the experience they have with another query, or even that you couldn't have a single query which contains two different nodes where the optimal behavior is different for one than it is for the other. In the first case, changing the value of the GUC on a per-query basis is pretty painful; in the second case, even that is not good enough. So, as Tom has said before, the only really good choice in a case like this is for the planner to figure out the right things automatically; anything that boils down to a user-provided hint pretty well sucks. So I feel like the really important thing here is to fix the cases that don't come out well with default settings. If we can't do that, then the feature is half-baked and maybe should not have been committed in the first place. If we can, then we don't really need the GUC, let alone multiple GUCs. I understand that some of the reason you added these was out of paranoia, and I get that: it's hard to be sure that any feature of this complexity isn't going to have some rough patches, especially given how defective work_mem is as a model in general. Still, we don't want to end up with 50 planner GUCs enabling and disabling individual bits of various planner nodes, or at least I don't think we do, so I'm very skeptical of the idea that we need 2 just for this feature. That doesn't feel scalable. I think the right number is 0 or 1, and if it's 1, very few people should be changing the default. If anything else is the case, then IMHO the feature isn't ready to ship. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: