Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From: Bruce Momjian
Subject: Re: Default setting for enable_hashagg_disk
Date:
Msg-id: 20200624090628.GA19640@momjian.us
In response to: Re: Default setting for enable_hashagg_disk (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Default setting for enable_hashagg_disk (David Rowley <dgrowleyml@gmail.com>)
           Re: Default setting for enable_hashagg_disk (Justin Pryzby <pryzby@telsasoft.com>)
List: pgsql-hackers
On Wed, Jun 24, 2020 at 02:11:57PM +1200, David Rowley wrote:
> On Tue, 23 Jun 2020 at 08:24, Jeff Davis <pgsql@j-davis.com> wrote:
> > Another way of looking at it is that the weird behavior is already
> > there in v12, so there are already users relying on this weird behavior
> > as a crutch for some other planner mistake. The question is whether we
> > want to:
> >
> > (a) take the weird behavior away now as a consequence of implementing
> > disk-based HashAgg; or
> > (b) support the weird behavior forever; or
> > (c) introduce a GUC now to help transition away from the weird behavior
> >
> > The danger with (c) is that it gives users more time to become more
> > reliant on the weird behavior; and worse, a GUC could be seen as an
> > endorsement of the weird behavior rather than a path to eliminating it.
> > So we could intend to do (c) and end up with (b). We can mitigate this
> > with documentation warnings, perhaps.
> 
> So, I have a few thoughts on this subject. I understand both problem
> cases have been mentioned before on this thread, but just to reiterate
> the two problem cases that we really would rather people didn't hit.

I appreciated this summary since I wasn't fully following the issues.

> As for GUCs to try to help the group of users who, *I'm certain*, will
> have problems with PG13's plan choice: I think the overloaded
> enable_hashagg option is a really nice compromise.  We don't really
> have any other executor node type that has multiple GUCs controlling
> its behaviour, so I believe it would be nice to keep it that way.
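
For reference, enable_hashagg is today a plain boolean planner switch.  A
minimal sketch of toggling it, using the standard pgbench schema:

    SET enable_hashagg = off;   -- discourage HashAggregate plans
    EXPLAIN SELECT bid, count(*) FROM pgbench_accounts GROUP BY bid;
    -- the planner should now prefer a sort-based GroupAggregate
    RESET enable_hashagg;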

So, in trying to anticipate how users will be affected by an API change,
I look at similar cases where we already have this behavior and at how
users have reacted to it.  Looking at the available join methods, I think
we have one such case.  We currently support:

    * nested loop with sequential scan
    * nested loop with index scan
    * hash join
    * merge join

It would seem merge join has almost the same complexities as the new
hash aggregate code, since it can spill to disk doing sorts for merge
joins, and adjusting work_mem is the only way to control that spilling.
I don't remember anyone complaining about spills to disk during merge
join, so I am unclear why we would need such a control for hash aggregate.
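
As a rough sketch of that existing behavior (the table names here are
hypothetical), merge join spilling is already visible in EXPLAIN ANALYZE
and controllable only through work_mem:

    SET enable_hashjoin = off;
    SET enable_nestloop = off;     -- leave merge join as the only choice
    SET work_mem = '64kB';         -- small enough to force spilling

    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT * FROM big_a a JOIN big_b b ON a.id = b.id;

    -- the sorts feeding the Merge Join should report
    -- "Sort Method: external merge  Disk: ...kB"; raising work_mem is
    -- the only way to keep them in memory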

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee



