Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Default setting for enable_hashagg_disk
Date
Msg-id 20200627000516.GF3337@momjian.us
Whole thread Raw
In response to Re: Default setting for enable_hashagg_disk  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sat, Jun 27, 2020 at 01:58:50AM +0200, Tomas Vondra wrote:
> > Since work_mem affect the optimizer choices, I can imagine it getting
> > complex since nodes would have to ask the global work_mem allocator how
> > much memory it _might_ get, but then ask for final work_mem during
> > execution, and they might differ.  Still, our spill costs are so high
> > for so many node types, that reducing spills seems like it would be a
> > win, even if it sometimes causes poorer plans.
> > 
> 
> I may not understand what you mean by "poorer plans" here, but I find it
> hard to accept that reducing spills is generally worth poorer plans.

We might cost a plan based on a work_mem that the global allocator
things it will give us when we are in the executor, but that might
change when we are in the executor.  We could code is to an optimizer
request is always honored in the executor, but prepared plans would be a
problem, or perhaps already are if you prepare a plan and change
work_mem before EXECUTE.

> I agree larger work_mem for hashagg (and thus less spilling) may mean
> lower work_mem for so some other nodes that are less sensitive to this.
> But I think this needs to be formulated as a cost-based decision,
> although I don't know how to do that for the reasons I explained before
> (bottom-up plan construction vs. distributing the memory budget).
> 
> FWIW some databases already do something like this - SQL Server has
> something called "memory grant" which I think mostly does what you
> described here.

Yep, something like that.

> > > > Also, doesn't this blog entry also show that spiling to disk for ORDER
> > > > BY is similarly slow compared to hash aggs?
> > > >
> > > >     https://momjian.us/main/blogs/pgblog/2012.html#February_2_2012
> > > 
> > > The post does not mention hashagg at all, so I'm not sure how could it
> > > show that? But I think you're right the spilling itself is not that far
> > > away, in most cases (thanks to the recent fixes made by Jeff).
> > 
> > Yeah, I was just measuring ORDER BY spill, but it seems to be a similar
> > overhead to hashagg spill, which is being singled out in this discussion
> > as particularly expensive, and I am questioning that.
> > 
> 
> The difference between sort and hashagg spills is that for sorts there
> is no behavior change. Plans that did (not) spill in v12 will behave the
> same way on v13, modulo some random perturbation. For hashagg that's not
> the case - some queries that did not spill before will spill now.

Well, my point is that we already had ORDER BY problems, and if hash agg
now has them too in PG 13, I am fine with that.  We don't guarantee no
problems in major versions.  If we want to add a general knob that says,
"Hey allow this node to exceed work_mem by X%," I don't see the point
--- just increase work_mem, or have different work_mem settings for
different node types, as I outlined previously.

> So even if the hashagg spills are roughly equal to sort spills, both are
> significantly more expensive than not spilling.

Yes, but that means we need a more general fix and worrying about hash
agg is not addressing the core issue.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee




pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Default setting for enable_hashagg_disk
Next
From: Peter Geoghegan
Date:
Subject: Re: Default setting for enable_hashagg_disk