Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Default setting for enable_hashagg_disk
Date
Msg-id 20200710141714.GI12375@tamriel.snowman.net
Whole thread Raw
In response to Re: Default setting for enable_hashagg_disk  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Default setting for enable_hashagg_disk  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Greetings,

* Peter Geoghegan (pg@bowt.ie) wrote:
> On Thu, Jul 9, 2020 at 5:08 PM Stephen Frost <sfrost@snowman.net> wrote:
> > I didn't, and don't, think it particularly relevant to the discussion,
> > but if you don't like the comparison to Sort then we could compare it to
> > a HashJoin instead- the point is that, yes, if you are willing to give
> > more memory to a given operation, it's going to go faster, and the way
> > users tell us that they'd like the query to use more memory is already
> > well-defined and understood to be through work_mem.  We do not do them a
> > service by ignoring that.
>
> The hash_mem design (as it stands) would affect both hash join and
> hash aggregate. I believe that it makes most sense to have hash-based
> nodes under the control of a single GUC. I believe that this
> granularity will cause the least problems. It certainly represents a
> trade-off.

So, now this has moved from being a hack to deal with a possible
regression for a small number of users due to new behavior in one node,
to a change that has impacts on other nodes that hadn't been changed,
all happening during beta.

No, I don't agree with this.  Now is not the time for designing new
features for v13.

> work_mem is less of a problem with hash join, primarily because join
> estimates are usually a lot better than grouping estimates. But it is
> nevertheless something that it makes sense to put in the same
> conceptual bucket as hash aggregate, pending a future root and branch
> redesign of work_mem.

I'm still not thrilled with the 'hash_mem' kind of idea as it's going in
the wrong direction because what's actually needed is a way to properly
consider and track overall memory usage- a redesign of work_mem (or some
new parameter, but it wouldn't be 'hash_mem') as you say, but all of
this discussion should be targeting v14.

> Like I said, the escape hatch GUC is not my preferred solution. But at
> least it acknowledges the problem. I don't think that anyone (or
> anyone else) believes that work_mem doesn't have serious limitations.

work_mem obviously has serious limitations, but that's not novel or new
or unexpected by anyone.

> > We have a parameter which already drives this and which users are
> > welcome to (and quite often do) tune.  I disagree that anything further
> > is either essential or particularly desirable.
>
> This is a user hostile attitude.

I don't find that argument convincing, at all.

> > I'm really rather astounded at the direction this has been going in.
>
> Why?

Due to the fact that we're in beta and now is not the time to be
redesigning this feature.  What Jeff implemented was done in a way that
follows the existing structure for how all of the other nodes work and
how HashAgg was *intended* to work (as in- if we thought the HashAgg
would go over work_mem, we wouldn't pick it and would do a GroupAgg
instead).  If there's bugs in his implementation (which I doubt, but it
can happen, of course) then that'd be useful to discuss and look at
fixing, but this discussion isn't appropriate for beta.

Thanks,

Stephen

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: distribute_restrictinfo_to_rels if restrictinfo contains volatile functions
Next
From: Sascha Kuhl
Date:
Subject: Re: WIP: BRIN multi-range indexes