Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Default setting for enable_hashagg_disk
Date
Msg-id 20200627104141.gq7d3hm2tvoqgjjs@development
Whole thread Raw
In response to Re: Default setting for enable_hashagg_disk  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Default setting for enable_hashagg_disk
List pgsql-hackers
On Fri, Jun 26, 2020 at 05:24:36PM -0700, Peter Geoghegan wrote:
>On Fri, Jun 26, 2020 at 4:59 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> I agree larger work_mem for hashagg (and thus less spilling) may mean
>> lower work_mem for so some other nodes that are less sensitive to this.
>> But I think this needs to be formulated as a cost-based decision,
>> although I don't know how to do that for the reasons I explained before
>> (bottom-up plan construction vs. distributing the memory budget).
>
>Why do you think that it needs to be formulated as a cost-based
>decision? That's probably true of a scheme that allocates memory very
>intelligently, but what about an approach that's slightly better than
>work_mem?
>

Well, there are multiple ideas discussed in this (sub)thread, one of
them being a per-query memory limit. That requires decisions how much
memory should different nodes get, which I think would need to be
cost-based.

>What problems do you foresee (if any) with adding a hash_mem GUC that
>gets used for both planning and execution for hash aggregate and hash
>join nodes, in about the same way as work_mem is now?
>

Of course, a simpler scheme like this would not require that. And maybe
introducing hash_mem is a good idea - I'm not particularly opposed to
that, actually. But I think we should not introduce separate memory
limits for each node type, which was also mentioned earlier.

The problem of course is that hash_mem does not really solve the issue
discussed at the beginning of this thread, i.e. regressions due to
underestimates and unexpected spilling at execution time.

The thread is getting a rather confusing mix of proposals how to fix
that for v13 and proposals how to improve our configuration of memory
limits :-(

>> FWIW some databases already do something like this - SQL Server has
>> something called "memory grant" which I think mostly does what you
>> described here.
>
>Same is true of Oracle. But Oracle also has simple work_mem-like
>settings for sorting and hashing. People don't really use them anymore,
>but apparently it was once common for the DBA to explicitly give over
>more memory to hashing -- much like the hash_mem setting I asked about.
>IIRC the same is true of DB2.
>

Interesting. What is not entirely clear to me how do these databases
decide how much should each node get during planning. With the separate
work_mem-like settings it's fairly obvious, but how do they do that with
the global limit (either per-instance or per-query)?

>> The difference between sort and hashagg spills is that for sorts
>> there is no behavior change. Plans that did (not) spill in v12 will
>> behave the same way on v13, modulo some random perturbation. For
>> hashagg that's not the case - some queries that did not spill before
>> will spill now.
>>
>> So even if the hashagg spills are roughly equal to sort spills, both
>> are significantly more expensive than not spilling.
>
>Just to make sure we're on the same page: both are significantly more
>expensive than a hash aggregate not spilling *specifically*. OTOH, a
>group aggregate may not be much slower when it spills compared to an
>in-memory sort group aggregate. It may even be noticeably faster, due
>to caching effects, as you mentioned at one point upthread.
>
>This is the property that makes hash aggregate special, and justifies
>giving it more memory than other nodes on a system-wide basis (the same
>thing applies to hash join). This could even work as a multiplier of
>work_mem.
>

Yes, I agree.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Default setting for enable_hashagg_disk
Next
From: Amit Kapila
Date:
Subject: Re: pgsql: Enable Unix-domain sockets support on Windows