Home > mailing lists

Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Default setting for enable_hashagg_disk
Date	July 26, 2020 00:05:09
Msg-id	20200726000509.vtl4p7vzazdr2dif@development Whole thread Raw
In response to	Re: Default setting for enable_hashagg_disk (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: Default setting for enable_hashagg_disk
List	pgsql-hackers

Tree view

On Sat, Jul 25, 2020 at 10:07:37AM -0700, Peter Geoghegan wrote:
>On Sat, Jul 25, 2020 at 9:39 AM Peter Geoghegan <pg@bowt.ie> wrote:
>> "Peak Memory Usage: 1605334kB"
>>
>> Hash agg avoids spilling entirely (so the planner gets it right this
>> time around). It even uses notably less memory.
>
>I guess that this is because the reported memory usage doesn't reflect
>the space used for transition state, which is presumably most of the
>total -- array_agg() is used in the query.
>

I'm not sure what you mean by "reported memory usage doesn't reflect the
space used for transition state"? Surely it does include that, we've
built the memory accounting stuff pretty much exactly to do that.

I think it's pretty clear what's happening - in the sorted case there's
only a single group getting new values at any moment, so when we decide
to spill we'll only add rows to that group and everything else will be
spilled to disk.

In the unsorted case however we manage to initialize all groups in the
hash table, but at that point the groups are tiny an fit into work_mem.
As we process more and more data the groups grow, but we can't evict
them - at the moment we don't have that capability. So we end up
processing everything in memory, but significantly exceeding work_mem.

FWIW all my tests are done on the same TPC-H data set clustered by
l_shipdate (so probably random with respect to other columns).

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Jeff Davis
Date: 25 July 2020, 23:56:51
Subject: Re: Default setting for enable_hashagg_disk

From: Peter Geoghegan
Date: 26 July 2020, 00:13:00
Subject: Re: Default setting for enable_hashagg_disk

Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

Previous

Next