Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Default setting for enable_hashagg_disk
Date
Msg-id 20200625232316.cevmhw7cgra6bu6e@development
Whole thread Raw
In response to Re: Default setting for enable_hashagg_disk  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Thu, Jun 25, 2020 at 09:42:33AM -0700, Jeff Davis wrote:
>On Wed, 2020-06-24 at 12:31 -0700, Andres Freund wrote:
>> nodeAgg.c already treats those separately:
>>
>> void
>> hash_agg_set_limits(double hashentrysize, uint64 input_groups, int
>> used_bits,
>>                     Size *mem_limit, uint64
>> *ngroups_limit,
>>                     int *num_partitions)
>> {
>>     int            npartitions;
>>     Size        partition_mem;
>>
>>     /* if not expected to spill, use all of work_mem */
>>     if (input_groups * hashentrysize < work_mem * 1024L)
>>     {
>>         if (num_partitions != NULL)
>>             *num_partitions = 0;
>>         *mem_limit = work_mem * 1024L;
>>         *ngroups_limit = *mem_limit / hashentrysize;
>>         return;
>>     }
>
>The reason this code exists is to decide how much of work_mem to set
>aside for spilling (each spill partition needs an IO buffer).
>
>The alternative would be to fix the number of partitions before
>processing a batch, which didn't seem ideal. Or, we could just ignore
>the memory required for IO buffers, like HashJoin.
>

I think the conclusion from the recent HashJoin discussions is that not
accounting for BufFiles is an issue, and we want to fix it. So repeating
that for HashAgg would be a mistake, IMHO.

>Granted, this is an example where an underestimate can give an
>advantage, but I don't think we want to extend the concept into other
>areas.
>

I agree.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Default setting for enable_hashagg_disk
Next
From: Alvaro Herrera
Date:
Subject: Re: min_safe_lsn column in pg_replication_slots view