Re: Memory-Bounded Hash Aggregation - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Memory-Bounded Hash Aggregation
Date	March 27, 2020 01:31:08
Msg-id	20200327013108.tiskardwkqr5otg6@development
In response to	Re: Memory-Bounded Hash Aggregation (Richard Guo <guofenglinux@gmail.com>)
Responses	Re: Memory-Bounded Hash Aggregation
List	pgsql-hackers

On Thu, Mar 26, 2020 at 05:56:56PM +0800, Richard Guo wrote:
>Hello,
>
>When calculating the disk costs of hash aggregation that spills to disk,
>there is something wrong with how we determine depth:
>
>>            depth = ceil( log(nbatches - 1) / log(num_partitions) );
>
>If nbatches is some number between 1.0 and 2.0, we would have a negative
>depth. As a result, we may have a negative cost for hash aggregation
>plan node, as described in [1].
>
>I don't think 'log(nbatches - 1)' is what we want here. Should it be
>just '(nbatches - 1)'?
>

I think using log() is correct, but why should we allow fractional
nbatches values between 1.0 and 2.0? You either have 1 batch or 2
batches; you can't have 1.5 batches. So I think the issue is here:

   nbatches = Max((numGroups * hashentrysize) / mem_limit,
                  numGroups / ngroups_limit );

and we should probably do

   nbatches = ceil(nbatches);

right after it.



regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
