Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Default setting for enable_hashagg_disk
Date	July 26, 2020 00:31:49
Msg-id	CAH2-Wznidojad-zbObnFOzDA5RTCS0JLsqcZkDNu+ou1NGYQYQ@mail.gmail.com
In response to	Re: Default setting for enable_hashagg_disk (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: Default setting for enable_hashagg_disk
List	pgsql-hackers

On Sat, Jul 25, 2020 at 4:56 PM Jeff Davis <pgsql@j-davis.com> wrote:
> I wrote a quick patch to use HyperLogLog to estimate the number of
> groups contained in a spill file. It seems to reduce the
> overpartitioning effect, and is a more principled approach than what I
> was doing before.

This pretty much fixes the issue that I observed with overpartitioning.
At least in the sense that the number of partitions grows more
predictably -- even when the number of partitions planned is reduced
the change in the number of batches seems smooth-ish. It "looks nice".
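
For readers who have not seen the technique, here is a rough sketch of how a HyperLogLog counter can estimate the number of distinct groups in a stream of keys, and how such an estimate might drive the partition count. This is not the patch under discussion. The class, the `choose_partitions` helper, the SHA-1 hash, and the `groups_per_batch` and `max_partitions` parameters are all assumptions made up for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (not PostgreSQL's implementation)."""

    def __init__(self, p=12):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16")
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, key):
        # Assumption: SHA-1 truncated to 64 bits stands in for a real hash.
        digest = hashlib.sha1(repr(key).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)             # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = 1-based position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if e <= 2.5 * self.m and zeros:
            e = self.m * math.log(self.m / zeros)
        return e


def choose_partitions(est_groups, groups_per_batch, max_partitions=1024):
    """Hypothetical helper: smallest power of two covering the estimate."""
    needed = max(1, math.ceil(est_groups / groups_per_batch))
    n = 1
    while n < needed and n < max_partitions:
        n *= 2
    return n


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(200_000):
        hll.add(i % 50_000)                  # 50,000 distinct keys, repeated
    est = hll.estimate()
    print(round(est), choose_partitions(est, groups_per_batch=10_000))
```

The point of the sketch is that the partition count follows the estimated number of distinct groups rather than the raw number of tuples spilled, so duplicate-heavy spill files do not push the count up.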

> It does seem to hurt the runtime slightly when spilling to disk in some
> cases. I haven't narrowed down whether this is because we end up
> recursing multiple times, or if it's just more efficient to
> overpartition, or if the cost of doing the HLL itself is significant.

I'm glad that this more principled approach is possible. It's hard
to judge how much of a problem this really is, though. We'll need to
think about this aspect some more.

Thanks
-- 
Peter Geoghegan
