Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Default setting for enable_hashagg_disk
Date
Msg-id d5af7930265b8c4bbda5381364d7e21955597538.camel@j-davis.com
In response to Re: Default setting for enable_hashagg_disk  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Default setting for enable_hashagg_disk  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Sat, 2020-07-25 at 11:05 -0700, Peter Geoghegan wrote:
> What worries me a bit is the sharp discontinuities when spilling with
> significantly less work_mem than the "optimal" amount. For example,
> with Tomas' TPC-H query (against my smaller TPC-H dataset), I find
> that setting work_mem to 6MB looks like this:

...

>          Planned Partitions: 128  Peak Memory Usage: 6161kB  Disk Usage: 2478080kB  HashAgg Batches: 128

...

>          Planned Partitions: 128  Peak Memory Usage: 5393kB  Disk Usage: 2482152kB  HashAgg Batches: 11456

...

> My guess that this is because the recursive hash aggregation
> misbehaves in a self-similar fashion once a certain tipping point
> has been reached.

It looks like it might be fairly easy to use HyperLogLog as an
estimator for the recursive step. That should reduce the
overpartitioning, which I believe is the cause of this discontinuity.
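As a rough illustration of the idea (a sketch under stated assumptions, not PostgreSQL's code), the Python below feeds each spilled tuple's grouping key into a small HyperLogLog sketch. It then sizes the next recursion level from the estimated number of distinct groups instead of the raw spilled-tuple count. The names `HyperLogLog` and `choose_num_partitions`, the 1.5 sizing factor, the 4..1024 clamp, the 64-byte entry size, the 64 kB memory limit, and the use of blake2b as the hash function are all hypothetical values chosen for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog distinct-count estimator (illustrative only)."""

    def __init__(self, p: int = 10):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers (1024 for p=10)
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value) -> int:
        # Assumption: blake2b stands in for the executor's own hash function.
        digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, value) -> None:
        h = self._hash64(value)
        q = 64 - self.p
        idx = h >> q                    # top p bits select a register
        w = h & ((1 << q) - 1)          # remaining q bits
        rank = q - w.bit_length() + 1   # position of leftmost 1-bit (q+1 if w == 0)
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


def choose_num_partitions(est_groups, entry_size, mem_limit,
                          factor=1.5, min_parts=4, max_parts=1024):
    """Hypothetical sizing rule: enough partitions that each fits in memory."""
    mem_wanted = factor * est_groups * entry_size
    n = 1 + int(mem_wanted // mem_limit)
    n = max(min_parts, min(max_parts, n))
    return 1 << (n - 1).bit_length()    # round up to a power of two


# A spilled partition holding 100,000 tuples but only 2,000 distinct keys.
spilled = [i % 2000 for i in range(100_000)]
hll = HyperLogLog()
for key in spilled:
    hll.add(key)

# Treating every spilled tuple as its own group overestimates the groups.
naive = choose_num_partitions(len(spilled), 64, 64 * 1024)
better = choose_num_partitions(hll.estimate(), 64, 64 * 1024)
print(naive, better)  # 256 4
```

In this toy model, sizing from the tuple count asks for 256 partitions where 4 would do. Repeated at every recursion level, that kind of gap is one plausible way for the batch count to jump sharply once memory gets tight.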

It's not clear to me that overpartitioning is a real problem in this
case -- but I think the fact that it's causing confusion is enough
reason to see if we can fix it.

Regards,
    Jeff Davis