Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Default setting for enable_hashagg_disk
Date
Msg-id 20200629213319.ldfbi6wiyh6idf6e@development
Whole thread Raw
In response to Re: Default setting for enable_hashagg_disk  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Mon, Jun 29, 2020 at 01:31:40PM -0400, Bruce Momjian wrote:
>On Mon, Jun 29, 2020 at 10:20:14AM -0700, Peter Geoghegan wrote:
>> I have no reason to believe that the planner is any more or any less
>> likely to conclude that the hash table will fit in memory in v13 as
>> things stand (I don't know if the BufFile issue matters).
>>
>> In general, grouping estimates probably aren't very good compared to
>> join estimates. I imagine that in either v12 or v13 the planner is
>> likely to incorrectly believe that it'll all fit in memory fairly
>> often. v12 was much too permissive about what could happen. But v13 is
>> too conservative.
>
>FYI, we have improved planner statistics estimates for years, which must
>have affected node spill behavior on many node types (except hash_agg),
>and don't remember any complaints about it.
>

I think misestimates for GROUP BY are quite common and very hard to fix.
Firstly, our ndistinct estimator may give pretty bad results depending
e.g. on how is the table correlated.

I've been running some TPC-H benchmarks, and for partsupp.ps_partkey our
estimate was 4338776, when the actual value is 15000000, i.e. ~3.5x
higher. This was with statistics target increased to 1000. I can easily
imagine even worse estimates with lower values.

This ndistinct estimator is used even for extended statistics, so that
can't quite save us. Moreover, the grouping may be on top of a join, in
which case using ndistinct coefficients may not be possible :-(

So I think this is a quite real problem ...


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Default setting for enable_hashagg_disk
Next
From: Peter Geoghegan
Date:
Subject: Re: Default setting for enable_hashagg_disk