Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Msg-id 1010506.1772551866@sss.pgh.pa.us
In response to Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
List pgsql-hackers
"Joel Jacobson" <joel@compiler.org> writes:
> On Sun, Mar 1, 2026, at 22:12, Tom Lane wrote:
>> Aside: you could argue that failing to consider stanullfrac is wrong,
>> and maybe it is.  But the more I looked at this code the more
>> convinced I got that it was only partially accounting for nulls
>> anyway.  That seems like perhaps something to look into later.

> How about adjusting estfract for the null fraction before clamping?

This reminds me of the unfinished business at [1].  We really ought
to make it true that nulls never get into the hash table before
we assume that's so in costing.  One of the things I was thinking
was being overlooked is the possibility of lots of nulls bloating
whichever hash bucket they get put in --- but if they aren't put
into a bucket then it's not wrong to ignore them here.

(Strictly speaking, that's still not so with non-strict hash operators,
but those are so rare that I don't mind not accounting for them.)
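A toy model may make the two positions concrete. What follows is a minimal, hypothetical C sketch, not code from PostgreSQL's estimate_hash_bucket_stats() or from this thread. The names toy_bucket_frac and clamp_frac, the nulls_in_table flag, and the [1e-6, 1.0] clamp range are all illustrative assumptions. It applies Joel's suggestion (scale estfract by 1 - nullfrac before clamping). It also shows why the answer depends on whether NULL keys are actually inserted into the hash table: if they are, they all share one bucket, which then holds at least nullfrac of the rows.

```c
/* Illustrative sanity bounds for the final estimate (assumed for this sketch). */
double clamp_frac(double f)
{
    if (f < 1.0e-6)
        return 1.0e-6;
    if (f > 1.0)
        return 1.0;
    return f;
}

/*
 * Hypothetical toy model, NOT PostgreSQL source: estimated fraction of all
 * inner rows that fall into the fullest hash bucket.
 *   ndistinct      number of distinct non-null key values
 *   nbuckets       number of hash buckets
 *   mcv_freq       frequency of the most common value, as a fraction of all rows
 *   nullfrac       fraction of rows whose hash key is NULL
 *   nulls_in_table nonzero if NULL keys are inserted into the hash table
 */
double toy_bucket_frac(double ndistinct, double nbuckets,
                       double mcv_freq, double nullfrac,
                       int nulls_in_table)
{
    /* Average frequency of one non-null value, as a fraction of all rows. */
    double avgfreq = (1.0 - nullfrac) / ndistinct;
    /* Uniform spread of the non-null rows over the buckets. */
    double estfract = (ndistinct > nbuckets) ? 1.0 / nbuckets : 1.0 / ndistinct;

    /* Scale up for skew when the MCV is more common than the average value. */
    if (avgfreq > 0.0 && mcv_freq > avgfreq)
        estfract *= mcv_freq / avgfreq;

    /* Joel's suggestion: account for the null fraction before clamping. */
    estfract *= 1.0 - nullfrac;

    /* If NULLs are hashed, they share one bucket holding >= nullfrac of rows. */
    if (nulls_in_table && nullfrac > estfract)
        estfract = nullfrac;

    return clamp_frac(estfract);
}
```

With 100 distinct values, 1000 buckets, no skew and half the rows NULL, toy_bucket_frac(100, 1000, 0.005, 0.5, 0) gives 0.005, while the same call with nulls_in_table set gives 0.5, because the single bucket holding the NULLs dominates. The ordering also matters: toy_bucket_frac(100, 10, 0.2, 0.5, 0) returns 1.0 because the skew-scaled value (about 4.0) is halved to about 2.0 before clamping, whereas clamping first and then halving would give 0.5.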

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/3061845.1746486714@sss.pgh.pa.us