Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq - Mailing list pgsql-hackers

From Joel Jacobson
Subject Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Date
Msg-id 4a597843-0dc8-4fd0-a219-91d3d01490b5@app.fastmail.com
Whole thread Raw
In response to Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
List pgsql-hackers
On Tue, Mar 3, 2026, at 16:31, Tom Lane wrote:
> "Joel Jacobson" <joel@compiler.org> writes:
>> On Sun, Mar 1, 2026, at 22:12, Tom Lane wrote:
>>> Aside: you could argue that failing to consider stanullfrac is wrong,
>>> and maybe it is.  But the more I looked at this code the more
>>> convinced I got that it was only partially accounting for nulls
>>> anyway.  That seems like perhaps something to look into later.
>
>> How about adjusting estfract for the null fraction before clamping?
>
> This reminds me of the unfinished business at [1].  We really ought
> to make it true that nulls never get into the hash table before
> we assume that's so in costing.  One of the things I was thinking
> was being overlooked is the possibility of lots of nulls bloating
> whichever hash bucket they get put in --- but if they aren't put
> into a bucket then it's not wrong to ignore them here.
>
> (Strictly speaking, that's still not so with non-strict hash operators,
> but those are so rare that I don't mind not accounting for them.)
>
>             regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/3061845.1746486714@sss.pgh.pa.us

Hmm, OK, so there are cases when we don't discard NULLs when we should
be able to? I was reading these lines in nodeHash.c and thought we would
always be discarding them when possible:

        if (!isnull)
        {
...
        }
        else if (node->keep_null_tuples)
        {
            /* null join key, but we must save tuple to be emitted later */
...
        }
        /* else we can discard the tuple immediately */

Thanks for the pointer to [1], I will dig into that thread, exciting!

/Joel



pgsql-hackers by date:

Previous
From: David Geier
Date:
Subject: Re: Reduce build times of pg_trgm GIN indexes
Next
From: Florin Irion
Date:
Subject: UBSAN crash in EventTriggerCollectAlterTSConfig (memcpy with NULL src)