Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Date
Msg-id 1657589.1772657436@sss.pgh.pa.us
Whole thread Raw
In response to Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq  ("Joel Jacobson" <joel@compiler.org>)
Responses Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
List pgsql-hackers
"Joel Jacobson" <joel@compiler.org> writes:
> On Tue, Mar 3, 2026, at 16:31, Tom Lane wrote:
>> This reminds me of the unfinished business at [1].  We really ought
>> to make it true that nulls never get into the hash table before
>> we assume that's so in costing.

> Hmm, OK, so there are cases when we don't discard NULLs when we should
> be able to? I was reading these lines in nodeHash.c and thought we would
> always be discarding them when possible:

>         if (!isnull)
>         {
> ...
>         }
>         else if (node->keep_null_tuples)
>         {
>             /* null join key, but we must save tuple to be emitted later */
> ...
>         }
>         /* else we can discard the tuple immediately */

I'm confused ... that keep_null_tuples bit appears nowhere in HEAD,
but it does appear in the patch at [1].

Anyway, the short answer is that we discard NULLs if possible, but
it's not possible when doing an outer join that requires returning
null-extended rows from the hashed side.

I've now pushed the patch we were discussing before, and all that's
left to worry about (AFAIK) in estimate_hash_bucket_stats is its
handling of null join keys.  I'd prefer to get the other patch
in before worrying more about that.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/3061845.1746486714%40sss.pgh.pa.us



pgsql-hackers by date:

Previous
From: Zsolt Parragi
Date:
Subject: Re: Improve OAuth discovery logging
Next
From: Andrew Dunstan
Date:
Subject: Re: Non-text mode for pg_dumpall