Home > mailing lists

Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Date	March 4 23:50:36
Msg-id	1657589.1772657436@sss.pgh.pa.us Whole thread Raw
In response to	Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq ("Joel Jacobson" <joel@compiler.org>)
Responses	Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
List	pgsql-hackers

Tree view

"Joel Jacobson" <joel@compiler.org> writes:
> On Tue, Mar 3, 2026, at 16:31, Tom Lane wrote:
>> This reminds me of the unfinished business at [1].  We really ought
>> to make it true that nulls never get into the hash table before
>> we assume that's so in costing.

> Hmm, OK, so there are cases when we don't discard NULLs when we should
> be able to? I was reading these lines in nodeHash.c and thought we would
> always be discarding them when possible:

>         if (!isnull)
>         {
> ...
>         }
>         else if (node->keep_null_tuples)
>         {
>             /* null join key, but we must save tuple to be emitted later */
> ...
>         }
>         /* else we can discard the tuple immediately */

I'm confused ... that keep_null_tuples bit appears nowhere in HEAD,
but it does appear in the patch at [1].

Anyway, the short answer is that we discard NULLs if possible, but
it's not possible when doing an outer join that requires returning
null-extended rows from the hashed side.

I've now pushed the patch we were discussing before, and all that's
left to worry about (AFAIK) in estimate_hash_bucket_stats is its
handling of null join keys.  I'd prefer to get the other patch
in before worrying more about that.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/3061845.1746486714%40sss.pgh.pa.us

pgsql-hackers by date:

From: Zsolt Parragi
Date: 04 March, 23:40:34
Subject: Re: Improve OAuth discovery logging

From: Andrew Dunstan
Date: 04 March, 23:54:48
Subject: Re: Non-text mode for pg_dumpall

Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq - Mailing list pgsql-hackers

Previous

Next