Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Msg-id 53750.1772323685@sss.pgh.pa.us
In response to Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq  ("Joel Jacobson" <joel@compiler.org>)
List pgsql-hackers
"Joel Jacobson" <joel@compiler.org> writes:
> On Fri, Feb 27, 2026, at 20:15, Tom Lane wrote:
>> Joel, do you want to run this to ground, and in particular
>> see if that way of fixing it passes your sanity tests?

> Challenge accepted!

Thanks!

> [...hours later...]
> My conclusion is that we still need to move avgfreq
> computation, like I suggested.

Hmm ... doesn't this contradict your argument that avgfreq and
mcv_freq need to be calculated on the same basis?  Admittedly
that was just a heuristic, but I'm not seeing why it's wrong.

> The reason for this is that estfract is calculated as:
>     estfract = 1.0 / ndistinct;
> where ndistinct has been adjusted to account for restriction clauses.
> Therefore, we must also use the adjusted avgfreq when adjusting
> estfract here:

It feels like that might end up double-counting the effects of
the restriction clauses.
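
For readers following along, here is a minimal sketch of the two orderings
under debate. It is a simplified, hypothetical model and not the actual
selfuncs.c code: the struct, the function name sketch_estfract, and the
avgfreq_after_adjust flag are invented for illustration, and the clamps
are simplified stand-ins. It assumes that "moving" avgfreq means computing
it from ndistinct after the restriction-clause scaling instead of before.

```c
#include <stdbool.h>

/* Hypothetical inputs; field names only loosely mirror the planner's. */
typedef struct BucketInputs
{
    double ndistinct;   /* distinct values in the raw relation */
    double stanullfrac; /* fraction of NULLs in the column */
    double mcv_freq;    /* frequency of the most common value */
    double rows;        /* estimated rows after restriction clauses */
    double tuples;      /* rows in the raw relation */
    double nbuckets;    /* number of hash buckets */
} BucketInputs;

static double
sketch_estfract(const BucketInputs *in, bool avgfreq_after_adjust)
{
    double ndistinct = in->ndistinct;
    double avgfreq = (1.0 - in->stanullfrac) / ndistinct; /* raw basis */
    double estfract;

    /* Scale ndistinct by the assumed-uniform restriction selectivity. */
    if (in->tuples > 0)
    {
        ndistinct *= in->rows / in->tuples;
        if (ndistinct < 1.0)
            ndistinct = 1.0;    /* simplified stand-in for clamping */
    }

    /* The variant under discussion: recompute avgfreq on the adjusted basis. */
    if (avgfreq_after_adjust)
        avgfreq = (1.0 - in->stanullfrac) / ndistinct;

    estfract = (ndistinct > in->nbuckets) ? 1.0 / in->nbuckets
                                          : 1.0 / ndistinct;

    /* Skew correction: scale up by how far the MCV exceeds the average. */
    if (avgfreq > 0.0 && in->mcv_freq > avgfreq)
        estfract *= in->mcv_freq / avgfreq;

    return (estfract > 1.0) ? 1.0 : estfract;
}
```

With ndistinct = 1000, no NULLs, mcv_freq = 0.05, rows = 100, tuples = 1000
and nbuckets = 1024, the raw-basis avgfreq gives an estfract of about 0.5,
while the adjusted-basis avgfreq gives about 0.05. The sketch only shows
that the choice of basis moves the result by the restriction selectivity;
it does not settle which basis is right.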

Anyway, we all seem to agree that s/rel->rows/rel->tuples/ is the
correct fix for a newly-introduced bug.  I'm inclined to proceed
by committing that fix (along with any regression test fallout)
and then investigating the avgfreq change as an independent matter.

            regards, tom lane


