Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq - Mailing list pgsql-hackers

From Tender Wang
Subject Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Date
Msg-id CAHewXNnYQSCRQ9PaQyViBEB6UKC08nqCzE6YjNcZxuvbThRBgg@mail.gmail.com
In response to Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
List pgsql-hackers
Hi all,
>Yeah, in my last email, I said I tried this way. But I worried that
>rel->tuples may be zero for an empty relation.
In my previous email, I worried that rel->tuples might be zero for an empty relation.
But it's safe here: an empty relation has no statistics tuple in pg_statistic,
so we never enter the if (HeapTupleIsValid(vardata.statsTuple)) branch.
Sorry for the noise.
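A minimal C sketch of the guard argued above, under stated assumptions. RelSizes and scale_ndistinct() are hypothetical simplifications, not PostgreSQL code. has_stats_tuple stands in for HeapTupleIsValid(vardata.statsTuple), and the assert() encodes the assumption this reasoning relies on: if a statistics tuple exists, the relation is not empty, so dividing by tuples is safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the planner's per-relation size estimates. */
typedef struct RelSizes
{
    double tuples; /* estimated rows in the raw relation (cf. rel->tuples) */
    double rows;   /* estimated rows after restriction clauses (cf. rel->rows) */
} RelSizes;

/*
 * Hypothetical helper: scale a raw ndistinct estimate by rows/tuples.
 * rel->tuples is only used as a divisor inside the stats-tuple branch.
 */
static double
scale_ndistinct(bool has_stats_tuple, double raw_ndistinct, const RelSizes *rel)
{
    double ndistinct = raw_ndistinct;

    if (has_stats_tuple)
    {
        /*
         * Assumed invariant: an empty relation has no pg_statistic entry,
         * so reaching this branch implies tuples > 0.
         */
        assert(rel->tuples > 0);
        ndistinct *= rel->rows / rel->tuples;

        /* Keep at least one distinct value (clamp_row_est() also rounds). */
        if (ndistinct < 1.0)
            ndistinct = 1.0;
    }
    return ndistinct;
}
```

For an empty relation there is no statistics tuple, so has_stats_tuple is false and the division is skipped entirely.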

Tom Lane <tgl@sss.pgh.pa.us> wrote on Sun, Mar 1, 2026 at 08:08:


> Hmm ... doesn't this contradict your argument that avgfreq and
> mcv_freq need to be calculated on the same basis?  Admittedly
> that was just a heuristic, but I'm not seeing why it's wrong.
>

Agreed.

> > The reason for this is that estfract is calculated as:
> >     estfract = 1.0 / ndistinct;
> > where ndistinct has been adjusted to account for restriction clauses.
> > Therefore, we must also use the adjusted avgfreq when adjusting
> > estfract here:
>
> It feels like that might end up double-counting the effects of
> the restriction clauses.
>
> Anyway, we all seem to agree that s/rel->rows/rel->tuples/ is the
> correct fix for a newly-introduced bug.  I'm inclined to proceed
> by committing that fix (along with any regression test fallout)
> and then investigating the avgfreq change as an independent matter.

+1
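To make the "same basis" question concrete, here is a small C sketch of the bucket-size arithmetic quoted above. bucket_fraction() is a hypothetical helper modeled on the estfract logic in estimate_hash_bucket_stats() (src/backend/utils/adt/selfuncs.c) as it reads in recent releases; the code under discussion may differ. avgfreq is the average per-value frequency, (1 - nullfrac) / ndistinct. The inputs are illustrative only: a raw relation of 1000 tuples with 100 distinct values and no nulls, a restriction keeping 250 rows (so ndistinct is scaled to 25), an MCV frequency of 0.05, and 1024 buckets.

```c
/*
 * Hypothetical, simplified model of the estfract computation.
 * ndistinct_adj: ndistinct already scaled by rows/tuples.
 * avgfreq: either the raw-relation or the adjusted average frequency.
 */
static double
bucket_fraction(double ndistinct_adj, double nbuckets,
                double mcv_freq, double avgfreq)
{
    double estfract;

    /* Initial estimate: 1/nbuckets if values outnumber buckets, else 1/ndistinct. */
    if (ndistinct_adj > nbuckets)
        estfract = 1.0 / nbuckets;
    else
        estfract = 1.0 / ndistinct_adj;

    /* Skew correction: scale up when the MCV is more common than average. */
    if (avgfreq > 0.0 && mcv_freq > avgfreq)
        estfract *= mcv_freq / avgfreq;

    /* Clamp to a sane range. */
    if (estfract < 1.0e-6)
        estfract = 1.0e-6;
    else if (estfract > 1.0)
        estfract = 1.0;
    return estfract;
}
```

With the raw avgfreq (1/100) the skew factor is 5 and the result is 0.2; with the adjusted avgfreq (1/25) the factor is 1.25 and the result is 0.05. The two differ by exactly tuples/rows = 4. This only shows the arithmetic; it does not decide which basis is right.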


--
Thanks,
Tender Wang