Home > mailing lists

Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq - Mailing list pgsql-hackers

From	Tender Wang
Subject	Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Date	February 28 02:57:22
Msg-id	CAHewXNknzZ0+trb0RzvTBpzr9R+42W0xLYzDDsueJ11N1mbqNg@mail.gmail.com Whole thread
In response to	Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

Tom Lane <tgl@sss.pgh.pa.us> 于2026年2月28日周六 03:15写道：
>
> Tender Wang <tndrwang@gmail.com> writes:
> > I added Tom to the cc list. He may know more about this.
>
> Hmm, git blame says I originated this function 25 years ago
> (f905d65ee), but I don't claim to remember that.
>
> Looking at it now, though, I think that bd3e3e9e5 is indeed
> wrong but not in the way Joel suggests.  The longstanding
> way to compute mcv_freq is
>
>             /*
>              * The first MCV stat is for the most common value.
>              */
>             if (sslot.nnumbers > 0)
>                 *mcv_freq = sslot.numbers[0];
>
> *This number is a fraction measured on the raw relation.*
> (Necessarily so, because it's just a number computed by ANALYZE.)
> Then bd3e3e9e5 added
>
>             /*
>              * If there are no recorded MCVs, but we do have a histogram, then
>              * assume that ANALYZE determined that the column is unique.
>              */
>             if (vardata.rel && vardata.rel->rows > 0)
>                 *mcv_freq = 1.0 / vardata.rel->rows;
>
> This is a pure thinko.  rel->rows is the estimated number
> of filtered rows.  What I should have used is rel->tuples,
> which is the estimated raw relation size, so as to get a
> number that is commensurate with the longstanding way
> of calculating mcv_freq.  Then that also matches up with
> computing avgfreq on the raw relation.
>
> So I think the correct fix is basically
>
> -            if (vardata.rel && vardata.rel->rows > 0)
> -                *mcv_freq = 1.0 / vardata.rel->rows;
> +            if (vardata.rel && vardata.rel->tuples > 0)
> +                *mcv_freq = 1.0 / vardata.rel->tuples;
>

Yeah, in my last email, I said I tried this way. But I worried that
rel->tuples may be zero for an empty relation.

> and I wonder if that will wind up in reverting a lot of the plan
> choice changes seen in bd3e3e9e5.

Yes, a lot plan diff in partition_join.sql.



--
Thanks,
Tender Wang

pgsql-hackers by date:

From: Melanie Plageman
Date: 28 February, 02:56:48
Subject: Re: All-visible pages with valid prune xid are confusing

From: Chao Li
Date: 28 February, 03:12:03
Subject: Re: doc: Clarify that empty COMMENT string removes the comment

Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq - Mailing list pgsql-hackers

Previous

Next