Tom Lane <tgl@sss.pgh.pa.us> 于2026年2月28日周六 03:15写道:
>
> Tender Wang <tndrwang@gmail.com> writes:
> > I added Tom to the cc list. He may know more about this.
>
> Hmm, git blame says I originated this function 25 years ago
> (f905d65ee), but I don't claim to remember that.
>
> Looking at it now, though, I think that bd3e3e9e5 is indeed
> wrong but not in the way Joel suggests. The longstanding
> way to compute mcv_freq is
>
> /*
> * The first MCV stat is for the most common value.
> */
> if (sslot.nnumbers > 0)
> *mcv_freq = sslot.numbers[0];
>
> *This number is a fraction measured on the raw relation.*
> (Necessarily so, because it's just a number computed by ANALYZE.)
> Then bd3e3e9e5 added
>
> /*
> * If there are no recorded MCVs, but we do have a histogram, then
> * assume that ANALYZE determined that the column is unique.
> */
> if (vardata.rel && vardata.rel->rows > 0)
> *mcv_freq = 1.0 / vardata.rel->rows;
>
> This is a pure thinko. rel->rows is the estimated number
> of filtered rows. What I should have used is rel->tuples,
> which is the estimated raw relation size, so as to get a
> number that is commensurate with the longstanding way
> of calculating mcv_freq. Then that also matches up with
> computing avgfreq on the raw relation.
>
> So I think the correct fix is basically
>
> - if (vardata.rel && vardata.rel->rows > 0)
> - *mcv_freq = 1.0 / vardata.rel->rows;
> + if (vardata.rel && vardata.rel->tuples > 0)
> + *mcv_freq = 1.0 / vardata.rel->tuples;
>
Yeah, in my last email, I said I tried this way. But I worried that
rel->tuples may be zero for an empty relation.
> and I wonder if that will wind up in reverting a lot of the plan
> choice changes seen in bd3e3e9e5.
Yes, a lot plan diff in partition_join.sql.
--
Thanks,
Tender Wang