On 2023/6/17 06:46, Tom Lane wrote:
> Quan Zongliang <quanzongliang@yeah.net> writes:
>> Perhaps we should discard this (dups cnt > 1) restriction?
>
> That's not going to happen on the basis of one test case that you
> haven't even shown us. The implications of doing it are very unclear.
> In particular, I seem to recall that there are bits of logic that
> depend on the assumption that MCV entries always represent more than
> one row. The nmultiple calculation Tomas referred to may be failing
> because of that, but I'm worried about there being other places.
>
The statistics for the other table look like this:
stadistinct | 6
stanumbers1 | {0.50096667,0.49736667,0.0012}
stavalues1 | {v22,v23,v5}
The value that appears twice in the small table (v1 and v2) does not
appear here. The stadistinct's true value is 18 instead of 6 (three
values in the small table do not appear here).
When calculating the selectivity:
if (nd2 > sslot2->nvalues)
totalsel1 += unmatchfreq1 * otherfreq2 / (nd2 - sslot2->nvalues);
totalsel1 = 0
nd2 = 21
sslot2->nvalues = 2
unmatchfreq1 = 0.99990002016420476
otherfreq2 = 0.82608695328235626
result: totalsel1 = 0.043473913749706022
rows = 0.043473913749706022 * 23 * 2,000,000 = 1999800
> Basically, you're proposing a rather fundamental change in the rules
> by which Postgres has gathered statistics for decades. You need to
> bring some pretty substantial evidence to support that. The burden
> of proof is on you, not on the status quo.
>
> regards, tom lane