Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> writes:
> On 27.10.2025 18:50, David Geier wrote:
>> On 10.09.2025 15:56, Ilia Evdokimov wrote:
>>> I'll need some time to check both join and semi join cases with small
>>> and large default_statistics_target. I'll share the results later.
>> Have you had a chance to test above things?
> Yes, I wrote about this here:
> https://www.postgresql.org/message-id/c3dbf2ab-d72d-4033-822a-60ad8023f499%40tantorlabs.com
Hmm.  Those results sure look like there is a performance regression
up to at least 100 MCVs ... not a large one, but consistently a few
percent.  That's a bit sad for a patch purporting to improve
performance.  It looks to me like perhaps we should stick to the old
algorithm up to 100 or possibly even more MCVs.  Certainly the
threshold needs to be higher than 1, as you have it now.
I eyeballed the patch itself very briefly, and have a couple
quick comments:
* Is hash_msv a typo for hash_mcv?  If not, maybe spell out what
it's supposed to mean.
* The patch would be easier to read if it didn't reindent a couple
large chunks of existing code.  Can we change the factorization
to avoid that?  If not, I'd recommend submitting without that
reindentation, and reminding the committer to reindent at the last
moment.
* The calculation loops in eqjoinsel_inner and eqjoinsel_semi
are not identical, which makes it look quite weird to be
writing just one function that conditionally replaces both.
I wonder if we should refactor to have just one copy (and
just eat the extra cycles of calculating matchprodfreq).
* In fact ... I wonder if we should try harder to not do essentially
identical calculations twice, remembering that eqjoinsel_semi is
always used alongside eqjoinsel_inner.  (Of course, we could only do
that if clamped_nvalues2 is the same as sslot2->nvalues, but that's
frequently true.)  I think the reason it's duplicative right now
is that we regarded this semijoin calculation as an experiment and
so didn't want to invest a lot of effort in it ... but this patch
is exactly a lot of effort, so maybe it's time to deal with that
unfinished business.
            regards, tom lane