Re: [HACKERS] 200 = 199 + 1? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] 200 = 199 + 1?
Date
Msg-id 29652.1506893273@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] 200 = 199 + 1?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> I experimented a bit with the attached patch, which modifies
> eqjoinsel_semi in two ways.  First, if the number-of-distinct-values
> estimate for the inner rel is just a default rather than having any
> real basis, it replaces it with inner_rel->rows, effectively assuming
> that the inside of the IN or EXISTS is unique.

That might or might not be a good idea ...

> Second, it drops the
> fallback to selectivity 0.5 altogether, just applying the nd1 vs nd2
> heuristic all the time.

... but this probably isn't.  A review of the history shows me that
this change amounts to reverting 3f5d2fe30, which doesn't seem like
a good plan because that was fixing user-reported misbehavior.
The problem is still that the nd2/nd1 calculation can produce garbage
if nd2 or nd1 is made-up.  It's possible that we can get away with using
nd2 = inner_rel->rows as a suitable substitute for a default nd2, but
I'm much less happy to assume something like that for nd1.

Now, there's still not much to defend the 0.5 selectivity in particular;
according to the commit log for 3f5d2fe30, I used that because it
reproduced the behavior of previous versions that didn't understand what
a semijoin was at all.  So we could perhaps substitute some other rule
there, but I don't know what.

I also note that the precise behavior of HEAD in this area only dates
back to ca5f88502, which hasn't even shipped in a release yet, so it
surely doesn't have a lot of field experience justifying it.

Other useful-though-not-terribly-recent discussions in this area include

https://www.postgresql.org/message-id/flat/201201112110.40403.andres%40anarazel.de

https://www.postgresql.org/message-id/13290.1335976455@sss.pgh.pa.us

Given that eqjoinsel_semi has been more or less a constant source of
issues, maybe we need to think about a wholesale redesign rather than
just incremental tweaking.  But I have few ideas about what would be
better.
        regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: [HACKERS] [PATCH] Off-by-one error in logical slot resource retention
Next
From: Andrew Dunstan
Date:
Subject: Re: [HACKERS] pg_ctl kill support for KILL signal was Re:[COMMITTERS] pgsql: Add test for postmaster crash restarts.