Re: [HACKERS] New design for FK-based join selectivity estimation - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] New design for FK-based join selectivity estimation
Date
Msg-id 23545.1481658247@sss.pgh.pa.us
In response to Re: [HACKERS] New design for FK-based join selectivity estimation  (ronan.dunklau@dalibo.com)
Responses Re: [HACKERS] New design for FK-based join selectivity estimation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
ronan.dunklau@dalibo.com writes:
> On Tuesday 13 December 2016 09:10:47 CET Adrien Nayrat wrote:
>> The commit 100340e2dcd05d6505082a8fe343fb2ef2fa5b2a introduces an
>> estimation error:

> The problem is, for semi and anti joins, we assume that we have nothing to do
> (costsize.c:4253):

>         else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
>         {
>             /*
>              * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the
>              * fraction of LHS rows that have matches.  If the referenced
>              * table is on the inner side, that means the selectivity is 1.0
>              * (modulo nulls, which we're ignoring for now).  We already
>              * covered the other case, so no work here.
>              */
>         }

> This results in assuming that the whole outerrel will match, no matter the
> selectivity of the innerrel.

Yeah.  In the terms of this example, the FK means that every outer row
would have a match, if the query were
select * from t3 where j in (select * from t4);

But actually it's
select * from t3 where j in (select * from t4 where j<10);

so of course we should not expect a match for every row.

> If I understand it correctly and the above is right, I think we should ignore
> SEMI or ANTI joins altogether when considering FKs, and keep the corresponding
> restrictinfos for later processing since they are already special-cased later
> on.

That seems like an overreaction.  While the old code happens to get this
example exactly right, eqjoinsel_semi is still full of assumptions and
approximations, and it doesn't do very well at all if it lacks MCV lists
for both sides.

I'm inclined to think that what we want to have happen in this case is
to estimate the fraction of outer rows having a match as equal to the
selectivity of the inner query's WHERE clauses, i.e., the semijoin
selectivity should be sizeof(inner result) divided by sizeof(inner
relation).
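
The ratio proposed above can be sketched as a small standalone C function. This is only an illustration of the arithmetic, not code from costsize.c: the function name fk_semijoin_selectivity and its two parameters are hypothetical, and the fallback to 1.0 for an empty inner relation is an assumption made here to keep the sketch well-defined.

```c
/*
 * Hypothetical helper (not part of PostgreSQL): estimate the fraction of
 * outer rows that find a match in an FK-backed semijoin, as
 *
 *     sizeof(inner result) / sizeof(inner relation)
 *
 * inner_rows   - estimated rows of the inner rel after its WHERE clauses
 * inner_tuples - estimated total rows of the inner rel before filtering
 */
double
fk_semijoin_selectivity(double inner_rows, double inner_tuples)
{
    double sel;

    /*
     * Assumption: with no usable size for the inner relation, fall back
     * to the FK-only answer that every outer row has a match.
     */
    if (inner_tuples <= 0.0)
        return 1.0;

    sel = inner_rows / inner_tuples;

    /* Clamp to a valid probability in case the estimates are inconsistent. */
    if (sel < 0.0)
        sel = 0.0;
    if (sel > 1.0)
        sel = 1.0;

    return sel;
}
```

With made-up sizes, if t4 held 100 rows and the j<10 restriction were estimated to keep 25 of them, the sketch would estimate that a quarter of the t3 rows have a match. With no restriction on t4 the ratio is 1.0, which agrees with the FK-only reasoning for the first query.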
        regards, tom lane


