Re: PATCH: use foreign keys to improve join estimates v1 - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: PATCH: use foreign keys to improve join estimates v1
Date
Msg-id 555A70DA.7050501@2ndquadrant.com
Whole thread Raw
In response to Re: PATCH: use foreign keys to improve join estimates v1  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: PATCH: use foreign keys to improve join estimates v1  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
Hi David,

On 05/17/15 14:31, David Rowley wrote:
> Hi Tomas,
>
> I did glance at this patch a while back, but just thinking on it again.
>
> I think if you find any quals that are a part of *any* foreign key
> between the 2 join tables, then you should be never assume these quals
> to reduce the number of rows. I believe this should be the case even if
> you don't fully match all columns of the foreign key.
>
> If we modify your example a little, let's say your foreign key between
> fact and dim is made from 3 columns (a,b,c)
>
> If we do:
>
> EXPLAIN SELECT * FROM fact f JOIN dim d USING (a,b);
>
> Then we should always (under normal circumstances) find at least one
> matching row, although in this case since the join qual for c is
> missing, we could find more than 1 matching row.
>
> Without digging too deep here, I'd say that the best way to do this
> would be to either have calc_joinrel_size_estimate() build a list of
> restrictinfo items of all quals that are NOT part of any foreign key and
> pass that trimmed list down to clauselist_selectivity() for selectivity
> estimates. Or perhaps a better way would be just to
> teach clauselist_selectivity() about foreign keys. Likely
> clauselist_selectivity() would just have to skip over RestrictInfo items
> that are part of a foreign key.

I'm not particularly happy about the way the current patch tweaks the 
selectivity, but I think simply removing the clauses is not going to 
work, because that implies the (removed) conditions have selectivity 1.0 
(so the estimate would match a cartesian product). So we really need to 
estimate the selectivity, we can't just throw them away.

And that's essentially what the current patch does - it realizes the 
clauses match a FK, and then computes the estimate as 1/N where N is the 
cardinality of the table with PK.

Another thing is that there may be multiple "candidate" foreign keys, 
and we can't just remove clauses matching at least one of them. Imagine 
e.g. a (somewhat artificial) example:
    CREATE TABLE parent (       a INT,       b INT,       c INT,       d INT,       PRIMARY KEY (a,b),       UNIQUE
(c,d)   );
 
    CREATE TABLE child (       a INT,       b INT,       c INT,       d INT,       FOREIGN KEY (a,b) REFERENCES parent
(a,b),      FOREIGN KEY (c,d) REFERENCES parent (c,b)    );
 

and a join on (a,b,c,d). We can't just discard all the conditions, 
because (a,b) and (c,d) may 'mismatch'. We know (a,b) and (c,d) matches 
about 1/N of the 'parent' table, but we don't know selectivity for the 
whole join condition.

And it may be more complex, e.g. there may be two overlapping FKeys, 
e.b. (a,b) and (b,c) - how do you split that?

But this may be an overkill (multiple multi-column FK join), although if 
we could handle that reasonably, then why not ... someone out there 
certainly has schema like that ;-)

What I think is somewhat more realistic is conditions matching only a 
subset of FK columns - for example with a FK on (a,b) the join may only 
use (a), for example. Then again - we can't just discard that, we need 
to estimate that (and use it to compute the actual selectivity).

I agree that if we want to do anything fancy with this, we will have to 
teach clauselist_selectivity() about foreign keys.

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Run pgindent now?
Next
From: Tom Lane
Date:
Subject: Re: Run pgindent now?