Re: New design for FK-based join selectivity estimation - Mailing list pgsql-hackers

From Tom Lane
Subject Re: New design for FK-based join selectivity estimation
Date
Msg-id 20877.1466175216@sss.pgh.pa.us
Whole thread Raw
In response to Re: New design for FK-based join selectivity estimation  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: New design for FK-based join selectivity estimation
List pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Thanks for getting the patch into a much better shape. I've quickly 
> reviewed the patch this morning before leaving to the airport - I do 
> plan to do additional review/testing once in the air or perhaps over the 
> weekend. So at the moment I only have a few minor comments:

> 1) Shouldn't we define a new macro in copyfuncs.c, to do the memcpy for 
> us? Seems a bit strange we have macros for everything else.

Perhaps, but it seemed not that compelling since we need bespoke code for
those fields in outfuncs.c etc.  Maybe it would be worth thinking about
macro infrastructure for array-type fields in all of those modules.

> 2) I'm wondering whether removing the restrict infos when
>      if (fkinfo->eclass[i] == rinfo->parent_ec)
> is actually correct. Can't the EC include conditions that do not match 
> the FK?

Doesn't matter.  The point is that it *does* include a condition that
does match the FK, whether it chose to generate exactly that condition
for this join or some related one.

> I mean something like this:
>    CREATE TABLE a (id1 INT PRIMARY KEY, id2 INT);
>    CREATE TABLE b (id1 INT REFERENCES a (id1), id2 INT);
> and then something like
>    SELECT * FROM a JOIN b ON (a.id1 = b.id1 AND a.id1 = b.id2)

Right.  In this case we'll have an EC containing {a.id1, b.id1, b.id2}
which means that equivclass.c will generate a restriction condition
b.id1 = b.id2 to be applied at the scan of b.  At the join level, it
has a choice whether to generate a.id1 = b.id1 or a.id1 = b.id2.
It could generate both, but that would be pointlessly inefficient (and
would likely confuse the selectivity estimators, too).  But even if it
chooses to generate a.id1 = b.id2, we should recognize that the FK is
matched.  What we're effectively doing by dropping that clause in
favor of treating the FK as matched is overridding equivclass.c's
arbitrary choice of which join clause to generate with an equally valid
choice that is easier to estimate for.

> 3) I think this comment in get_foreign_key_join_selectivity is wrong and 
> should instead say 'to FK':
>    /* Otherwise, see if rinfo was previously matched to EC */

Duh, yeah.


I rewrote the comments in this section to clarify a bit:
           /* Drop this clause if it matches any column of the FK */           for (i = 0; i < fkinfo->nkeys; i++)
    {               if (rinfo->parent_ec)               {                   /*                    * EC-derived clauses
canonly match by EC.  It is okay to                    * consider any clause derived from the same EC as
   * matching the FK: even if equivclass.c chose to generate                    * a clause equating some other pair of
Vars,it could                    * have generated one equating the FK's Vars.  So for                    * purposes of
estimation,we can act as though it did so.                    *                    * Note: checking parent_ec is a bit
ofa cheat because                    * there are EC-derived clauses that don't have parent_ec                    * set;
butsuch clauses must compare expressions that                    * aren't just Vars, so they cannot match the FK
anyway.                   */                   if (fkinfo->eclass[i] == rinfo->parent_ec)                   {
           remove_it = true;                       break;                   }               }               else
      {                   /*                    * Otherwise, see if rinfo was previously matched to FK as
    * a "loose" clause.                    */                   if (list_member_ptr(fkinfo->rinfos[i], rinfo))
        {                       remove_it = true;                       break;                   }               }
    }
 

        regards, tom lane



pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: Restriction of windows functions
Next
From: Teodor Sigaev
Date:
Subject: Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?