Re: PATCH: use foreign keys to improve join estimates v1 - Mailing list pgsql-hackers

From David Rowley
Subject Re: PATCH: use foreign keys to improve join estimates v1
Date
Msg-id CAKJS1f8ZvzU=D5dCVp7uOgZB9jsfybonF1ad_rGBLzhUtS+yJw@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: use foreign keys to improve join estimates v1  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: PATCH: use foreign keys to improve join estimates v1  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 24 September 2015 at 23:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

2) find_best_match_foreign_key
------------------------------

I think the comment before the function needs rephrasing (seems a bit broken to me). I do like the approach in general, although it changes the semantics a bit. The original code only considered "fully matching" fkeys, while the new code simply takes the longest match.



Oops, I did not code this at all the way I had originally pictured it. I guess the evidence of that is in the function comment, which I wrote before coding the function.
I of course intended to only allow full matches. 
A full patch which fixes this is attached. This also should fix the clause duplication trick that I talked about.


The comment before the function mentions it's possible to confuse the code with duplicate quals. Can you give an example?


Something like: SELECT * FROM a LEFT JOIN b ON a.id=b.id and a.id=b.id AND a.id=b.id AND a.id2 = b.id2 AND a.id3 = b.id3;

Note a.id = b.id repeated 3 times.

Where a foreign key exists on (a.id) ref (b.id), and also (a.id2, a.id3) ref (b.id2, b.id3). In this case we match 3 quals for the FK with 1 key, and 2 quals with the FK with 2 keys. But this is now fixed in the attached.

I used an outer join here as they won't be transformed into eclass members and back to quals again. INNER JOIN wouldn't allow the duplicates to materialise again.

 

3) clauselist_join_selectivity
------------------------------

I think this is actually broken, i.e. it does not handle the cases it was handling before - namely the cases where more than 3 tables are joined (fkjoin2.sql).

Imagine a "fact" table referencing two dimensions, using 2-column fkeys:

create table a (a1 int, a2 int, unique (a1,a2));
create table b (b1 int, b2 int, unique (b1,b2));
create table f (a1 int, a2 int, b1 int, b2 int);

insert into a select i,i from generate_series(0,999999) s(i);
insert into b select i,i from generate_series(0,999999) s(i);
insert into f select i/10,i/10,i/10,i/10
                from generate_series(0,9999999) s(i);

alter table f add foreign key (a1,a2) references a(a1,a2);
alter table f add foreign key (b1,b2) references b(b1,b2);

Each dimension has 1M rows, fact has 10M rows (and references each row in dimensions 10x).

Now, let's see this simple star join:

explain select * from f join a using (a1,a2)
                        join b using (b1,b2);

This should return all 10M rows, and originally this was estimated like this:

                                 QUERY PLAN
-----------------------------------------------------------------------
Hash Join  (cost=66664.00..573853.57 rows=10000175 width=16)
  Hash Cond: ((f.b1 = b.b1) AND (f.b2 = b.b2))
  ->  Hash Join  (cost=33332.00..363955.16 rows=10000175 width=16)
       Hash Cond: ((f.a1 = a.a1) AND (f.a2 = a.a2))
       ->  Seq Scan on f  (cost=0.00..154056.75 rows=10000175 width=16)
       ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8)
           ->  Seq Scan on a  (cost=0.00..14425.00 rows=1000000 width=8)
   ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8)
         ->  Seq Scan on b  (cost=0.00..14425.00 rows=1000000 width=8)

so pretty much perfectly accurate, while the new code does this:

                                 QUERY PLAN
----------------------------------------------------------------------
 Hash Join  (cost=66664.00..573853.57 rows=10 width=16)
   Hash Cond: ((f.b1 = b.b1) AND (f.b2 = b.b2))
   ->  Hash Join  (cost=33332.00..363955.16 rows=10000175 width=16)
       Hash Cond: ((f.a1 = a.a1) AND (f.a2 = a.a2))
        ->  Seq Scan on f  (cost=0.00..154056.75 rows=10000175 width=16)
        ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8)
           ->  Seq Scan on a  (cost=0.00..14425.00 rows=1000000 width=8)
   ->  Hash  (cost=14425.00..14425.00 rows=1000000 width=8)
        ->  Seq Scan on b  (cost=0.00..14425.00 rows=1000000 width=8)

I think this is due to this check in clauselist_join_selectivity:

   if (bms_get_singleton_member(sjinfo->min_righthand, &innerrelid) &&
       bms_get_singleton_member(sjinfo->min_lefthand, &outerrelid))

which pretty much says "only work with joins on two base relations". So as long as you have a joinrel, (e.g. a fact + one of the dimensions), it's game over.

I think the restriction is unnecessary, because when estimating joins, we effectively take cardinality of a cartesian product of all the base relations, and then apply selectivities for all the join quals (in a pairwise manner).

So for the three tables we take

   card(join) = card(f) * card(a) * card(b) * sel(f,a) * sel(f,b)

and we can consider the foreign keys in the pairwise selectivities.


I looked at this again, and I'm not all that sure it's possible to assume that 1.0 / <tuples> is valid when there's more than one relation at either side of the join.
My reasoning for this is that the whole basis for the patch is that a if we find a foreign key match, then we can be sure enough, as far as row estimations go, that exactly 1 tuple will exist matching that condition. This assumption of the 1 tuple no longer holds when 2 relations have already been joined, as this can either cause tuple duplication, or elimination.

It's a little hard force the join order so that it happens this way, but say the join order in the following example was, from left to right:

a CROSS JOIN b JOIN c on a.id=c.id

Of course, the planner would perform the cross join last in reality, but it's possible to trick it too not.

Let's say a foreign key exists on c (id) references a(id). If we assume that a.id = c.id produces 1 tuple in the (a,b) rel, then we'd be very wrong, as it's not 1, it's the number of rows in b! Which could be millions.

I've attempted to create a test case to demo this against your v2 patch, and it does fall for it, only it does happen to actually give better estimates than without the patch, but I'm sure there must be some case to be found where it does not.

create table a (id1 int, id2 int, c_id1 int not null, c_id2 int not null, primary key(id1, id2));
create table b (id int primary key, a_id1 int not null, a_id2 int not null);
create table c (id1 int, id2 int);

alter table b add constraint b_a_id1_a_id2_fkey foreign key (a_id1, a_id2) references a;

insert into a select x,x,x,x from generate_series(1,100) s(x);
insert into b select id1,id2,id1 from a;
insert into c select c_id1,c_id2 from a, generate_series(1,100);

analyze a;
analyze b;
analyze c;

explain analyze select * from a inner join b on a.id1=b.a_id1 and a.id2 = b.a_id2 join c on c.id1%2 = a.c_id1%2 and c.id2%2 = a.c_id2%2;

I'm not saying that my version of the patch does any better... I've actually not tested, but I think we could only use the FK to test this if we happened to back that up with something like UNIQUE join detection. My unique join patch aims to add some of the infrastructure which might be required to make that possible.

Perhaps the patch cannot work well without this, as since we are trying to fix a row underestimation, then we might just succeed in having the planner switch the join order to some order that is not backed up by foreign keys, because the row estimates look more favourable that way. Although perhaps that could be fixed by changing the 1.0 / <tuples> to <something else> / <tuples>

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Kouhei Kaigai
Date:
Subject: CustomScan support on readfuncs.c
Next
From: Thomas Munro
Date:
Subject: Manual bitswizzling -> LOCKBIT_ON