Re: PATCH: use foreign keys to improve join estimates v1 - Mailing list pgsql-hackers

From David Rowley
Subject Re: PATCH: use foreign keys to improve join estimates v1
Date
Msg-id CAKJS1f-Ad5Fs9Fsay_=LZT_Ep81r+qU3pVLcecz+a1+BJMVp0g@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: use foreign keys to improve join estimates v1  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: PATCH: use foreign keys to improve join estimates v1  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers



On 26 September 2015 at 01:57, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

On 09/25/2015 03:39 AM, David Rowley wrote:
I looked at this again, and I'm not all that sure it's possible to
assume that 1.0 / <tuples> is valid when there's more than one
relation at either side of the join.
>
My reasoning for this is that the whole basis for the patch is that a
if we find a foreign key match, then we can be sure enough, as far as
row estimations go, that exactly 1 tuple will exist matching that
condition. This assumption of the 1 tuple no longer holds when 2
relations have already been joined, as this can either cause tuple
duplication, or elimination.

I don't see why that would be the case. Of course, you need to take the right <tuples>, i.e. the "target" of the foreign key (the table with UNIQUE constraint) so that the selectivity matches the fact that exactly 1 tuple (on the PK side) matches.

hmm, ok. You're right. It appears I was a bit confused, but thanks for explaining it again. I get it now.

I've been working on this again. I've put back the code that you wrote for the looping over each combination of relations from either side of the join.

I've also added some code to get around the problem with eclass joins and the RestrictInfo having some alternative Vars that don't belong to the foreign key. Basically I'm just checking if the RestrictInfo has a parent_ec, and if it does just loop over the members to try and find the Vars that belong to the foreign key. I've tested it with the following, and it seems to work:

create table a as select i as a_id1, i as a_id2, i as dummy1 from generate_series(0,999) s(i);
alter table a add unique (a_id1, a_id2);
create table b as select i as b_id1, i as b_id2 from generate_series(0,332) s(i);

analyze a;
analyze b;

alter table b add foreign key (b_id1, b_id2) references a (a_id1, a_id2);

explain analyze select * from a inner join b on a.dummy1 = b.b_id1 and a.a_id2 = b.b_id2 where a.a_id1 = a.dummy1;

                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Hash Join  (cost=18.57..26.41 rows=2 width=20) (actual time=0.775..1.046 rows=333 loops=1)
   Hash Cond: ((b.b_id1 = a.dummy1) AND (b.b_id2 = a.a_id2))
   ->  Seq Scan on b  (cost=0.00..5.33 rows=333 width=8) (actual time=0.013..0.046 rows=333 loops=1)
   ->  Hash  (cost=18.50..18.50 rows=5 width=12) (actual time=0.737..0.737 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 51kB
         ->  Seq Scan on a  (cost=0.00..18.50 rows=5 width=12) (actual time=0.014..0.389 rows=1000 loops=1)
               Filter: (dummy1 = a_id1)

The non-patched version estimates 1 row. The patched estimates 2 rows, but that's due to the bad estimate on dummy1 = a_id1.

The 2 comes from ceil(5 * 0.333).

Perhaps you have a better test case to for this?

Regards

David Rowley

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Filip Rembiałkowski
Date:
Subject: pg_dump LOCK TABLE ONLY question
Next
From: Tom Lane
Date:
Subject: Re: pg_dump LOCK TABLE ONLY question