Re: Patch to support SEMI and ANTI join removal - Mailing list pgsql-hackers

From David Rowley
Subject Re: Patch to support SEMI and ANTI join removal
Date
Msg-id CAApHDvo6XmnevLOA2BymDrRKwYsAnrZXWn_ERZ-nB_kFViGMQw@mail.gmail.com
Whole thread Raw
In response to Re: Patch to support SEMI and ANTI join removal  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Patch to support SEMI and ANTI join removal
List pgsql-hackers
On Wed, Oct 1, 2014 at 12:01 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-09-30 23:25:45 +1300, David Rowley wrote:
>
> I've not quite gotten my head around how we might stop the unneeded
> relation from being the primary path to join the other inner relations,
> i.e. what would stop the planner making a plan that hashed all the other
> relations and planned to perform a sequence scan on the relation that we
> have no need to scan (because the foreign key tells us the join is
> pointless). If we were not use use that relation then we'd just have a
> bunch of hash tables with no path to join them up. If we did anything to
> force the planner into creating a plan that would suit skipping relations,
> then we could possibly be throwing away the optimal plan..... Right?

I'm not 100% sure I understand your problem description, but let me
describe how I think this would work. During planning, you'd emit the
exactly same plan as you'd today, with two exceptions:
a) When costing a node where one side of a join is very likely to be
   removable, you'd cost it nearly as if there wasn't a join.

Ok given the tables:
create table t1 (x int primary key);
create table t2 (y int primary key);

suppose the planner came up with something like:

test=# explain (costs off) select t2.* from t1 inner join t2 on t1.x=t2.y;
         QUERY PLAN
----------------------------
 Hash Join
   Hash Cond: (t1.x = t2.y)
   ->  Seq Scan on t1
   ->  Hash
         ->  Seq Scan on t2

If we had a foreign key...

alter table t2 add constraint t2_y_fkey foreign key (y) references t1 (x);

...the join to t1 could possibly be "ignored" by the executor... but there's a problem as the plan states we're going to seqscan then hash that relation, then seqscan t1 with a hash lookup on each of t1's rows. In this case how would the executor skip the scan on t1? I can see how this might work if it was t2 that we were removing, as we'd just skip the hash lookup part in the hash join node.

 
b) The planner would attach some sort of 'one time join qual' to the
   'likely removable' join nodes. If, during executor init, that qual
   returns false, simply don't perform the join. Just check the inner
   relation, but entirely skip the outer relation.


I think in the example that I've given above that the relation we'd want to keep is the outer relation, we'd want to throw away the inner one, but I can't quite see how that would work.

Hopefully that makes sense.

Regards

David Rowley

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: open items for 9.4
Next
From: Andres Freund
Date:
Subject: Re: Patch to support SEMI and ANTI join removal