Thread: foreign-key inference & join removal
On Thu, Sep 17, 2009 at 5:14 PM, I wrote: > I think that the next project in this area should > be to try to support removal of INNER joins. (Removal of SEMI, ANTI, > and FULL joins seems unlikely ever to be interesting.) That will > require teaching the planner about foreign key constraints, which > interestingly opens up some other interesting optimization > opportunities. An INNER join that can never match zero rows can > alternatively be implemented as a LEFT join, which can be reordered in > some situations where an inner join can't. e.g. A LJ (B IJ C ON Pbc) > ON Pab can be implemented as (A LJ B ON Pab) LJ/IJ C ON Pbc if it is > the case that for every row in B, there will be at least one row in C > such that Pbc holds. A few more thoughts on this topic. Suppose we define a new join type called "inner_or_left_join". This means that we've proven that every outer row has at least one join partner, so that we'll get the same results whichever way we implement it. We can prove this for either inner joins or left joins, or for right joins with the sides reversed, by checking that: (1) The inner rel is a baserel with no restriction clauses. (2) All the join clauses are merge-joinable. (3) There is a table on the outer side of the join with a foreign key constraint referencing the inner table, such that the columns of the foreign key constraint and the chosen equality operators exactly match up with the join clauses (no extra columns, no extra join clauses). (4) All the relevant columns of the outer table are NOT NULL. I am not quite sure how to check for #3 and #4, nor am I sure where the best place to put this logic is. Once this infrastructure is in place, it opens up a number of interesting optimization opportunities. 1. Inner joins that are relabelled as inner_or_left joins can be removed using the same logic that we already use for left joins. 2. Inner_or_left joins can be reordered more flexibly than either inner joins or left joins. Consider: A LJ (B IJ C ON Pbc) ON Pab If B IJ C is inner_or_left and Pbc is strict, the joins commute. Further, I believe that if the LJ is inner_or_left and Pab is strict, we can commute the joins provided that, in the process, we change the join between A and B to be an inner join (no longer inner_or_left). But if neither join is inner_or_left then no reordering is possible. 3. Some path types are implemented only for inner joins but not for left joins; these can also be used for inner_or_left joins. For example, consider A LJ B ON Pab. If the join can be proven inner_or_left, we have the option of treating it as an inner join and using A rather than B as the inner rel. Thoughts? ...Robert
On Oct 19, 2009, at 03:44 , Robert Haas wrote: > Suppose we define a new join type called "inner_or_left_join". This > means that we've proven that every outer row has at least one join > partner, so that we'll get the same results whichever way we implement > it. We can prove this for either inner joins or left joins, or for > right joins with the sides reversed, by checking that: > > (1) The inner rel is a baserel with no restriction clauses. > (2) All the join clauses are merge-joinable. > (3) There is a table on the outer side of the join with a foreign key > constraint referencing the inner table, such that the columns of the > foreign key constraint and the chosen equality operators exactly match > up with the join clauses (no extra columns, no extra join clauses). > (4) All the relevant columns of the outer table are NOT NULL. While considering this, have you given any thought to the points in http://archives.postgresql.org/pgsql-hackers/2009-07/msg01555.php ? (In short, there are other properties --- e.g. that there is *exactly* one row in B for each in A, uniqueness is kept, etc --- you can deduce from foreign key relationships, which is useful for more than join ordering. The example I gave involved removing Distinct and pushing Limit through a join) -- Alex Brasetvik
On Mon, Oct 19, 2009 at 8:54 AM, Alex Brasetvik <alex@brasetvik.com> wrote: > > On Oct 19, 2009, at 03:44 , Robert Haas wrote: > >> Suppose we define a new join type called "inner_or_left_join". This >> means that we've proven that every outer row has at least one join >> partner, so that we'll get the same results whichever way we implement >> it. We can prove this for either inner joins or left joins, or for >> right joins with the sides reversed, by checking that: >> >> (1) The inner rel is a baserel with no restriction clauses. >> (2) All the join clauses are merge-joinable. >> (3) There is a table on the outer side of the join with a foreign key >> constraint referencing the inner table, such that the columns of the >> foreign key constraint and the chosen equality operators exactly match >> up with the join clauses (no extra columns, no extra join clauses). >> (4) All the relevant columns of the outer table are NOT NULL. > > While considering this, have you given any thought to the points in > http://archives.postgresql.org/pgsql-hackers/2009-07/msg01555.php ? > > (In short, there are other properties --- e.g. that there is *exactly* one > row in B for each in A, uniqueness is kept, etc --- you can deduce from > foreign key relationships, which is useful for more than join ordering. The > example I gave involved removing Distinct and pushing Limit through a join) It's in the back of my mind, but I think join removal & join reordering are the biggest wins here. Pushing a LIMIT through a join doesn't really help by itself because, under the pull model PostgreSQL uses, the lower nodes will only be evaluated to the extent necessary to satisfy the LIMIT. Getting rid of DISTINCT ON could be very useful, but I think it's probably something of a corner case, since normally you won't bother to include DISTINCT ON in the first place if it's not doing anything. ...Robert