Re: [PERFORM] Outer joins and equivalence - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: [PERFORM] Outer joins and equivalence
Date
Msg-id 1212643067.4148.244.camel@ebony.site
Whole thread Raw
In response to Re: [PERFORM] Outer joins and equivalence  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, 2008-06-04 at 22:18 -0400, Tom Lane wrote:
> [ redirecting thread from -performance to -hackers ]
> 
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I've got a test case which shows something related and weird, though not
> > the exact case.
> 
> > The queries shown here have significantly different costs, depending
> > upon whether we use tables a or b in the query. Since a and b are
> > equivalent this result isn't expected at all.
> 
> Hmm.  I had been guessing that there was something about your original
> query that prevented the system from applying best_appendrel_indexscan,
> but after fooling with this a bit, I don't believe that's the issue
> at all.  The problem is that these two cases "should" be equivalent:
> 
>   select ... from a join b on (a.id = b.id) left join c on (a.id = c.id);
> 
>   select ... from a join b on (a.id = b.id) left join c on (b.id = c.id);
> 
> but they are not seen that way by the current planner.  It correctly
> forms an EquivalenceClass consisting of a.id and b.id, but it cannot put
> c.id into that same class, and so the clause a.id = c.id is just left
> alone; there is noplace that can generate "b.id = c.id" as an
> alternative join condition.  This means that (for the first query)
> we can consider the join orders "(a join b) leftjoin c" and
> "(a leftjoin c) join b", but there is no way to consider the join
> order "(b leftjoin c) join a"; to implement that we'd need to have the
> alternative join clause available.  So if that join order is
> significantly better than the other two, we lose.
> 
> This is going to take a bit of work to fix :-(.  I am toying with the
> idea that we could go ahead and put c.id into the EquivalenceClass
> as a sort of second-class citizen that's labeled as associated with this
> particular outer join --- the implication being that we can execute the
> outer join using a generated clause that equates c.id to any one of the
> first-class members of the EquivalenceClass, but above the outer join
> we can't assume that c.id hasn't gone to null, so it's not really equal
> to anything else in the class.  I think it might also be possible
> to get rid of the reconsider_outer_join_clauses() kluge in favor of
> driving transitive-equality-to-a-constant off of this representation.

Yes, EquivalenceClass allows an implication to be made either way
around, which is wrong for this class of problem. I was imagining a
higher level ImplicationClass that was only of the form A => B but not B
=> A. So we end up with an ImplicationTree, rather than a just a flat
Class. Which is where I punted...

> However there's a larger issue here, which is the very identity of an
> outer join :-(.  Currently, for the first query above, the left join
> is defined as being between a and c, with a being the minimum
> left-hand-side needed to form the join.  To be able to handle a case
> like this, it seems that the notion of a "minimum left hand side"
> falls apart altogether.  We can execute the OJ using either a or b
> as left hand side.  So the current representation of OuterJoinInfo,
> and the code that uses it to enforce valid join orders, needs a serious
> rethink.

Hadn't seen that at all.

> Looks like 8.4 development material to me, rather than something we
> can hope to back-patch a fix for...

Definitely.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support



pgsql-hackers by date:

Previous
From: "Koichi Suzuki"
Date:
Subject: Re: Core team statement on replication in PostgreSQL
Next
From: Mayuresh Nirhali
Date:
Subject: orafce does NOT build with Sun Studio compiler