Thread: inner join removal

inner join removal

From
Robert Haas
Date:
So I sat down to work on some code to do inner join removal, and
quickly got stuck.  As a first step, I wanted to write some code that
would identify inner joins that could be treated as LEFT JOINs if we
so chose.  Consider:

SELECT 1 FROM foo, bar WHERE foo.x = bar.x;

If it so happens that there is a left join constraint from either foo
(x) or bar (x) referencing the other one, then the referenced column
has a unique index on it; if, further, the referencing column is not
null, then every row in the referencing table will have exactly one
partner in the referenced table.  Thus, the join can be implemented as
a LEFT JOIN without changing the results.  In this case, that also
means the join can be removed, but it can be a win anyway.  Consider:

SELECT * FROM foo LEFT JOIN (bar JOIN baz ON bar.y = baz.y) ON foo.x = bar.x;

If foo is itty bitty and bar and baz are enormous, it would be nice to
start by joining foo to bar and then joining the result to baz, but
that's not legal.  However, if bar (y) references baz (y) and bar.y is
not null, then the inner join is equivalent to a left join and it's OK
to commute them.  Anyway, I didn't get near as far as actually trying
to implement this fancy footwork because I got stuck on a much simpler
problem.  I thought that it would make sense to iterate through all
the relations in the query and look at the join clauses for each one.
So in the above query, for example, I was expecting that when I looked
at the RelOptInfo for baz, its joinlist would contain bar.y = baz.y.
It doesn't, because bar.y = baz.y gets eaten by the equivalence class
machinery.

Essentially, what I want to do is ask, for each relation, whether
there is exactly one other relation that gets joined to it, and then
if that proves to be the case, get a list of all the join clauses and
examine them.  But I can't figure out how to do that.

Help?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: inner join removal

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Consider:

> SELECT * FROM foo LEFT JOIN (bar JOIN baz ON bar.y = baz.y) ON foo.x = bar.x;

> If foo is itty bitty and bar and baz are enormous, it would be nice to
> start by joining foo to bar and then joining the result to baz, but
> that's not legal.  However, if bar (y) references baz (y) and bar.y is
> not null, then the inner join is equivalent to a left join and it's OK
> to commute them.

I think you're going at this in the wrong place.  It'd likely work
better to identify this situation while building the SpecialJoinInfo
structs describing the join order constraints, and mark the constraints
appropriately.  In fact, I'm not convinced that "convert the inner join
to a left join" is even the right way to think about the problem,
because if you fail to get a win from it then you have likely made
things worse not better, by adding a join order constraint that wasn't
there before.  I think it might work out better if you ask "what
additional conditions are needed in order to prove that this inner join
can commute with this left join", and then work at being able to prove
that.  (It's entirely likely that the planner isn't currently gathering
the right information for solving that problem.)
        regards, tom lane


Re: inner join removal

From
Robert Haas
Date:
On Thu, Jul 8, 2010 at 2:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Consider:
>
>> SELECT * FROM foo LEFT JOIN (bar JOIN baz ON bar.y = baz.y) ON foo.x = bar.x;
>
>> If foo is itty bitty and bar and baz are enormous, it would be nice to
>> start by joining foo to bar and then joining the result to baz, but
>> that's not legal.  However, if bar (y) references baz (y) and bar.y is
>> not null, then the inner join is equivalent to a left join and it's OK
>> to commute them.
>
> I think you're going at this in the wrong place.  It'd likely work
> better to identify this situation while building the SpecialJoinInfo
> structs describing the join order constraints, and mark the constraints
> appropriately.

I'll take a look at that.

> In fact, I'm not convinced that "convert the inner join
> to a left join" is even the right way to think about the problem,
> because if you fail to get a win from it then you have likely made
> things worse not better, by adding a join order constraint that wasn't
> there before.

Yeah, I'm aware of that problem, although I haven't figured out
exactly what to do about it.  I do realize we can't afford lossage in
that situation.  There are actually possible wins from transforming an
inner join into a left join OR a left join into an inner join, so it's
obviously not right to transform blindly.

> I think it might work out better if you ask "what
> additional conditions are needed in order to prove that this inner join
> can commute with this left join", and then work at being able to prove
> that.  (It's entirely likely that the planner isn't currently gathering
> the right information for solving that problem.)

We have to avoid putting much of anything into the critical path where
we're trying out different join orders - we want to figure it out
earlier and, if possible, by examining each relation just once.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: inner join removal

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jul 8, 2010 at 2:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think it might work out better if you ask "what
>> additional conditions are needed in order to prove that this inner join
>> can commute with this left join", and then work at being able to prove
>> that. �(It's entirely likely that the planner isn't currently gathering
>> the right information for solving that problem.)

> We have to avoid putting much of anything into the critical path where
> we're trying out different join orders - we want to figure it out
> earlier and, if possible, by examining each relation just once.

Right, but look for example at the logic involved with the current outer
join transformation identity #3, which can't be applied unless the join
predicate is strict for the left-hand side.  We avoid doing the dogwork
to test that more than once.  I'm hoping that something similar will
work for this problem.  But it's way premature to say much about that
without a mathematical characterization of the extra conditions needed
to make the joins commutable.
        regards, tom lane


Re: inner join removal

From
Josh Berkus
Date:
I think it might work out better if you ask "what
> additional conditions are needed in order to prove that this inner join
> can commute with this left join", and then work at being able to prove
> that.  (It's entirely likely that the planner isn't currently gathering
> the right information for solving that problem.)

I also see this falling into a whole class of issues where the planner
could utilize the existance of FKs to plan queries better.  That might
be a better first step.

--                                  -- Josh Berkus                                    PostgreSQL Experts Inc.
                        http://www.pgexperts.com
 


Re: inner join removal

From
Robert Haas
Date:
On Thu, Jul 8, 2010 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Jul 8, 2010 at 2:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think it might work out better if you ask "what
>>> additional conditions are needed in order to prove that this inner join
>>> can commute with this left join", and then work at being able to prove
>>> that.  (It's entirely likely that the planner isn't currently gathering
>>> the right information for solving that problem.)
>
>> We have to avoid putting much of anything into the critical path where
>> we're trying out different join orders - we want to figure it out
>> earlier and, if possible, by examining each relation just once.
>
> Right, but look for example at the logic involved with the current outer
> join transformation identity #3, which can't be applied unless the join
> predicate is strict for the left-hand side.  We avoid doing the dogwork
> to test that more than once.

<admits ignorance>

Uh... where exactly is that logic?  I've been looking for it for 2
years and haven't found it yet.

> I'm hoping that something similar will
> work for this problem.  But it's way premature to say much about that
> without a mathematical characterization of the extra conditions needed
> to make the joins commutable.

I wrote a previous email on this topic which might be a start in that direction.

http://archives.postgresql.org/pgsql-hackers/2009-10/msg01012.php

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: inner join removal

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jul 8, 2010 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Right, but look for example at the logic involved with the current outer
>> join transformation identity #3, which can't be applied unless the join
>> predicate is strict for the left-hand side. �We avoid doing the dogwork
>> to test that more than once.

> Uh... where exactly is that logic?  I've been looking for it for 2
> years and haven't found it yet.

Look at the code that deals with SpecialJoinInfo.lhs_strict.  There
isn't very much of it (which is precisely my point ...)
        regards, tom lane