Re: Removing unneeded self joins - Mailing list pgsql-hackers

From Alexander Kuzmenkov
Subject Re: Removing unneeded self joins
Date
Msg-id 89c8a17d-01f8-a03d-7fb8-0a8a070be47c@postgrespro.ru
Whole thread Raw
In response to Re: Removing unneeded self joins  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Removing unneeded self joins
List pgsql-hackers
On 3/25/19 07:07, David Rowley wrote:
> You had commented the test with:
>
> -- If index conditions are different for each side, we won't select the same
> -- row on both sides, so the join can't be removed.
>
> but I don't quite understand why we can't remove the join in this
> situation.


My rationale was that we're not exactly removing the join, but replacing 
it with a scan. So it is not enough to just have a single row on each 
side, it must be the same physical row. In this example, before the 
transformation, t1.b is not equal to t2.b, but they become equal 
afterwards. This looks somewhat wrong. On the other hand, if the 
conditions are different, the resulting condition is going to evaluate 
to constant false and we won't get any rows, so maybe it's OK.

This brings me again to the question of what are the conditions for join 
removal. If the formulation with indexes is not general enough, what do 
we use instead? I guess it could be something like this:

1. Given the (btree equality) join and restriction clauses, both sides 
are unique on the same set of columns. That is, if we fix the values of 
these columns, both sides have at most one matching row.

     a. For each of these columns, we have either

         i) a join clause that equates some expression referencing the 
outer column to the same expression referencing the same inner column.

         ii) a clause for each relation that equates the same expression 
referencing the outer and inner column to some other arbitrary 
expression, possibly a different one for each side. This expression may 
be a Const or some expression that references a Var of some third relation.

2. All the resulting columns can be calculated using either side of the 
join. For now, just require that both sides are base relations that 
refer to the same physical relation.

Two points are not clear to me here:

1. We don't handle join clauses to third relations, but can they be 
treated the same way we treat Consts?

2. Can we simplify the join when we don't have any join clauses and only 
have Consts? Or should we have at least one join clause that equates the 
same inner and outer column? Why is one join clause enough?

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Protect syscache from bloating with negative cache entries
Next
From: Tom Lane
Date:
Subject: Re: Feature: Add Greek language fulltext search