Removing unneeded self joins - Mailing list pgsql-hackers

From Alexander Kuzmenkov
Subject Removing unneeded self joins
Date
Msg-id 64486b0b-0404-e39e-322d-0801154901f3@postgrespro.ru
Whole thread Raw
Responses Re: Removing unneeded self joins  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Removing unneeded self joins  (David Rowley <david.rowley@2ndquadrant.com>)
Re: Removing unneeded self joins  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
Hi hackers,

There is a join optimization we don't do -- removing inner join of a 
table with itself on a unique column. Such joins are generated by 
various ORMs, so from time to time our customers ask us to look into 
this. Most recently, it was discussed on the list in relation to an 
article comparing the optimizations that some DBMS make [1].

I started to explore what can be done about this. Attached is a proof of 
concept patch. It works for some simple cases:

create table tt(a int primary key, b text);
explain select p.* from tt p join (select * from tt where b ~~ 'a%') q 
on p.a = q.a;

                       QUERY PLAN
──────────────────────────────────────────────────────
  Seq Scan on tt p  (cost=0.00..25.88 rows=6 width=36)
    Filter: (b ~~ 'a%'::text)

It also works for semi-joins like `explain select p.* from tt p where 
exists (select * from tt where b ~~ 'a%' and a = p.a);`. This requires a 
preparatory step of reducing unique semi joins to inner joins, and we 
already do this (reduce_unique_semijoin).


What this patch tries to do is to remove these inner joins when a single 
join is being planned (populate_joinrel_with_paths). The main entry 
point is reduce_self_unique_join. First, it proves that both input 
relations are uniquely constrained by the same index given the 
particular join clauses. We already have a way to find such indexes 
(relation_has_unique_index_for), so I was able to reuse this. What I'm 
not sure about is how to properly remove the join after that. For now, I 
just pretend that the join relation being built is the outer baserel, 
add to it the restrictions from the inner relation, and then plan it as 
usual. Maybe there is a less hacky way to do it? I've seen elsewhere a 
suggestion to use an AppendPath for a similar purpose, but here we can't 
just use the outer relation we've already planned because the 
restriction list is different.

I'd be glad to hear your thoughts on this.

[1] 

https://www.postgresql.org/message-id/flat/CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw%40mail.gmail.com#CAMjNa7cC4X9YR-vAJS-jSYCajhRDvJQnN7m2sLH1wLh-_Z2bsw@mail.gmail.com

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Considering signal handling in plpython again
Next
From: Konstantin Knizhnik
Date:
Subject: Re: libpq compression