Removing INNER JOINs - Mailing list pgsql-hackers

From David Rowley
Subject Removing INNER JOINs
Date
Msg-id CAApHDvocUEYdt1uT+DLDPs2xEu=v3qJGT6HeXKonQM4rY_OsSA@mail.gmail.com
Whole thread Raw
List pgsql-hackers
Hi,

Starting a new thread which continues on from

To give a brief summary for any new readers:

The attached patch allows for INNER JOINed relations to be removed from the plan,
providing none of the columns are used for anything, and a foreign key exists which
proves that a record must exist in the table being removed which matches the join
condition:

Example:

test=# create table b (id int primary key);
CREATE TABLE
test=# create table a (id int primary key, b_id int not null references b(id));
CREATE TABLE
test=# explain (costs off) select a.* from a inner join b on a.b_id = b.id;
  QUERY PLAN
---------------
 Seq Scan on a

This has worked for a few years now for LEFT JOINs, so this patch is just extending
the joins types that can be removed.

This optimisation should prove to be quite useful for views in which a subset of its
columns are queried.

The attached is an updated patch which fixes a conflict from a recent commit and
also fixes a bug where join removals did not properly work for PREPAREd statements.

I'm looking for a bit of feedback around the method I'm using to prune the redundant
plan nodes out of the plan tree at executor startup. Particularly around not stripping
the Sort nodes out from below a merge join, even if the sort order is no longer
required due to the merge join node being removed. This potentially could leave
the plan suboptimal when compared to a plan that the planner could generate
when the removed relation was never asked for in the first place.

There are also other cases such as MergeJoins performing btree index scans
in order to obtain ordered results for a MergeJoin that would be better executed
as a SeqScan when the MergeJoin can be removed.

Perhaps some costs could be adjusted at planning time when there's a possibility
that joins could be removed at execution time, although I'm not quite sure about
this as it risks generating a poor plan in the case when the joins cannot be removed.

I currently can't see much of a way around these cases, but in both cases removing
the join should prove to be a win, though just perhaps not with the most optimal of
plans.

There are some more details around the reasons behind doing this weird executor
startup plan pruning around here:


Comments are most welcome

Regards

David Rowley
Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: How about a option to disable autovacuum cancellation on lock conflict?
Next
From: Magnus Hagander
Date:
Subject: Re: How about a option to disable autovacuum cancellation on lock conflict?