Home > mailing lists

Re: Patch to support SEMI and ANTI join removal - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Patch to support SEMI and ANTI join removal
Date	November 23, 2014 14:23:56
Msg-id	CAApHDvoeC8YGWoahVSri-84eN2k0TnH6GPXp1K59y9juC1WWBg@mail.gmail.com Whole thread Raw
In response to	Re: Patch to support SEMI and ANTI join removal (David Rowley <dgrowleyml@gmail.com>)
Responses	Re: Patch to support SEMI and ANTI join removal (Michael Paquier <michael.paquier@gmail.com>)
List	pgsql-hackers

Tree view

On Wed, Nov 19, 2014 at 11:49 PM, David Rowley <dgrowleyml@gmail.com> wrote:

On Sun, Nov 16, 2014 at 12:19 PM, David Rowley <dgrowleyml@gmail.com> wrote:
On Sun, Nov 16, 2014 at 10:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

I propose that we keep track of whether there are any potentially
skippable joins at the top of the plan. When we begin execution we do
a single if test to see if there is run-time work to do. If we pass
the run-time tests we then descend the tree and prune the plan to
completely remove unnecessary nodes. We end with an EXPLAIN and
EXPLAIN ANALYZE that looks like this

> QUERY PLAN
> ------------------------------------------------------------------
> Aggregate (actual rows=1 loops=1)
> -> Seq Scan on t1 (actual rows=1000000 loops=1)

Doing that removes all the overheads and complexity; it also matches
how join removal currently works.

I've attached an updated patch which works in this way. All of the skipping code that I had added to the executor's join functions has now been removed.

Here's an example output with the plan trimmed, and then untrimmed.

set constraints b_c_id_fkey deferred;

explain (costs off) select b.* from b inner join c on b.c_id = c.id;

QUERY PLAN

---------------

Seq Scan on b

(1 row)

-- add a item to the trigger queue by updating a referenced record.

update c set id = 2 where id=1;

explain (costs off) select b.* from b inner join c on b.c_id = c.id;

QUERY PLAN

------------------------------

Hash Join

Hash Cond: (b.c_id = c.id)

-> Seq Scan on b

-> Hash

-> Seq Scan on c

(5 rows)

A slight quirk with the patch as it stands is that I'm unconditionally NOT removing Sort nodes that sit below a MergeJoin node. The reason for this is that I've not quite figured out a way to determine if the Sort order is required still.

An example of this can be seen in the regression tests:

-- check merge join nodes are removed properly

set enable_hashjoin = off;

-- this should remove joins to b and c.

explain (costs off)

select COUNT(*) from a inner join b on a.b_id = b.id left join c on a.id = c.id;

QUERY PLAN

---------------------------

Aggregate

-> Sort

Sort Key: a.b_id

-> Seq Scan on a

(4 rows)

As the patch stands there's still a couple of FIXMEs in there, so there's still a bit of work to do yet.

Comments are welcome

Regards

David Rowley

Attachment

inner_join_removals_2014-11-24_7cde1e4.patch

pgsql-hackers by date:

From: Amit Kapila
Date: 23 November 2014, 09:45:10
Subject: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

From: Vladimir Koković
Date: 23 November 2014, 21:38:13
Subject: make check-world regress failed

Re: Patch to support SEMI and ANTI join removal - Mailing list pgsql-hackers

Attachment

Previous

Next