Re: Patch to support SEMI and ANTI join removal - Mailing list pgsql-hackers

From David Rowley
Subject Re: Patch to support SEMI and ANTI join removal
Date
Msg-id CAApHDvpMfwStA5K4im4bt7p+49d7y3Ji0o3jvZ8gMY85xzrP2Q@mail.gmail.com
Whole thread Raw
In response to Re: Patch to support SEMI and ANTI join removal  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Wed, Aug 27, 2014 at 1:40 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 08/26/2014 03:28 PM, David Rowley wrote:
Any ideas or feedback on this would be welcome

Before someone spends time reviewing this patch, are you sure this is worth the effort? It seems like very narrow use case to me. I understand removing LEFT and INNER joins, but the case for SEMI and ANTI joins seems a lot thinner. Unnecessary LEFT and INNER joins can easily creep into a query when views are used, for example, but I can't imagine that happening for a SEMI or ANTI join. Maybe I'm lacking imagination. If someone has run into a query in the wild that would benefit from this, please raise your hand.


I agree that the use case for removals of SEMI and ANTI join are a lot thinner than LEFT and INNER joins. My longer term goal here is to add join removal support for INNER joins. In order to do this I need the foreign key infrastructure which is included in this patch. I held back from just going ahead and writing the INNER JOIN removal patch as I didn't want to waste the extra effort in doing that if someone was to find a show stopper problem with using foreign keys the way I am with this patch. I was kind of hoping someone would be able to look at this patch a bit more and confirm to me that it's safe to do this or not before I go ahead and write the inner join version.
 
If I understood correctly, you're planning to work on INNER join removal too. How much of the code in this patch is also required for INNER join removal, and how much is specific to SEMI and ANTI joins?


Apart from the extra lines of code in remove_useless_joins(), there's 3 functions added here which won't be needed at all for INNER JOINs; semiorantijoin_is_removable(), convert_semijoin_to_isnotnull_quals() and convert_antijoin_to_isnull_quals(). Not including the regression tests, this is 396 lines with comments and 220 lines without. All of these functions are static and in analyzejoin.c.
 
The benchmarks I posted a few weeks back show that the overhead of performing the semi/anti join removal checks is quite low. I measured an extra 400 or so nanoseconds for a successful removal on my i5 laptop. Or just 15 nanoseconds on the earliest fast path for a non-removal. This accounted for between 0.008% and 0.2% of planning time for the queries I tested.

Regards

David Rowley

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: postgresql latency & bgwriter not doing its job
Next
From: Andres Freund
Date:
Subject: Re: postgresql latency & bgwriter not doing its job