Re: Removing INNER JOINs - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: Removing INNER JOINs
Date
Msg-id 20141202201229.GV3342@tamriel.snowman.net
Whole thread Raw
In response to Re: Removing INNER JOINs  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Sun, Nov 30, 2014 at 12:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Bottom line, given all the restrictions on whether the optimization can
> > happen, I have very little enthusiasm for the whole idea.  I do not think
> > the benefit will be big enough to justify the amount of mess this will
> > introduce.
>
> This optimization applies to a tremendous number of real-world cases,
> and we really need to have it.  This was a huge problem for me in my
> previous life as a web developer.  The previous work that we did to
> remove LEFT JOINs was an enormous help, but it's not enough; we need a
> way to remove INNER JOINs as well.

For my 2c, I'm completely with Robert on this one.  There are a lot of
cases this could help with, particularly things coming out of ORMs
(which, yes, might possibly be better written, but that's a different
issue).

> I thought that David's original approach of doing this in the planner
> was a good one.  That fell down because of the possibility that
> apparently-valid referential integrity constraints might not be valid
> at execution time if the triggers were deferred.  But frankly, that
> seems like an awfully nitpicky thing for this to fall down on.  Lots
> of web applications are going to issue only SELECT statements that run
> as as single-statement transactions, and so that issue, so troubling
> in theory, will never occur in practice.  That doesn't mean that we
> don't need to account for it somehow to make the code safe, but any
> argument that it abridges the use case significantly is, in my
> opinion, not credible.

Agreed with this also, deferred triggers are not common-place in my
experience and when it *does* happen, ime at least, it's because you
have a long-running data load or similar where you're not going to
care one bit that large, complicated JOINs aren't as fast as they
might have been otherwise.

> Anyway, David was undeterred by the rejection of that initial approach
> and rearranged everything, based on suggestions from Andres and later
> Simon, into the form it's reached now.  Kudos to him for his
> persistance.  But your point that we might have chosen a whole
> different plan if it had known that this join was cheaper is a good
> one.  However, that takes us right back to square one, which is to do
> this at plan time.  I happen to think that's probably better anyway,
> but I fear we're just going around in circles here.  We can either do
> it at plan time and find some way of handling the fact that there
> might be deferred triggers that haven't fired yet; or we can do it at
> execution time and live with the fact that we might have chosen a plan
> that is not optimal, though still better than executing a
> completely-unnecessary join.

Right, we can't get it wrong in the face of deferred triggers either.
Have we considered only doing the optimization for read-only
transactions?  I'm not thrilled with that, but at least we'd get out
from under this deferred triggers concern.  Another way might be an
option to say "use the optimization, but throw an error if you run
into a deferred trigger", or perhaps save both plans and use whichever
one we can when we get to execution time?  That could make planning
time go up too much to work, but perhaps it's worth testing..
Thanks,    Stephen

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Review of GetUserId() Usage
Next
From: Jeff Janes
Date:
Subject: Re: How about a option to disable autovacuum cancellation on lock conflict?