Re: Allowing join removals for more join types - Mailing list pgsql-hackers

From David Rowley
Subject Re: Allowing join removals for more join types
Date
Msg-id CAHoyFK_ef80mS2Jgk=GkoJNVoux3OwrHNdcBWw5VfG5fak=Njw@mail.gmail.com
Whole thread Raw
In response to Re: Allowing join removals for more join types  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Allowing join removals for more join types  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 9 July 2014 09:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, Jul 8, 2014 at 4:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm a bit skeptical as to whether testing for that case is actually worth
>> any extra complexity.  Do you have a compelling use-case?  But anyway,
>> if we do want to allow it, why does it take any more than adding a check
>> for Consts to the loops in query_is_distinct_for?  It's the targetlist
>> entries where we'd want to allow Consts, not the join conditions.

> I don't really have a compelling use-case, but you're right, it's just a
> Const check in query_is_distinct_for(), it seems simple enough so I've
> included that in my refactor of the patch to use query_is_distinct_for().
> This allows the regression tests all to pass again.

Meh.  "I wrote a regression test that expects it" is a pretty poor
rationale for adding logic.  If you can't point to a real-world case
where this is important, I'm inclined to take it out.


Ok, I'll pull that logic back out when I get home tonight. 
 
If we were actually serious about exploiting such cases, looking for
bare Consts would be a poor implementation anyhow, not least because
const-folding has not yet been applied to the sub-select.  I think we'd
want to do it for any pseudoconstant expression (no Vars, no volatile
functions); which is a substantially more expensive test.

> 1. The fast path code that exited in join_is_removable() for subquery's
> when the subquery had no group or distinct clause is now gone. I wasn't too
> sure that I wanted to assume too much about what query_is_distinct_for may
> do in the future and I thought if I included some logic in
> join_is_removable() to fast path, that one day it may fast path wrongly.

Or put a quick-check subroutine next to query_is_distinct_for().  The code
we're skipping here is not so cheap that I want to blow off skipping it.

Ok, good idea. I'll craft something up tonight along those lines.
 
On review it looks like analyzejoins.c would possibly benefit from an
earlier fast-path check as well.


Do you mean for non-subqueries? There already is a check to see if the relation has no indexes.
 
> 2. The patch I submitted here
> http://www.postgresql.org/message-id/CAApHDvrfVkH0P3FAooGcckBy7feCJ9QFanKLkX7MWsBcxY2Vcg@mail.gmail.com
> if that gets accepted then it makes the check for set returning functions
> in join_is_removable void.

Right (and done, if you didn't notice already).


Thanks, I noticed that this morning. I'll remove the (now) duplicate check from the patch
 
TBH I find the checks for FOR UPDATE and volatile functions to be
questionable as well.  We have never considered those things to prevent
pushdown of quals into a subquery (cf subquery_is_pushdown_safe).  I think
what we're talking about here is pretty much equivalent to pushing an
always-false qual into the subquery; if people haven't complained about
that, why should they complain about this?  Or to put it in slightly more
principled terms, we've attempted to prevent subquery optimization from
causing volatile expressions to be evaluated *more* times than the naive
reading of the query would suggest, but we have generally not felt that
we needed to prevent them from happening *fewer* times.


Well, that's a real tough one for me as I only added that based on what you told me here:

On 20 May 2014 23:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I doubt you should drop a subquery containing FOR UPDATE, either.
That's a side effect, just as much as a volatile function would be.

                        regards, tom lane


As far as I know the FOR UPDATE check is pretty much void as of now anyway, since the current state of query_is_distinct_for() demands that there's either a DISTINCT, GROUP BY or just a plain old aggregate without any grouping, which will just return a single row, neither of these will allow FOR UPDATE anyway. I really just added the check just to protect the code from possible future additions to query_is_distinct_for() which may add logic to determine uniqueness by some other means.

So the effort here should be probably be more focused on if we should allow the join removal when the subquery contains volatile functions. We should probably think fairly hard on this now as I'm still planning on working on INNER JOIN removals at some point and I'm thinking we should likely be consistent between the 2 types of join for when it comes to FOR UPDATE and volatile functions, and I'm thinking right now that for INNER JOINs that, since they're INNER that we could remove either side of the join. In that case maybe it would be harder for the user to understand why their volatile function didn't get executed.

Saying that... off the top of my head I can't remember what we'd do in a case like:

create view v_a as select a,volatilefunc(a) AS funcresult from a;

select a from v_a;

Since we didn't select funcresult, do we execute the function?

Perhaps we should base this on whatever that does?

I can't give much more input on that right now. I'll have a look at the docs later to see if when mention anything about any guarantees about calling volatile functions.

Regards

David Rowley

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [bug fix or improvement?] Correctly place DLLs for ECPG apps in bin folder
Next
From: Tom Lane
Date:
Subject: Re: Allowing join removals for more join types