Re: Allowing join removals for more join types - Mailing list pgsql-hackers

From David Rowley
Subject Re: Allowing join removals for more join types
Date
Msg-id CAHoyFK9LBdr9Oa1F+urJDU8aLrWVX8zQQJxWw8zoDWvBMA0V4w@mail.gmail.com
Whole thread Raw
In response to Re: Allowing join removals for more join types  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Allowing join removals for more join types  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 23 June 2014 09:31, Simon Riggs <simon@2ndquadrant.com> wrote:
On 22 June 2014 12:51, Simon Riggs <simon@2ndquadrant.com> wrote:

> Looks good on initial look.

Tests 2 and 3 seem to test the same thing.


Ok, I've removed test 2 and kept test 3 which is the DISTINCT a+b test.
 
There are no tests which have multiple column clauselist/sortlists,
nor tests for cases where the clauselist is a superset of the
sortlist.


I've added:
SELECT a.id FROM a LEFT JOIN (SELECT b.id,b.c_id FROM b GROUP BY b.id,b.c_id) b ON a.b_id = b.id AND a.id = b.c_id
but I'm half temped to just add 2 new tables that allow this to be done in a more sensible way, since c_id is really intended to reference c.id in the defined tables.

I've also added one where the join condition is a superset of the GROUP BY clause. I had indented the one with the constant to be this, but probably, you're right, it should be an actual column since constants are treated differently.
 
Test comments should refer to "join removal" rather than
"optimization" because we may forget which optimization they are there
to test.


Good idea...Fixed.
 
It's not clear to me where you get the term "sortclause" from. This is
either the groupclause or distinctclause, but in the test cases you
provide this shows this has nothing at all to do with sorting since
there is neither an order by or a sorted aggregate anywhere near those
queries. Can we think of a better name that won't confuse us in the
future?


I probably got the word "sort" from the function targetIsInSortList, which expects a list of SortGroupClause. I've renamed the function to sortlist_is_unique_on_restrictinfo() and renamed the sortclause parameter to sortlist. Hopefully will reduce confusion about it being an ORDER BY clause a bit more. I think sortgroupclauselist might be just a bit too long. What do you think?
 
The comment "Since a constant only has 1 value the existence of one here will
+ * not cause any duplication of the results. We'll simply ignore it!"
would be better as "We can ignore constants since they have only one
value and don't affect uniqueness of results".


Ok, changed.
 
The comment "XXX is this comment still true??" can be removed since
its just a discussion point.

 
Removed.
 
The comment beginning "Currently, we only know how to remove left..."
has rewritten a whole block of text just to add a few words in the
middle. We should rewrite the comment so it changes as few lines as
possible. Especially when that comment is going to be changed again
with your later patches. Better to have it provide a bullet point list
of things we know how to remove, so we can just add to it later.


I had thought that I'd put the code for other join types in their own functions as not all will have a SpecialJoinInfo. In the patch that contained ANTI and SEMI join support I had renamed the function join_is_removable() to leftjoin_is_removable() and added a new function for semi/anti joins.

I've done a re-factor of this comment, although it likely would still need some small updates around the part where it talks about "left join" later when I start working on support for other join types. The previous comment was giving some clarification about returning early when there's no indexes on the relation, I decided to move this out of that comment and just include a more general note at the bottom but also add some more detail about why we're fast pathing out when indexlist is empty.
 
Still looks good, other than the above.


Great. Thanks for reviewing!

I've attached an updated patch and also a delta patch of the changes I've made since the last version.

Regards

David Rowley
 

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Abhijit Menon-Sen
Date:
Subject: Re: pgaudit - an auditing extension for PostgreSQL
Next
From: Pavel Stehule
Date:
Subject: Re: [Fwd: Re: proposal: new long psql parameter --on-error-stop]