Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers

From David Rowley
Subject Re: Performance improvement for joins where outer side is unique
Date
Msg-id CAKJS1f_BxfNe3ixfz4S-g2RTodvjk82edcv9O9WtBqc9eUbWSA@mail.gmail.com
Whole thread Raw
In response to Re: Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Performance improvement for joins where outer side is unique  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Performance improvement for joins where outer side is unique  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 7 April 2016 at 08:01, David Rowley <david.rowley@2ndquadrant.com> wrote:
> On 7 April 2016 at 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Starting to look at this again.  I wonder, now that you have the generic
>> caching mechanism for remembering whether join inner sides have been
>> proven unique, is it still worth having the is_unique_join field in
>> SpecialJoinInfo?  It seems like that's creating a separate code path
>> for special joins vs. inner joins that may not be buying us much.
>> It does potentially save lookups in the unique_rels cache, if you already
>> have the SpecialJoinInfo at hand, but I'm not sure what that's worth.
>
> I quite like that field where it is, as it should make
> remove_useless_joins() a bit more efficient, as after a LEFT JOIN is
> removed, the previous code would go off and try to make sure all the
> joins are unique again, but now we cache that, and save it from having
> to bother doing that again, on joins already marked as unique.
>
> Certainly changing that would mean one less special case in
> joinpath.c, as the JOIN_LEFT case can be handle the same as the other
> cases, although it looks like probably, if I do change that, then I'd
> probably move is_innerrel_unique_for() into analyzejoins.c, and put
> the special case for JOIN_LEFT in that function, so that it calls
> specialjoin_is_unique_join(), then cache the sjinfo->min_righthand in
> the unique_rels cache if the result comes back positive, and in the
> non_unique_rels cache if negative... But it seems a bit crazy to go to
> the trouble or all that caching, when we can just throw the result in
> a struct field in the case of Special Joins.  Maybe we could just hide
> both the new joinpath.c functions in analyzejoins.c and call it quits.
> It's not as if there's no special cases for JOIN_LEFT in that file.
>
>> Also, as I'm looking around at the planner some more, I'm beginning to get
>> uncomfortable with the idea of using JOIN_SEMI this way.  It's fine so far
>> as the executor is concerned, no doubt, but there's a lot of planner
>> expectations about the use of JOIN_SEMI that we'd be violating.  One is
>> that there should be a SpecialJoinInfo for any SEMI join.  Another is that
>> JOIN_SEMI can be implemented by unique-ifying the inner side and then
>> doing a regular inner join; that's not a code path we wish to trigger in
>> these situations.  The patch might avoid tripping over these hazards as it
>> stands, but it seems fragile, and third-party FDWs could easily contain
>> code that'll be broken.  So I'm starting to feel that we'd better invent
>> two new JoinTypes after all, to make sure we can distinguish plain-join-
>> with-inner-side-known-unique from a real SEMI join when we need to.
>>
>> What's your thoughts on these matters?
>
> I was a bit uncomfortable with JOIN_INNER becoming JOIN_SEMI before,
> but that was for EXPLAIN reasons. So I do think it's better to have
> JOIN_INNER_UNIQUE and JOIN_LEFT_UNIQUE instead. I can go make that
> change, but unsure on how EXPLAIN will display these now. I'd need to
> pull out my tests if we don't have anything to show in EXPLAIN, and
> I'd really rather have tests for this.

I've attached an updated patch which introduces JOIN_INNER_UNIQUE and
JOIN_LEFT_UNIQUE. So unique inner joins no longer borrow JOIN_SEMI.

I also made some changes around the is_unique_join flag in
SpecialJoinInfo. I've changed this to become optimal_jointype, which
is initially set to jointype and updated by what used to be called
mark_unique_joins(), now called optimize_outerjoin_types(). The LEFT
JOIN removal code now only bothers with special joins with
optimal_jointype as JOIN_LEFT_UNIQUE. This still saves any double work
analyzing the unique properties of LEFT JOINs. I've moved the two new
functions I put in joinpath.c into analyzejoins.c

In EXPLAIN, I named these new join types "Unique Inner" and "Unique
Left". Going by a comment in explain.c, there's some historic reason
that we display "Hash Join" rather than "Hash Inner Join", and "Nested
Loop", rather than "Nested Loop Join" or "Nested Loop Inner Join". I
wasn't quite sure if Unique Inner Joins should use a similar shorthand
form, so I didn't touch that code. We'll get "Nested Loop Unique Inner
Join" now instead of "Nested Loop". I wasn't able to think of some
equivalent shorthand form that made sense.

I know we're close to the feature freeze, so I just want to say that
I'll be AFK starting 5:30am Friday New Zealand time (13:30 on the 8th
New York time), until Sunday ~4pm. . I really hope that if this needs
any tweaking it will be minor. I'll check my email before I leave on
Friday in the hope that if there's anything, then I can quickly fix it
up before I go, but time will be short.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
Next
From: Robert Haas
Date:
Subject: Re: Move PinBuffer and UnpinBuffer to atomics