Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Performance improvement for joins where outer side is unique |
Date | |
Msg-id | CAKJS1f_BxfNe3ixfz4S-g2RTodvjk82edcv9O9WtBqc9eUbWSA@mail.gmail.com Whole thread Raw |
In response to | Re: Performance improvement for joins where outer side is unique (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: Performance improvement for joins where outer side is unique
Re: Performance improvement for joins where outer side is unique |
List | pgsql-hackers |
On 7 April 2016 at 08:01, David Rowley <david.rowley@2ndquadrant.com> wrote: > On 7 April 2016 at 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Starting to look at this again. I wonder, now that you have the generic >> caching mechanism for remembering whether join inner sides have been >> proven unique, is it still worth having the is_unique_join field in >> SpecialJoinInfo? It seems like that's creating a separate code path >> for special joins vs. inner joins that may not be buying us much. >> It does potentially save lookups in the unique_rels cache, if you already >> have the SpecialJoinInfo at hand, but I'm not sure what that's worth. > > I quite like that field where it is, as it should make > remove_useless_joins() a bit more efficient, as after a LEFT JOIN is > removed, the previous code would go off and try to make sure all the > joins are unique again, but now we cache that, and save it from having > to bother doing that again, on joins already marked as unique. > > Certainly changing that would mean one less special case in > joinpath.c, as the JOIN_LEFT case can be handle the same as the other > cases, although it looks like probably, if I do change that, then I'd > probably move is_innerrel_unique_for() into analyzejoins.c, and put > the special case for JOIN_LEFT in that function, so that it calls > specialjoin_is_unique_join(), then cache the sjinfo->min_righthand in > the unique_rels cache if the result comes back positive, and in the > non_unique_rels cache if negative... But it seems a bit crazy to go to > the trouble or all that caching, when we can just throw the result in > a struct field in the case of Special Joins. Maybe we could just hide > both the new joinpath.c functions in analyzejoins.c and call it quits. > It's not as if there's no special cases for JOIN_LEFT in that file. > >> Also, as I'm looking around at the planner some more, I'm beginning to get >> uncomfortable with the idea of using JOIN_SEMI this way. It's fine so far >> as the executor is concerned, no doubt, but there's a lot of planner >> expectations about the use of JOIN_SEMI that we'd be violating. One is >> that there should be a SpecialJoinInfo for any SEMI join. Another is that >> JOIN_SEMI can be implemented by unique-ifying the inner side and then >> doing a regular inner join; that's not a code path we wish to trigger in >> these situations. The patch might avoid tripping over these hazards as it >> stands, but it seems fragile, and third-party FDWs could easily contain >> code that'll be broken. So I'm starting to feel that we'd better invent >> two new JoinTypes after all, to make sure we can distinguish plain-join- >> with-inner-side-known-unique from a real SEMI join when we need to. >> >> What's your thoughts on these matters? > > I was a bit uncomfortable with JOIN_INNER becoming JOIN_SEMI before, > but that was for EXPLAIN reasons. So I do think it's better to have > JOIN_INNER_UNIQUE and JOIN_LEFT_UNIQUE instead. I can go make that > change, but unsure on how EXPLAIN will display these now. I'd need to > pull out my tests if we don't have anything to show in EXPLAIN, and > I'd really rather have tests for this. I've attached an updated patch which introduces JOIN_INNER_UNIQUE and JOIN_LEFT_UNIQUE. So unique inner joins no longer borrow JOIN_SEMI. I also made some changes around the is_unique_join flag in SpecialJoinInfo. I've changed this to become optimal_jointype, which is initially set to jointype and updated by what used to be called mark_unique_joins(), now called optimize_outerjoin_types(). The LEFT JOIN removal code now only bothers with special joins with optimal_jointype as JOIN_LEFT_UNIQUE. This still saves any double work analyzing the unique properties of LEFT JOINs. I've moved the two new functions I put in joinpath.c into analyzejoins.c In EXPLAIN, I named these new join types "Unique Inner" and "Unique Left". Going by a comment in explain.c, there's some historic reason that we display "Hash Join" rather than "Hash Inner Join", and "Nested Loop", rather than "Nested Loop Join" or "Nested Loop Inner Join". I wasn't quite sure if Unique Inner Joins should use a similar shorthand form, so I didn't touch that code. We'll get "Nested Loop Unique Inner Join" now instead of "Nested Loop". I wasn't able to think of some equivalent shorthand form that made sense. I know we're close to the feature freeze, so I just want to say that I'll be AFK starting 5:30am Friday New Zealand time (13:30 on the 8th New York time), until Sunday ~4pm. . I really hope that if this needs any tweaking it will be minor. I'll check my email before I leave on Friday in the hope that if there's anything, then I can quickly fix it up before I go, but time will be short. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: