Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Performance improvement for joins where outer side is unique |
Date | |
Msg-id | CAKJS1f9OrAkVywOvk5OhCoXOssGUmtYOQyhU8jFr_bZL+Xn6zQ@mail.gmail.com Whole thread Raw |
In response to | Re: Performance improvement for joins where outer side is unique (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Performance improvement for joins where outer side is unique
Re: Performance improvement for joins where outer side is unique |
List | pgsql-hackers |
On 7 April 2016 at 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Rowley <david.rowley@2ndquadrant.com> writes: >> In the last patch I failed to notice that there's an alternative >> expected results file for one of the regression tests. >> The attached patch includes the fix to update that file to match the >> new expected EXPLAIN output. > > Starting to look at this again. I wonder, now that you have the generic > caching mechanism for remembering whether join inner sides have been > proven unique, is it still worth having the is_unique_join field in > SpecialJoinInfo? It seems like that's creating a separate code path > for special joins vs. inner joins that may not be buying us much. > It does potentially save lookups in the unique_rels cache, if you already > have the SpecialJoinInfo at hand, but I'm not sure what that's worth. I quite like that field where it is, as it should make remove_useless_joins() a bit more efficient, as after a LEFT JOIN is removed, the previous code would go off and try to make sure all the joins are unique again, but now we cache that, and save it from having to bother doing that again, on joins already marked as unique. Certainly changing that would mean one less special case in joinpath.c, as the JOIN_LEFT case can be handle the same as the other cases, although it looks like probably, if I do change that, then I'd probably move is_innerrel_unique_for() into analyzejoins.c, and put the special case for JOIN_LEFT in that function, so that it calls specialjoin_is_unique_join(), then cache the sjinfo->min_righthand in the unique_rels cache if the result comes back positive, and in the non_unique_rels cache if negative... But it seems a bit crazy to go to the trouble or all that caching, when we can just throw the result in a struct field in the case of Special Joins. Maybe we could just hide both the new joinpath.c functions in analyzejoins.c and call it quits. It's not as if there's no special cases for JOIN_LEFT in that file. > Also, as I'm looking around at the planner some more, I'm beginning to get > uncomfortable with the idea of using JOIN_SEMI this way. It's fine so far > as the executor is concerned, no doubt, but there's a lot of planner > expectations about the use of JOIN_SEMI that we'd be violating. One is > that there should be a SpecialJoinInfo for any SEMI join. Another is that > JOIN_SEMI can be implemented by unique-ifying the inner side and then > doing a regular inner join; that's not a code path we wish to trigger in > these situations. The patch might avoid tripping over these hazards as it > stands, but it seems fragile, and third-party FDWs could easily contain > code that'll be broken. So I'm starting to feel that we'd better invent > two new JoinTypes after all, to make sure we can distinguish plain-join- > with-inner-side-known-unique from a real SEMI join when we need to. > > What's your thoughts on these matters? I was a bit uncomfortable with JOIN_INNER becoming JOIN_SEMI before, but that was for EXPLAIN reasons. So I do think it's better to have JOIN_INNER_UNIQUE and JOIN_LEFT_UNIQUE instead. I can go make that change, but unsure on how EXPLAIN will display these now. I'd need to pull out my tests if we don't have anything to show in EXPLAIN, and I'd really rather have tests for this. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: