Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Performance improvement for joins where outer side is unique
Date
Msg-id 21641.1459958757@sss.pgh.pa.us
Whole thread Raw
In response to Re: Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Performance improvement for joins where outer side is unique  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> In the last patch I failed to notice that there's an alternative
> expected results file for one of the regression tests.
> The attached patch includes the fix to update that file to match the
> new expected EXPLAIN output.

Starting to look at this again.  I wonder, now that you have the generic
caching mechanism for remembering whether join inner sides have been
proven unique, is it still worth having the is_unique_join field in
SpecialJoinInfo?  It seems like that's creating a separate code path
for special joins vs. inner joins that may not be buying us much.
It does potentially save lookups in the unique_rels cache, if you already
have the SpecialJoinInfo at hand, but I'm not sure what that's worth.

Also, as I'm looking around at the planner some more, I'm beginning to get
uncomfortable with the idea of using JOIN_SEMI this way.  It's fine so far
as the executor is concerned, no doubt, but there's a lot of planner
expectations about the use of JOIN_SEMI that we'd be violating.  One is
that there should be a SpecialJoinInfo for any SEMI join.  Another is that
JOIN_SEMI can be implemented by unique-ifying the inner side and then
doing a regular inner join; that's not a code path we wish to trigger in
these situations.  The patch might avoid tripping over these hazards as it
stands, but it seems fragile, and third-party FDWs could easily contain
code that'll be broken.  So I'm starting to feel that we'd better invent
two new JoinTypes after all, to make sure we can distinguish plain-join-
with-inner-side-known-unique from a real SEMI join when we need to.

What's your thoughts on these matters?
        regards, tom lane



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Proposal: Generic WAL logical messages
Next
From: Robert Haas
Date:
Subject: Re: Truncating/vacuuming relations on full tablespaces