Re: updated join removal patch - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: updated join removal patch |
Date | |
Msg-id | 603c8f070909171414p5135aa58q186397232ad341e1@mail.gmail.com Whole thread Raw |
In response to | Re: updated join removal patch (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: updated join removal patch
|
List | pgsql-hackers |
On Thu, Sep 17, 2009 at 4:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Sep 15, 2009 at 9:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> Here we go again. Following Tom's advice to not insert crocks in >>>> add_path(), I've instead introduced a noopjoin path type which ignores >>>> its inner side. > > I've committed this with some revisions. As to the points discussed > earlier: Thanks, it looks nice. I wonder if we shouldn't add some documentation or regression tests, though - any thoughts? >>> * I'm not really happy with the "NoopJoinPath" representation. ... >>> A possible compromise is to use T_SubqueryScan as the pathtype, with >>> the idea that we're pretending a SubqueryScan is to be inserted and then >>> immediately optimized away. But I don't want the inner path in there >>> in any case. > >> I don't have strong feelings about it. I thought about making just a >> Noop path type, but I couldn't see any clear reason to prefer it one >> way or the other. > > I ended up using a NoOpPath struct with pathtype = T_Join, which is a > dummy superclass nodetag that never really gets instantiated. The > SubqueryScan idea looked too messy after I remembered that createplan.c > likes to switch() on the pathtype, so having two different pathtype > uses of T_SubqueryScan seemed pretty ugly. But T_Join isn't really used > anywhere, despite being nominally a executable plan type. It's pretty > much a judgment call whether that's really cleaner than using the path's > own NodeTag... Yeah. I haven't been able to wrap my head around why it's good to have two tags on some of these things. >>> * I'm quite unimpressed with the refactoring you did to make a single >>> function deal with merge clauses, hash clauses, *and* join removal >>> clauses. I think that makes it close to unmaintainable and is buying >>> little if anything speedwise. > >> You're the committer; I'm not. But I completely disagree. There >> isn't any reason at all to duplicate this logic in two separate >> places, let alone three. I'd actually be in favor of merging the >> existing two cases even if we weren't adding join removal. > > No, I still think this was a bad idea. There are *parts* of those > tests that are similar, but combining them all into one function is > just a recipe for bugs. Having read your commit, it makes more sense to me. The fact that we're now looking at innerrel->baserestrictinfo also is a pretty powerful argument for your way. >> As for a GUC, I think it would be useful to have for debugging, or in >> case of bugs. It's really another join strategy, and we have enable_* >> flags for all the others. But if you don't want it for some reason, >> I'm not in a position to twist your arm. > > I left it out for the moment. If anyone can point to a case where the > join removal logic slows planning noticeably without gaining anything, > we can reconsider. Well, the purpose of such flags is for debugging, not production. The only argument I can see for not having one for join removal versus nestloop, mergejoin, etc. is that it's pretty easy to kill removal of a particular join by tweaking the output list, whereas it is not comparably easy to get the planner to pick a different join type otherwise. >>> * Not entirely sure where to put the code that does the hard work >>> (relation_is_distinct_for). > >> Yeah, it seemed a little weird to me. For a while I was reusing some >> of the support functions that query_is_distinct_for() calls, but the >> final version doesn't. I wonder if it should just be moved to >> joinpath.c; it seems to fit in better with what is going on there. > > I ended up putting the index-specific logic in indxpath.c, which aside > from seeming reasonable made it easier to deal with expression indexes. > If we ever add another test method that doesn't depend on indexes, we > can put it in a reasonable place and have joinpath.c call that too. Sounds fine to me. I think that the next project in this area should be to try to support removal of INNER joins. (Removal of SEMI, ANTI, and FULL joins seems unlikely ever to be interesting.) That will require teaching the planner about foreign key constraints, which interestingly opens up some other interesting optimization opportunities. An INNER join that can never match zero rows can alternatively be implemented as a LEFT join, which can be reordered in some situations where an inner join can't. e.g. A LJ (B IJ C ON Pbc) ON Pab can be implemented as (A LJ B ON Pab) LJ/IJ C ON Pbc if it is the case that for every row in B, there will be at least one row in C such that Pbc holds. Thanks again for fixing this up, and committing it. I have a TON of queries that will benefit from this, and it's one of the reasons why I started reading this mailing list on a regular basis and getting involved in development. For me, it's worth an upgrade by itself. ...Robert
pgsql-hackers by date: