Re: updated join removal patch - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: updated join removal patch |
Date | |
Msg-id | 3835.1253221153@sss.pgh.pa.us Whole thread Raw |
In response to | Re: updated join removal patch (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: updated join removal patch
|
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Sep 15, 2009 at 9:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> Here we go again. �Following Tom's advice to not insert crocks in >>> add_path(), I've instead introduced a noopjoin path type which ignores >>> its inner side. I've committed this with some revisions. As to the points discussed earlier: >> * I'm not really happy with the "NoopJoinPath" representation. ... >> A possible compromise is to use T_SubqueryScan as the pathtype, with >> the idea that we're pretending a SubqueryScan is to be inserted and then >> immediately optimized away. �But I don't want the inner path in there >> in any case. > I don't have strong feelings about it. I thought about making just a > Noop path type, but I couldn't see any clear reason to prefer it one > way or the other. I ended up using a NoOpPath struct with pathtype = T_Join, which is a dummy superclass nodetag that never really gets instantiated. The SubqueryScan idea looked too messy after I remembered that createplan.c likes to switch() on the pathtype, so having two different pathtype uses of T_SubqueryScan seemed pretty ugly. But T_Join isn't really used anywhere, despite being nominally a executable plan type. It's pretty much a judgment call whether that's really cleaner than using the path's own NodeTag... >> * I'm quite unimpressed with the refactoring you did to make a single >> function deal with merge clauses, hash clauses, *and* join removal >> clauses. �I think that makes it close to unmaintainable and is buying >> little if anything speedwise. > You're the committer; I'm not. But I completely disagree. There > isn't any reason at all to duplicate this logic in two separate > places, let alone three. I'd actually be in favor of merging the > existing two cases even if we weren't adding join removal. No, I still think this was a bad idea. There are *parts* of those tests that are similar, but combining them all into one function is just a recipe for bugs. > As for a GUC, I think it would be useful to have for debugging, or in > case of bugs. It's really another join strategy, and we have enable_* > flags for all the others. But if you don't want it for some reason, > I'm not in a position to twist your arm. I left it out for the moment. If anyone can point to a case where the join removal logic slows planning noticeably without gaining anything, we can reconsider. >> * Not entirely sure where to put the code that does the hard work >> (relation_is_distinct_for). > Yeah, it seemed a little weird to me. For a while I was reusing some > of the support functions that query_is_distinct_for() calls, but the > final version doesn't. I wonder if it should just be moved to > joinpath.c; it seems to fit in better with what is going on there. I ended up putting the index-specific logic in indxpath.c, which aside from seeming reasonable made it easier to deal with expression indexes. If we ever add another test method that doesn't depend on indexes, we can put it in a reasonable place and have joinpath.c call that too. regards, tom lane
pgsql-hackers by date: