Re: Parameterized-path cost comparisons need some work - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Parameterized-path cost comparisons need some work |
Date | |
Msg-id | 5189.1330838449@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Parameterized-path cost comparisons need some work (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Parameterized-path cost comparisons need some work
Re: Parameterized-path cost comparisons need some work |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Feb 29, 2012 at 6:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> (... thinks some more ...) No, that doesn't get us there, because that >> doesn't establish that a more-parameterized path produces fewer rows >> than some path that requires less parameterization, yet not none at >> all. You really want add_path carrying out those comparisons. In your >> previous example, it's entirely possible that path D is dominated by B >> or C because of poor choices of join quals. > I'm not following this part. Can you explain further? It seems to me > at any rate that we could get pretty far if we could just separate > parameterized paths and unparameterized paths into separate buckets. To try to get some definitive info about this, I instrumented add_path to emit a log message any time it compared two paths for which one's parameterization was a strict subset of the other's, and yet the first was not estimated to return more rows. Sure enough, I got a lot of messages, just by running the regression tests (and even more with some of the other test cases I'd been using for parameterized-path testing). All of the hits were for equal numbers of rows, though -- there were no cases with a rows difference in the opposite direction from expectations. After looking at the results, I think that the fallacy in what we've been discussing is this: a parameterized path may well have some extra selectivity over a less-parameterized one, but perhaps *not enough to be meaningful*. The cases I was getting hits on were where the rowcount estimate got rounded off to be the same as for the less-parameterized path. (In this connection it's worth noting that most of the hits were for rowcount estimates of only 1 or 2 rows.) So basically, the scenario is where you have restriction clauses that are already enough to get down to a small number of rows retrieved, and then you have some join clauses that are not very selective and don't reduce the rowcount any further. Or maybe you have some nicely selective join clauses, and then adding more joins to some other relations doesn't help any further. In situations like this, we want add_path to reject the ineffective more-parameterized path as not being an improvement over the less-parameterized path. Not having it do so might save cycles in add_path itself, but we'd be being penny-wise and pound-foolish, because not getting rid of the useless paths will cost us a lot more at the next join level up. So I'm back to thinking we need to look explicitly at the rowcount comparison as well as all the existing conditions in add_path. One annoying thing about that is that it will reduce the usefulness of add_path_precheck, because that's called before we compute the rowcount estimates (and indeed not having to make the rowcount estimates is one of the major savings from the precheck). I think what we'll have to do is assume that a difference in parameterization could result in a difference in rowcount, and hence only a dominant path with exactly the same parameterization can result in failing the precheck. regards, tom lane
pgsql-hackers by date: