On Wed, Sep 27, 2017 at 5:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Marko Tiikkaja <marko@joh.to> writes:
>> I wonder if the nested loop shouldn't have some kind of a cap on its own
>> estimate if it's wildly off of what you'd get by multiplying the child
>> nodes' estimates with each other?
> Nonstarter I'm afraid.  The join relation's size estimate is determined
> long before we get to a point where we could multiply the sizes of these
> particular child paths to arrive at the conclusion that it should be
> something different than what we decided originally.
Ah hah. Thanks for the explanation, that makes sense.
> Adjusting the size of the nestloop result at that point would merely give
> it an unfair advantage over other paths for the same join relation.
> (I think it would also break some assumptions about paths for the same
> relation all giving the same number of rows, unless parameterized.)
With the previous paragraph in mind, I would agree; it's not a very good idea.
> Looking at it another way, the main thing that the combination of hashagg
> outer path + indexscan inner path knows that eqjoinsel_semi didn't account
> for is that there's a unique index on foo.id.  But that info is available
> to eqjoinsel_semi, in the sense that it's been given a nondefault estimate
> that nd1 is equal to the outer relation size.  So the mistake that it's
> making is to throw up its hands and use an 0.5 selectivity estimate just
> because it has no info about the inner relation.  I think if we'd pushed
> through the nd2/nd1 calculation after setting nd2 = size of inner rel,
> we'd end up with an estimate matching the product of these path sizes.
> (Caution: inadequate caffeine absorbed yet, this might be all wrong.)
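
To make sure I'm following the arithmetic, here's a toy version of it.  This
is not the real eqjoinsel_semi() code, just the two estimates it can produce
as I understand them, and the row counts below are invented for illustration:

    /*
     * Toy model of the semi-join selectivity arithmetic described above.
     * Not the real selfuncs.c code; numbers are made up.
     */
    #include <stdio.h>

    /* fallback used when the inner side's ndistinct is unknown */
    #define DEFAULT_SEMI_SELECTIVITY 0.5

    /*
     * Fraction of outer rows expected to find at least one match on the
     * inner side of the semi-join.
     */
    static double
    semi_join_selectivity(double nd1, double nd2, int nd2_known)
    {
        if (!nd2_known)
            return DEFAULT_SEMI_SELECTIVITY;  /* what happens today */
        if (nd1 <= nd2)
            return 1.0;                       /* every outer value can match */
        return nd2 / nd1;                     /* only nd2 of nd1 values match */
    }

    int
    main(void)
    {
        double outer_rows = 1000000.0;  /* hypothetical size of foo; nd1 equals
                                         * this because of the unique index on
                                         * foo.id */
        double inner_rows = 100.0;      /* hypothetical inner rel size */

        /* current behaviour: no inner stats, fall back to 0.5 */
        printf("default estimate: %.0f rows\n",
               outer_rows * semi_join_selectivity(outer_rows, 0, 0));

        /* behaviour sketched above: clamp nd2 to the inner rel size first */
        printf("clamped estimate: %.0f rows\n",
               outer_rows * semi_join_selectivity(outer_rows, inner_rows, 1));

        return 0;
    }

With those made-up numbers the flat 0.5 fallback gives 500000 rows, while
pushing through the nd2/nd1 calculation with nd2 clamped to the inner rel
size gives 100, i.e. the same thing you'd get by multiplying the hashagg and
indexscan path sizes.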