Re: [HACKERS] 200 = 199 + 1? - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] 200 = 199 + 1? |
Date | |
Msg-id | 6709.1506527111@sss.pgh.pa.us |
In response to | [HACKERS] 200 = 199 + 1? (Marko Tiikkaja <marko@joh.to>) |
Responses | Re: [HACKERS] 200 = 199 + 1? |
List | pgsql-hackers |
Marko Tiikkaja <marko@joh.to> writes:
> I just came across this very peculiar behavior:

I think this is a consequence of the clamping + fallback logic in eqjoinsel_semi. The planner has no info about the inner select's result, except that it can see the LIMIT clause, so it uses that as a rowcount estimate. get_variable_numdistinct reports DEFAULT_NUM_DISTINCT (200) with isdefault true; but if the LIMIT is <= 200, eqjoinsel_semi's clamping logic fires:

    * If we clamp, we can treat nd2 as being a non-default estimate; it's not
    * great, maybe, but it didn't come out of nowhere either.  This is most
    * helpful when the inner relation is empty and consequently has no stats.

and that enables the nondefault (nd2 / nd1) estimate to be used, which happens to be dead on in this case, because the estimated nd2 is in fact exact. With a larger LIMIT, we end up with the 0.5 default selectivity estimate, which is way off in this particular example.

Now I'd be the first to say that the 0.5 default didn't have all that much thought put into it. One idea that occurs to me while staring at this is: why don't we use DEFAULT_EQ_SEL instead? Or, for that matter, maybe just bulling ahead and using nd2 even if it is default would give superior results in many cases. (Not sure whether nd1 ought to get the same treatment. At least in this example, we do know nd1 with some precision; and if we don't know either one, dividing nd2/nd1 to arrive at 1.0 doesn't sound like a good idea.) Or maybe take nd2 = size of inner rel if we want to be conservative, but still do something smarter than 0.5.

> I wonder if the nested loop shouldn't have some kind of a cap on its own
> estimate if it's wildly off of what you'd get by multiplying the child
> nodes' estimates with each other?

Nonstarter, I'm afraid. The join relation's size estimate is determined long before we get to a point where we could multiply the sizes of these particular child paths to arrive at the conclusion that it should be something different than what we decided originally. Adjusting the size of the nestloop result at that point would merely give it an unfair advantage over other paths for the same join relation. (I think it would also break some assumptions about paths for the same relation all giving the same number of rows, unless parameterized.)

Looking at it another way, the main thing that the combination of hashagg outer path + indexscan inner path knows, and that eqjoinsel_semi didn't account for, is that there's a unique index on foo.id. But that info is available to eqjoinsel_semi, in the sense that it's been given a nondefault estimate that nd1 is equal to the outer relation size. So the mistake it's making is to throw up its hands and use a 0.5 selectivity estimate just because it has no info about the inner relation. I think if we'd pushed through the nd2/nd1 calculation after setting nd2 = size of inner rel, we'd end up with an estimate matching the product of these path sizes. (Caution: inadequate caffeine absorbed yet, this might be all wrong.)

regards, tom lane
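For concreteness, here is a tiny standalone model of the cliff described above. It is not the real eqjoinsel_semi code; the function name toy_semi_selectivity, the outer-side ndistinct of one million, and the two LIMIT values are all made-up assumptions for illustration, and Marko's actual query is not reproduced here.

```c
/*
 * Toy stand-in for the "no MCV lists" branch of eqjoinsel_semi, plus the
 * nd2 clamp quoted above.  Illustrative only -- not the selfuncs.c code.
 */
#include <stdbool.h>
#include <stdio.h>

#define DEFAULT_NUM_DISTINCT 200.0	/* ndistinct guess when there are no stats */

static double
toy_semi_selectivity(double nd1, bool isdefault1,
					 double nd2, bool isdefault2,
					 double inner_rows)
{
	/*
	 * Clamp nd2 to the inner relation's estimated size; if the clamp fires,
	 * treat nd2 as non-default ("it didn't come out of nowhere").
	 */
	if (nd2 >= inner_rows)
	{
		nd2 = inner_rows;
		isdefault2 = false;
	}

	if (!isdefault1 && !isdefault2)
		return (nd1 <= nd2) ? 1.0 : nd2 / nd1;

	return 0.5;					/* the fallback under discussion */
}

int
main(void)
{
	double		nd1 = 1000000.0;	/* assume the outer side's ndistinct is known */

	/* LIMIT <= 200: the clamp fires, so the nd2/nd1 path is taken */
	printf("LIMIT 199 -> selectivity %.6f\n",
		   toy_semi_selectivity(nd1, false, DEFAULT_NUM_DISTINCT, true, 199));

	/* LIMIT > 200: nd2 stays default, so we fall back to 0.5 */
	printf("LIMIT 201 -> selectivity %.6f\n",
		   toy_semi_selectivity(nd1, false, DEFAULT_NUM_DISTINCT, true, 201));

	return 0;
}
```

Under these assumed numbers the first call returns 0.000199 and the second 0.5, which is the kind of discontinuity the thread is about: moving the LIMIT across DEFAULT_NUM_DISTINCT changes which estimation path is taken, not just the row count.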
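And a sketch of the "push through nd2/nd1 after setting nd2 = size of inner rel" idea from the last paragraph, written as a drop-in variant of the toy function above; again this only illustrates the suggestion, it is not a proposed patch.

```c
#include <stdbool.h>

/*
 * Hypothetical variant: when nd2 is a default estimate, take nd2 = inner
 * relation size and still compute nd2/nd1, instead of falling back to 0.5.
 * Illustrative sketch only.
 */
static double
toy_semi_selectivity_v2(double nd1, bool isdefault1,
						double nd2, bool isdefault2,
						double inner_rows)
{
	if (nd2 >= inner_rows || isdefault2)
	{
		nd2 = inner_rows;		/* nd2 = size of inner rel */
		isdefault2 = false;
	}

	if (!isdefault1 && !isdefault2)
		return (nd1 <= nd2) ? 1.0 : nd2 / nd1;

	return 0.5;					/* only reached when nd1 is also unknown */
}
```

With the LIMIT 201 case from the previous sketch this would yield 201/1000000 = 0.000201 rather than 0.5, which is the direction the mail's closing paragraph argues for.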