Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: BUG #15160: planner overestimates number of rows in join when there are more than 200 rows coming from CTE |
Date | |
Msg-id | 4905.1542993454@sss.pgh.pa.us Whole thread Raw |
In response to | Re: BUG #15160: planner overestimates number of rows in join whenthere are more than 200 rows coming from CTE (Melanie Plageman <melanieplageman@gmail.com>) |
List | pgsql-hackers |
Melanie Plageman <melanieplageman@gmail.com> writes: > Given that you have addressed all of my feedback and that it's a pretty > low-risk change, I will change the status to "ready for committer". Thanks for reviewing! > Has there been discussion in the past about adding a planner test > extension similar to those in src/test/modules for cardinality > estimation? I am imagining something that is a "soft" check that either > the rows estimation that comes out of calc_joinrel_size_estimate is > within an expected range (given differing estimates across machines) or > that the selectivity estimate that comes out of eqjoinsel is within an > expected range. The former seems relatively easy to do in a manner > similar to the test_predtest extension and the latter seems like it > could be done even more trivially. No, I don't recall any discussion about that. The regression tests in general embody a lot of checking that the planner makes expected plan choices: obviously the cases where we do an explicit EXPLAIN do that, but even where we don't, we'd be likely to get artifacts such as varying row order if an unexpected plan were chosen. Perhaps there's a use-case for a lower-level test harness such as you suggest, but I haven't really felt a need for it. > On Sat, Nov 17, 2018 at 12:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> (Here, both "outer rel's size" and "inner rel's size" mean the size after >> earlier filtering steps.) So that's why we only clamp nd2 and only do so >> in eqjoinsel_semi: in the other three cases, we'd be double-counting the >> selectivity of earlier filters if we did that. > I just want to make sure I am understanding what the comment is saying: So, > after we calculate the selectivity for inner join, when we return from > calc_joinrel_size_estimate we do this math: > nrows = outer_rows * inner_rows * fkselec * jselec; > and in that equation, the outer and inner rows have been adjusted to account > for any restrictions on the tables, so we don't clamp the ndvs for inner > join in eqjoinsel_inner. However, for semi-join, that calculation is > nrows = outer_rows * fkselec * jselec; > Which means that we have to adjust the rows of the inner side before we get > here? Yeah. Basically the point is that if we have some WHERE clause that eliminates rows from the inner side of a semijoin, we can expect that that means the size of the semijoin result will be smaller than if the WHERE clause hadn't been there --- because some of the outer-rel rows only had matches among those excluded rows. But the equation in calc_joinrel_size_estimate provides no way to factor that in, except by adjusting the selectivity value, so that's what we do. > You wrote: >> Hm. Maybe the "Psemi" and "Pinner" notation is not helpful ... would >> "Ssemi" and "Sinner" be better? > I think Ssemi and Sinner might be more clear--mostly because we haven't used > P/predicate here or in most of the other selectivity estimation comments > that I > read. Also, in some cases when we have super limited information and make a > guess, the selectivity feels pretty detached from the join predicate. OK, thanks --- I'll have another go at writing that comment. regards, tom lane
pgsql-hackers by date: