Thread: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
Hi all,

On Monday RhodiumToad/Andrew Gierth and I tried to debug some rather strange plans that raptelan (CCed) was getting. After trawling through some unrelated stuff we diagnosed that the problem was some rather strange estimates. I managed to extract a reproducible, simple testcase.

Our analysis was that a recently backported change causes the strange estimates:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3f5d2fe3029b181fe773a02f1d4b34624c357634

The problem lies in eqjoinsel_semi's behaviour when it finds no MCVs and cannot rely on the ndistinct estimate of a lower node. If either side of a semijoin doesn't have a sensible estimate, it just assumes a selectivity of 0.5, which will often overestimate. That change is pretty bad because, as seen in the example below, it leads to absurd rowcounts.

The approach I quickly tried was to use the underlying relation's row count as a substitute ndistinct estimate. For these examples that seems to work rather well. Very likely it's sensible though to check whether those values actually make sense before really using them ;)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index da638f8..7117978 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2471,15 +2471,22 @@ eqjoinsel_semi(Oid operator,
 		 */
 		double		nullfrac1 = stats1 ? stats1->stanullfrac : 0.0;
 
-		if (!isdefault1 && !isdefault2)
+		if (isdefault1 && isdefault2)
 		{
+			selec = 0.5 * (1.0 - nullfrac1);
+		}
+		else{
+			if(isdefault1)
+				nd1 = Max(nd1, vardata1->rel->rows * (1.0 - nullfrac1));
+
+			if(isdefault2)
+				nd2 = Max(nd2, vardata2->rel->rows);
+
 			if (nd1 <= nd2 || nd2 < 0)
 				selec = 1.0 - nullfrac1;
 			else
 				selec = (nd2 / nd1) * (1.0 - nullfrac1);
 		}
-		else
-			selec = 0.5 * (1.0 - nullfrac1);
 	}
 
 	if (have_mcvs1)

What's your opinion on this?

Andres

Testcase:
/*
test_raptelan=# EXPLAIN SELECT * FROM a WHERE a.id IN (SELECT DISTINCT id FROM b WHERE id < 5000);
                                          QUERY PLAN
--------------------------------------------------------------------------------------------
 Hash Join  (cost=315.13..2782.56 rows=50000 width=9)
   Hash Cond: (a.id = b.id)
   ->  Seq Scan on a  (cost=0.00..1540.00 rows=100000 width=9)
   ->  Hash  (cost=249.59..249.59 rows=5243 width=4)
         ->  Unique  (cost=0.00..197.16 rows=5243 width=4)
               ->  Index Only Scan using b_pkey on b  (cost=0.00..184.06 rows=5243 width=4)
                     Index Cond: (id < 5000)
(7 rows)

Time: 4.016 ms

test_raptelan=# EXPLAIN WITH foo AS (SELECT * FROM b WHERE id < 5000) SELECT * FROM a WHERE a.id IN (SELECT id FROM foo);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=302.02..486.74 rows=50000 width=9)
   CTE foo
     ->  Index Scan using b_pkey on b  (cost=0.00..184.06 rows=5243 width=10)
           Index Cond: (id < 5000)
   ->  HashAggregate  (cost=117.97..119.97 rows=200 width=4)
         ->  CTE Scan on foo  (cost=0.00..104.86 rows=5243 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..0.90 rows=1 width=9)
         Index Cond: (id = foo.id)
(8 rows)

Time: 2.636 ms

test_raptelan=# EXPLAIN SELECT * FROM a WHERE a.id IN (SELECT id FROM b WHERE id < 5000);
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Hash Semi Join  (cost=249.59..2184.02 rows=5243 width=9)
   Hash Cond: (a.id = b.id)
   ->  Seq Scan on a  (cost=0.00..1540.00 rows=100000 width=9)
   ->  Hash  (cost=184.06..184.06 rows=5243 width=4)
         ->  Index Only Scan using b_pkey on b  (cost=0.00..184.06 rows=5243 width=4)
               Index Cond: (id < 5000)
(6 rows)

Time: 2.459 ms
*/
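[To see the arithmetic the PoC patch performs, here is a minimal standalone sketch, not PostgreSQL code: semi_selec and all constants are illustrative. The inputs are the row counts from the testcase above, where the outer table "a" has 100000 rows with trusted statistics and the inner sub-select returns 5243 rows whose ndistinct fell back to the default of 200.]

#include <stdio.h>

#define Max(a, b) ((a) > (b) ? (a) : (b))

static double
semi_selec(double nd1, double nd2, int isdefault1, int isdefault2,
           double rows1, double rows2, double nullfrac1)
{
    /* both sides unknown: keep the existing 0.5 punt */
    if (isdefault1 && isdefault2)
        return 0.5 * (1.0 - nullfrac1);

    /* substitute the underlying relation's row count for a defaulted ndistinct */
    if (isdefault1)
        nd1 = Max(nd1, rows1 * (1.0 - nullfrac1));
    if (isdefault2)
        nd2 = Max(nd2, rows2);

    if (nd1 <= nd2 || nd2 < 0)
        return 1.0 - nullfrac1;
    return (nd2 / nd1) * (1.0 - nullfrac1);
}

int
main(void)
{
    /* outer: 100000 rows, ndistinct known; inner: 5243 rows, ndistinct defaulted to 200 */
    double punt = 0.5 * 100000;                                   /* current behaviour */
    double est  = semi_selec(100000, 200, 0, 1, 100000, 5243, 0.0) * 100000;

    printf("0.5 punt: %.0f rows, fallback ndistinct: %.0f rows\n", punt, est);
    return 0;
}

[With these inputs the fallback yields a 5243-row estimate instead of the 50000 rows the 0.5 punt produces, matching the rowcounts of the plans above.]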
Andres Freund <andres@anarazel.de> writes:
> What's your opinion on this?

Looks pretty bogus to me.  You're essentially assuming that the side of the join without statistics is unique, which is a mighty dubious assumption.  (In cases where we *know* it's unique, something like this could be reasonable, but I believe get_variable_numdistinct already accounts for such cases.)

The reason for the reversion to pre-8.4 behavior was that with the other behavior, we might sometimes make extremely optimistic estimates (ie, conclude that the join result is very small) on the basis of, really, nothing at all.  AFAICS this proposal just reintroduces unwarranted assumptions, and therefore will probably produce as many worse results as better ones.

Also, why the asymmetry in null handling?  And why did you only touch one of the two code paths in eqjoinsel_semi?  They have both got this issue of how to estimate with inadequate stats.

			regards, tom lane
Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
From: Andres Freund
On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> Also, why the asymmetry in null handling?  And why did you only touch
> one of the two code paths in eqjoinsel_semi?  They have both got this
> issue of how to estimate with inadequate stats.
This patch was purely a proof of concept, sorry if that wasn't clear. I mostly wanted to point out that real plans regressed with this change.

Digging around a bit I could find more examples where it caused real pain, but also cases where the new behaviour was advantageous. Unfortunately the pastebins where raptelan provided plans have expired by now... Perhaps he can provide them again?

If we agree on a way to handle the stats I am happy to produce a patch that actually tries to cover all the cases.

> > What's your opinion on this?
> Looks pretty bogus to me.  You're essentially assuming that the side of
> the join without statistics is unique, which is a mighty dubious
> assumption.
It sure is a bit dubious. But assuming that a semijoin that has at most n rows on the inner side results in half of the outer side's rows (>> n) is pretty bogus as well. Using the assumption of uniqueness for the outer side seems sensible if it's only used as an upper limit (except in an antijoin ...). Yes, my "patch" didn't even start to do this ;)

SELECT * FROM blub WHERE foo IN (SELECT something_with_aggregation);

is not exactly a fringe case, so I find it problematic that this regresses quite a bit in the estimates.

> (In cases where we *know* it's unique, something like this
> could be reasonable, but I believe get_variable_numdistinct already
> accounts for such cases.)
Only that we infer uniqueness from very few things, unless I am missing something...

Andres
Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
From: Andres Freund
On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote:
> (In cases where we *know* it's unique, something like this
> could be reasonable, but I believe get_variable_numdistinct already
> accounts for such cases.)
One of those cases that looks relatively easy is that CTEs currently act as a kind of 'statistics barrier' here. Specifically I wonder why:

test_raptelan=# EXPLAIN WITH foo AS (SELECT * FROM b WHERE id < 5000) SELECT * FROM a WHERE a.id IN (SELECT id FROM foo);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=302.02..1876.30 rows=2550000 width=11)
   CTE foo
     ->  Index Scan using b_pkey on b  (cost=0.00..184.06 rows=5243 width=10)
           Index Cond: (id < 5000)
   ->  HashAggregate  (cost=117.97..119.97 rows=200 width=4)
         ->  CTE Scan on foo  (cost=0.00..104.86 rows=5243 width=4)
   ->  Index Scan using a_pkey on a  (cost=0.00..7.85 rows=1 width=11)
         Index Cond: (id = foo.id)

plans differently than

test_raptelan=# EXPLAIN SELECT * FROM a WHERE a.id IN (SELECT id FROM b WHERE id < 5000 OFFSET 0);
                                          QUERY PLAN
--------------------------------------------------------------------------------------------
 Merge Semi Join  (cost=560.41..17426.03 rows=5243 width=11)
   Merge Cond: (a.id = b.id)
   ->  Index Scan using a_pkey on a  (cost=0.00..160013.81 rows=5100000 width=11)
   ->  Sort  (cost=560.40..573.51 rows=5243 width=4)
         Sort Key: b.id
         ->  Limit  (cost=0.00..184.06 rows=5243 width=4)
               ->  Index Only Scan using b_pkey on b  (cost=0.00..184.06 rows=5243 width=4)
                     Index Cond: (id < 5000)

Couldn't the CTE pass a vardata from inside to the outside?

Andres
Andres Freund <andres@anarazel.de> writes:
> On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote:
>> Looks pretty bogus to me.  You're essentially assuming that the side of
>> the join without statistics is unique, which is a mighty dubious
>> assumption.

> It sure is a bit dubious. But assuming that a semijoin that has at most n
> rows on the inner side results in half of the outer side's rows (>> n) is
> pretty bogus as well.

How so?  There is no reason to think that the number of LHS rows with a match is limited by the number of RHS rows.  If we knew that the LHS join key was unique, then yes that'd be sensible, but we don't know that.

> SELECT * FROM blub WHERE foo IN (SELECT something_with_aggregation);
> is not exactly a fringe case, so I find it problematic that this regresses
> quite a bit in the estimates.

Agreed, and that's why I don't want to put in a patch that carries the risk of regressing even more.  I'm happy to do something that's got some amount of theory behind it, but if we're just guessing, we can't afford to guess a very high or low selectivity.

One thing I've considered but not done anything about is that in a lot of practical cases for this, the aggregation or grouping properties of the sub-select would provide adequate reason for assuming its output is more or less unique, so that taking ndistinct equal to number of rows actually is sane.  But it would need a bit of thought about what properties we want to treat as justifying such an assumption, and then some code to see if the join key is a Var coming out of such a sub-select.  (Actually, what such a patch would probably look like is modifying examine_simple_variable to not just punt when it finds the Var came from an aggregating subquery.)

			regards, tom lane
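[A concrete instance of Tom's objection, as a standalone sketch with made-up numbers: against a duplicate-heavy outer side, a five-row inner side can still match half of the outer rows, so the inner relation's row count is no upper bound on the join size.]

#include <stdio.h>

int
main(void)
{
    double lhs_rows = 100000.0; /* outer side is duplicate-heavy ... */
    double nd1 = 10.0;          /* ... only 10 distinct join key values */
    double nd2 = 5.0;           /* inner side: 5 rows, all distinct */

    /* semijoin selectivity: fraction of outer key values that find a match */
    double selec = nd2 / nd1;

    /* 50000 matching rows, far more than the 5 rows on the inner side */
    printf("estimated result: %.0f of %.0f outer rows\n",
           selec * lhs_rows, lhs_rows);
    return 0;
}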
Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
From: Andres Freund
On Thursday, January 12, 2012 02:24:44 AM Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote:
> >> Looks pretty bogus to me.  You're essentially assuming that the side of
> >> the join without statistics is unique, which is a mighty dubious
> >> assumption.
> >
> > It sure is a bit dubious. But assuming that a semijoin that has at most
> > n rows on the inner side results in half of the outer side's rows (>> n)
> > is pretty bogus as well.
>
> How so?  There is no reason to think that the number of LHS rows with a
> match is limited by the number of RHS rows.  If we knew that the LHS
> join key was unique, then yes that'd be sensible, but we don't know
> that.
In the current example we have an estimate for the distinctness of the LHS. I don't see how guesstimating the RHS number of tuples in a semijoin to be vardata2->rel->rows will be worse than just assuming a selectivity of 0.5 for the whole thing. Possibly it would make sense to additionally clamp the selectivity to an upper limit of 0.5 in that case.

I guess what I am aiming at is that overestimating the number of distinct tuples in the *RHS* won't lead to underestimating the number of result tuples. By clamping the selectivity to 0.5 in that case we can be sure not to overestimate more than we currently do.

> > SELECT * FROM blub WHERE foo IN (SELECT something_with_aggregation);
> > is not exactly a fringe case, so I find it problematic that this regresses
> > quite a bit in the estimates.
>
> Agreed, and that's why I don't want to put in a patch that carries the
> risk of regressing even more.  I'm happy to do something that's got some
> amount of theory behind it, but if we're just guessing, we can't afford
> to guess a very high or low selectivity.
Not sure how much more an estimate can regress than:

Nested Loop  (cost=0.00..1859.02 rows=2550000 width=11) (actual time=0.182..11.236 rows=199 loops=1)

;)

> One thing I've considered but not done anything about is that in a lot
> of practical cases for this, the aggregation or grouping properties of
> the sub-select would provide adequate reason for assuming its output is
> more or less unique, so that taking ndistinct equal to number of rows
> actually is sane.  But it would need a bit of thought about what
> properties we want to treat as justifying such an assumption, and then
> some code to see if the join key is a Var coming out of such a
> sub-select.  (Actually, what such a patch would probably look like is
> modifying examine_simple_variable to not just punt when it finds the
> Var came from an aggregating subquery.)
Yes, having that would be great, but it would be a bit more invasive than I'd like to think about right now. This thing is actually a problem for people in the field...

Andres
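[A standalone sketch of that clamping idea, illustrative only and not a proposed patch: compute the usual ratio with the guessed nd2, then cap the result at the 0.5 the current code would have used, so the guess can never overestimate more than today's behaviour does. The numbers come from the earlier testcase.]

#include <stdio.h>

#define Min(a, b) ((a) < (b) ? (a) : (b))

int
main(void)
{
    double nd1 = 100000.0;  /* trusted ndistinct for the LHS */
    double nd2 = 5243.0;    /* RHS ndistinct guessed from rel->rows */
    double nullfrac1 = 0.0;
    double selec;

    selec = (nd2 / nd1) * (1.0 - nullfrac1);

    /*
     * Overestimating the guessed nd2 can only push selec up, so cap it at
     * what the current 0.5 punt would have returned anyway.
     */
    selec = Min(selec, 0.5 * (1.0 - nullfrac1));

    printf("clamped selectivity: %g\n", selec);  /* 0.05243, under the cap */
    return 0;
}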
Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
From: Casey Allen Shobe
On Wed, 11 Jan 2012 19:40:34 -0500, Andres wrote:
> Unfortunately the pastebins where raptelan provided plans have expired by
> now... Perhaps he can provide them again?

Sure. The original curiosity I noticed was that adjusting the block size of results returned by the CTE had widely different effects: 5000 seemed to be some sort of "magic number", while either 6000 or 4000 worked poorly. Originally, our design used blocks of 100,000. I then noticed a regression with the 5,000 block size.

The difference in the queries between a prototype (which was pretty fast, 5-10s) and what was generated (slow, 5m+) was that the order of columns in one of the join conditions in the main query was reversed: ON source.column = joined.column was slow, while ON joined.column = source.column was fast. Apparently this is enough to get a different possible plan to hit the planner first, while another slow plan with nearly identical estimates is hitting the planner first in other cases.

Attached are three files: one shows the fast plan, another the slow plan, and a third the query in question. The ON clause where the reversal makes a difference is the td_13 one. I use a different range in the CTE for the two queries, as otherwise the filesystem cache makes the timings look better, but in both cases the CTE returns exactly 5,000 results.

Yes, the query could be written a lot better, but currently it's generated this way to conform to an expectation of user-defined custom WHERE clauses that do not qualify column names. Breaking that compatibility and redoing this better is a longer-term plan.

Please let me know if I'm omitting any important details.

Regards,
--
Casey Allen Shobe | Senior Software Engineer/DBA
Message Systems | http://messagesystems.com
casey.shobe@messagesystems.com | 443-656-3311 x248
[ getting back to this after assorted distractions ]

Andres Freund <andres@anarazel.de> writes:
> On Thursday, January 12, 2012 02:24:44 AM Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote:
>>>> Looks pretty bogus to me.  You're essentially assuming that the side of
>>>> the join without statistics is unique, which is a mighty dubious
>>>> assumption.

>>> It sure is a bit dubious. But assuming that a semijoin that has at most n
>>> rows on the inner side results in half of the outer side's rows (>> n) is
>>> pretty bogus as well.

> In the current example we have an estimate for the distinctness of the LHS.
> I don't see how guesstimating the RHS number of tuples in a semijoin to be
> vardata2->rel->rows will be worse than just assuming a selectivity of 0.5
> for the whole thing.

The real problem is that the estimate we want to use involves the ratio nd2/nd1.  If either of those numbers is mere fantasy, so is the ratio.  It doesn't really help to say "well, we can upper-bound this number here", because sometimes a too-large result is as bad as a too-small one.

>> One thing I've considered but not done anything about is that in a lot
>> of practical cases for this, the aggregation or grouping properties of
>> the sub-select would provide adequate reason for assuming its output is
>> more or less unique, so that taking ndistinct equal to number of rows
>> actually is sane.  But it would need a bit of thought about what
>> properties we want to treat as justifying such an assumption, and then
>> some code to see if the join key is a Var coming out of such a
>> sub-select.  (Actually, what such a patch would probably look like is
>> modifying examine_simple_variable to not just punt when it finds the
>> Var came from an aggregating subquery.)

> Yes, having that would be great, but it would be a bit more invasive than
> I'd like to think about right now. This thing is actually a problem for
> people in the field...

I'm not sure it's that bad.  Attached is a simple patch to notice that the subquery is "SELECT DISTINCT foo" and if so assume the result is unique.  This takes care of at least the first of your original examples.  I'm not sure whether fixing this single case is enough to get excited about, but it's a step forward anyway.

Your second example, involving a WITH, would properly be handled by teaching examine_simple_variable to drill down into CTEs.  I lacked the round tuits to make it do so initially, and still do, but if you'd like to have a go at it ...

			regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 6d78068476e520f7dd2da6c0c8d48d93e0649768..5a4fa77848b5e44abd913e536f0f83cc06bb85fb 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
***************
*** 110,115 ****
--- 110,116 ----
  #include "optimizer/predtest.h"
  #include "optimizer/restrictinfo.h"
  #include "optimizer/var.h"
+ #include "parser/parse_clause.h"
  #include "parser/parse_coerce.h"
  #include "parser/parsetree.h"
  #include "utils/builtins.h"
*************** examine_simple_variable(PlannerInfo *roo
*** 4357,4375 ****
  	{
  		/*
  		 * Plain subquery (not one that was converted to an appendrel).
- 		 *
- 		 * Punt if subquery uses set operations, GROUP BY, or DISTINCT --- any
- 		 * of these will mash underlying columns' stats beyond recognition.
- 		 * (Set ops are particularly nasty; if we forged ahead, we would
- 		 * return stats relevant to only the leftmost subselect...)
  		 */
  		Query	   *subquery = rte->subquery;
  		RelOptInfo *rel;
  		TargetEntry *ste;
  
  		if (subquery->setOperations ||
! 			subquery->groupClause ||
! 			subquery->distinctClause)
  			return;
  
  		/*
--- 4358,4378 ----
  	{
  		/*
  		 * Plain subquery (not one that was converted to an appendrel).
  		 */
  		Query	   *subquery = rte->subquery;
  		RelOptInfo *rel;
  		TargetEntry *ste;
  
+ 		/*
+ 		 * Punt if subquery uses set operations or GROUP BY, as these will
+ 		 * mash underlying columns' stats beyond recognition.  (Set ops are
+ 		 * particularly nasty; if we forged ahead, we would return stats
+ 		 * relevant to only the leftmost subselect...)  DISTINCT is also
+ 		 * problematic, but we check that later because there is a possibility
+ 		 * of learning something even with it.
+ 		 */
  		if (subquery->setOperations ||
! 			subquery->groupClause)
  			return;
  
  		/*
*************** examine_simple_variable(PlannerInfo *roo
*** 4415,4420 ****
--- 4418,4437 ----
  			 rte->eref->aliasname, var->varattno);
  
  		var = (Var *) ste->expr;
  
+ 		/*
+ 		 * If subquery uses DISTINCT, we can't make use of any stats for the
+ 		 * variable ... but, if it's the only DISTINCT column, we are entitled
+ 		 * to consider it unique.  We do the test this way so that it works
+ 		 * for cases involving DISTINCT ON.
+ 		 */
+ 		if (subquery->distinctClause)
+ 		{
+ 			if (list_length(subquery->distinctClause) == 1 &&
+ 				targetIsInSortList(ste, InvalidOid, subquery->distinctClause))
+ 				vardata->isunique = true;
+ 			return;
+ 		}
+ 
  		/* Can only handle a simple Var of subquery's query level */
  		if (var && IsA(var, Var) &&
  			var->varlevelsup == 0)