Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error |
Date | |
Msg-id | 809743.1756149367@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error
|
List | pgsql-hackers |
I wrote: > Yeah. I think this is an oversight in create_unique_paths(): it's > building an ORDER BY list without consideration for the possibility > that some of the entries are known to be constant. In fact, because > make_pathkeys_for_sortclauses will get rid of redundancies, this > example actually ends up with a unique_rel whose unique_pathkeys > are shorter than the unique_groupclause, which is pretty bogus. Here is a patch that fixes it that way. I like this better than Sergey's approach because it is making the plans better not worse. There are a couple loose ends yet to be dealt with: * The fact that we now avoid duplicate unique-ifications is good, but I think it breaks the test case added by 44e95b572, since that's no longer demonstrating what it set out to demonstrate. (See the delta in aggregates.out.) I'm not sure that that's a huge problem, but we probably at least ought to modify the comment on that test case. * What happens if we find that *all* the semi_rhs_exprs are known constant? We'll compute empty pathkeys and groupClause, but then what? It looks like create_final_unique_paths etc then build a useless Unique step. Maybe we should just absorb the input relation's path list as-is? regards, tom lane diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 65f17101591..bb03ea7ccb4 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -8284,8 +8284,8 @@ generate_setop_child_grouplist(SetOperationStmt *op, List *targetlist) * the needs of the semijoin represented by sjinfo. If it is not possible * to identify how to make the data unique, NULL is returned. * - * If used at all, this is likely to be called repeatedly on the same rel; - * So we cache the result. + * If used at all, this is likely to be called repeatedly on the same rel, + * so we cache the result. */ RelOptInfo * create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo) @@ -8378,6 +8378,15 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo) * While in the loop, build the lists of SortGroupClause's that * represent the ordering for the sort-based implementation and the * grouping for the hash-based implementation. + * + * To complicate matters, some of the values to be unique-ified may be + * known redundant by the EquivalenceClass machinery (e.g., because + * they have been equated to constants). There is no need to compare + * such values during unique-ification, and indeed we had better not + * try because the Vars involved may not have propagated as high as + * the semijoin's level. We use make_pathkeys_for_sortclauses to + * detect such cases, which is a tad inefficient but it doesn't seem + * worth building specialized infrastructure for this. */ newtlist = make_tlist_from_pathtarget(rel->reltarget); nextresno = list_length(newtlist) + 1; @@ -8386,8 +8395,9 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo) { Expr *uniqexpr = lfirst(lc1); Oid in_oper = lfirst_oid(lc2); - Oid sortop = InvalidOid; + Oid sortop; TargetEntry *tle; + bool made_tle = false; tle = tlist_member(uniqexpr, newtlist); if (!tle) @@ -8398,19 +8408,21 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo) false); newtlist = lappend(newtlist, tle); nextresno++; + made_tle = true; } - if (sjinfo->semi_can_btree) + /* + * Try to build an ORDER BY list to sort the input compatibly. We + * do this for each sortable clause even when the clauses are not + * all sortable, so that we can detect clauses that are redundant + * according to the pathkey machinery. + */ + sortop = get_ordering_op_for_equality_op(in_oper, false); + if (OidIsValid(sortop)) { - /* Create an ORDER BY list to sort the input compatibly */ Oid eqop; SortGroupClause *sortcl; - sortop = get_ordering_op_for_equality_op(in_oper, false); - if (!OidIsValid(sortop)) /* shouldn't happen */ - elog(ERROR, "could not find ordering operator for equality operator %u", - in_oper); - /* * The Unique node will need equality operators. Normally * these are the same as the IN clause operators, but if those @@ -8430,7 +8442,32 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo) sortcl->nulls_first = false; sortcl->hashable = false; /* no need to make this accurate */ sortList = lappend(sortList, sortcl); + + /* + * At each step, convert the SortGroupClause list to pathkey + * form. If the just-added SortGroupClause is redundant, the + * result will be shorter than the SortGroupClause list. + */ + sortPathkeys = make_pathkeys_for_sortclauses(root, sortList, + newtlist); + if (list_length(sortPathkeys) != list_length(sortList)) + { + /* Drop the redundant SortGroupClause */ + sortList = list_delete_last(sortList); + Assert(list_length(sortPathkeys) == list_length(sortList)); + /* Undo tlist addition, if we made one */ + if (made_tle) + { + newtlist = list_delete_last(newtlist); + nextresno--; + } + /* We need not consider this clause for hashing, either */ + continue; + } } + else if (sjinfo->semi_can_btree) /* shouldn't happen */ + elog(ERROR, "could not find ordering operator for equality operator %u", + in_oper); if (sjinfo->semi_can_hash) { @@ -8460,8 +8497,14 @@ create_unique_paths(PlannerInfo *root, RelOptInfo *rel, SpecialJoinInfo *sjinfo) } } + /* + * Done building the sortPathkeys and groupClause. But the + * sortPathkeys are bogus if not all the clauses were sortable. + */ + if (!sjinfo->semi_can_btree) + sortPathkeys = NIL; + unique_rel->reltarget = create_pathtarget(root, newtlist); - sortPathkeys = make_pathkeys_for_sortclauses(root, sortList, newtlist); } /* build unique paths based on input rel's pathlist */ diff --git a/src/test/regress/expected/aggregates.out b/src/test/regress/expected/aggregates.out index 7319945ffe3..46cff08b748 100644 --- a/src/test/regress/expected/aggregates.out +++ b/src/test/regress/expected/aggregates.out @@ -3405,15 +3405,15 @@ set enable_memoize to off; explain (costs off) select 1 from tenk1 where (hundred, thousand) in (select twothousand, twothousand from onek); - QUERY PLAN -------------------------------------------------------------- + QUERY PLAN +------------------------------------------------- Hash Join Hash Cond: (tenk1.hundred = onek.twothousand) -> Seq Scan on tenk1 Filter: (hundred = thousand) -> Hash -> HashAggregate - Group Key: onek.twothousand, onek.twothousand + Group Key: onek.twothousand -> Seq Scan on onek (8 rows) diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 98b05c94a11..054bcb20596 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -6500,6 +6500,68 @@ where t1.a = s.c; ---------- (0 rows) +rollback; +-- check handling of semijoins after join removal: we must suppress +-- unique-ification of known-constant values +begin; +create temp table t (a int unique, b int); +insert into t values (1, 2); +explain (verbose, costs off) +select t1.a from t t1 + left join t t2 on t1.a = t2.a + join t t3 on true +where exists (select 1 from t t4 + join t t5 on t4.b = t5.b + join t t6 on t5.b = t6.b + where t1.a = t4.a and t3.a = t5.a and t4.a = 1); + QUERY PLAN +------------------------------------------------------------------------------------ + Nested Loop + Output: t1.a + Inner Unique: true + -> Nested Loop + Output: t1.a, t5.a + -> Index Only Scan using t_a_key on pg_temp.t t1 + Output: t1.a + Index Cond: (t1.a = 1) + -> HashAggregate + Output: t5.a + Group Key: t5.a + -> Hash Join + Output: t5.a + Hash Cond: (t6.b = t4.b) + -> Seq Scan on pg_temp.t t6 + Output: t6.a, t6.b + -> Hash + Output: t4.b, t5.b, t5.a + -> Hash Join + Output: t4.b, t5.b, t5.a + Inner Unique: true + Hash Cond: (t5.b = t4.b) + -> Seq Scan on pg_temp.t t5 + Output: t5.a, t5.b + -> Hash + Output: t4.b, t4.a + -> Index Scan using t_a_key on pg_temp.t t4 + Output: t4.b, t4.a + Index Cond: (t4.a = 1) + -> Index Only Scan using t_a_key on pg_temp.t t3 + Output: t3.a + Index Cond: (t3.a = t5.a) +(32 rows) + +select t1.a from t t1 + left join t t2 on t1.a = t2.a + join t t3 on true +where exists (select 1 from t t4 + join t t5 on t4.b = t5.b + join t t6 on t5.b = t6.b + where t1.a = t4.a and t3.a = t5.a and t4.a = 1); + a +--- + 1 +(1 row) + rollback; -- test cases where we can remove a join, but not a PHV computed at it begin; diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql index 5f0a475894d..817132d90de 100644 --- a/src/test/regress/sql/join.sql +++ b/src/test/regress/sql/join.sql @@ -2420,6 +2420,32 @@ where t1.a = s.c; rollback; +-- check handling of semijoins after join removal: we must suppress +-- unique-ification of known-constant values +begin; + +create temp table t (a int unique, b int); +insert into t values (1, 2); + +explain (verbose, costs off) +select t1.a from t t1 + left join t t2 on t1.a = t2.a + join t t3 on true +where exists (select 1 from t t4 + join t t5 on t4.b = t5.b + join t t6 on t5.b = t6.b + where t1.a = t4.a and t3.a = t5.a and t4.a = 1); + +select t1.a from t t1 + left join t t2 on t1.a = t2.a + join t t3 on true +where exists (select 1 from t t4 + join t t5 on t4.b = t5.b + join t t6 on t5.b = t6.b + where t1.a = t4.a and t3.a = t5.a and t4.a = 1); + +rollback; + -- test cases where we can remove a join, but not a PHV computed at it begin;
pgsql-hackers by date: