Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware) - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware) |
Date | |
Msg-id | 999171.1677611407@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware) |
List | pgsql-bugs |
I wrote: > I have a WIP patch that does it like this, and it fixes the presented > case; but it's not complete so it's still failing some existing > regression tests. More later. Here's said patch. Although this fixes the described problem and passes check-world, I'm not totally happy with it yet: it feels like the new add_outer_joins_to_relids() function is too expensive to be doing every time we construct a join relation. I wonder if there is a way we can add more info to the SpecialJoinInfo data structure to make it cheaper. An obvious improvement is to store commute_below_l explicitly instead of recomputing it over and over, but that isn't going to move the needle all that far. Is there a way to not have to scan all the SpecialJoinInfos? I may be worrying over nothing --- it's likely that the path construction work that will happen after we make the join relation RelOptInfo will swamp this cost anyway. But it feels ugly. Another thing I'm wondering about is whether this'd let us get rid of RestrictInfo.is_pushed_down and RINFO_IS_PUSHED_DOWN. I tried really hard to remove those in favor of checking to see whether a qual clause's required_relids include the outer join being formed, which seems like a much more principled way of deciding whether it's a filter or join clause. I couldn't get that to work, but now I wonder if what I was running into was really bugs associated with not understanding that we haven't fully formed the lower outer join when we apply identity 3. regards, tom lane diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README index 227278eb6c..d017f3555a 100644 --- a/src/backend/optimizer/README +++ b/src/backend/optimizer/README @@ -410,6 +410,8 @@ joins. However, when building Vars representing the outputs of join relations, we need to ensure that their varnullingrels are set to values consistent with the syntactic join order, so that they will appear equal() to pre-existing Vars in the upper part of the query. +(We also have to be careful about where we place qual clauses using +such Vars, as described below.) Outer joins also complicate handling of subquery pull-up. Consider @@ -507,6 +509,25 @@ problem for join relation identification either, since whether a semijoin has been completed is again implicit in the set of base relations included in the join. +As usual, outer join identity 3 complicates matters. If we start with + (A leftjoin B on (Pab)) leftjoin C on (Pbc) +then the parser will have marked any C Vars appearing above these joins +with the RT index of the B/C join. If we now transform to + A leftjoin (B leftjoin C on (Pbc)) on (Pab) +then it would appear that a clause using only such Vars could be pushed +down and applied as a filter clause (not a join clause) at the lower +B/C join. But *this might not give the right answer* since the clause +might see a non-null value for the C Var that will be replaced by null +once the A/B join is performed. We handle this by saying that the +pushed-down join hasn't completely performed the work of the B/C join +and hence is not entitled to include that outer join relid in its +relid set. When we form the A/B join, both outer joins' relids will +be added to its relid set, and then the upper clause will be applied +at the correct join level. (Note there is no problem when identity 3 +is applied in the other direction: if we started with the second form +then upper C Vars are marked with both outer join relids, so they +cannot drop below whichever join is applied second.) + There is one additional complication for qual clause placement, which occurs when we have made multiple versions of an outer-join clause as described previously (that is, we have both "Pbc" and "Pb*c" forms of diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 9949d5b1d3..ecb1343d1a 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -1366,19 +1366,20 @@ generate_base_implied_equalities_broken(PlannerInfo *root, * commutative duplicates, i.e. if the algorithm selects "a.x = b.y" but * we already have "b.y = a.x", we return the existing clause. * - * If we are considering an outer join, ojrelid is the associated OJ relid, - * otherwise it's zero. + * If we are considering an outer join, sjinfo is the associated OJ info, + * otherwise it can be NULL. * * join_relids should always equal bms_union(outer_relids, inner_rel->relids) - * plus ojrelid if that's not zero. We could simplify this function's API by - * computing it internally, but most callers have the value at hand anyway. + * plus whatever add_outer_joins_to_relids() would add. We could simplify + * this function's API by computing it internally, but most callers have the + * value at hand anyway. */ List * generate_join_implied_equalities(PlannerInfo *root, Relids join_relids, Relids outer_relids, RelOptInfo *inner_rel, - Index ojrelid) + SpecialJoinInfo *sjinfo) { List *result = NIL; Relids inner_relids = inner_rel->relids; @@ -1396,8 +1397,9 @@ generate_join_implied_equalities(PlannerInfo *root, nominal_inner_relids = inner_rel->top_parent_relids; /* ECs will be marked with the parent's relid, not the child's */ nominal_join_relids = bms_union(outer_relids, nominal_inner_relids); - if (ojrelid != 0) - nominal_join_relids = bms_add_member(nominal_join_relids, ojrelid); + nominal_join_relids = add_outer_joins_to_relids(root, + nominal_join_relids, + sjinfo); } else { @@ -1418,7 +1420,7 @@ generate_join_implied_equalities(PlannerInfo *root, * At inner joins, we can be smarter: only consider eclasses mentioning * both input rels. */ - if (ojrelid != 0) + if (sjinfo && sjinfo->ojrelid != 0) matching_ecs = get_eclass_indexes_for_relids(root, nominal_join_relids); else matching_ecs = get_common_eclass_indexes(root, nominal_inner_relids, @@ -1467,7 +1469,7 @@ generate_join_implied_equalities(PlannerInfo *root, * generate_join_implied_equalities_for_ecs * As above, but consider only the listed ECs. * - * For the sole current caller, we can assume ojrelid == 0, that is we are + * For the sole current caller, we can assume sjinfo == NULL, that is we are * not interested in outer-join filter clauses. This might need to change * in future. */ diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c index 011a0337da..64dfddea2a 100644 --- a/src/backend/optimizer/path/indxpath.c +++ b/src/backend/optimizer/path/indxpath.c @@ -3367,7 +3367,7 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel) otherrels), otherrels, rel, - 0)); + NULL)); /* * Normally we remove quals that are implied by a partial index's diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index d7cb11c851..be7865cc0a 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -710,9 +710,8 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2) return NULL; } - /* If we have an outer join, add its RTI to form the canonical relids. */ - if (sjinfo && sjinfo->ojrelid != 0) - joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid); + /* Add outer join relid(s) to form the canonical relids. */ + joinrelids = add_outer_joins_to_relids(root, joinrelids, sjinfo); /* Swap rels if needed to match the join info. */ if (reversed) @@ -775,6 +774,102 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2) return joinrel; } +/* + * XXX hack, if we need this we should probably be retaining commute_below_l + * as a field... + */ +static bool +commute_below_l_is_subset(SpecialJoinInfo *sjinfo, Relids relids) +{ + if (sjinfo->commute_below == NULL) /* easy case */ + return true; + else + { + Relids commute_below_l = bms_intersect(sjinfo->commute_below, + sjinfo->syn_lefthand); + bool result; + + result = bms_is_subset(commute_below_l, relids); + bms_free(commute_below_l); + return result; + } +} + +/* + * add_outer_joins_to_relids + * Add relids to input_relids to represent any outer joins that will be + * calculated at this join. + * + * input_relids is the union of the relid sets of the two input relations. + * Note that we modify this in-place and return it; caller must bms_copy() + * it first, if a separate value is desired. + * + * sjinfo represents the join being performed. + */ +Relids +add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids, + SpecialJoinInfo *sjinfo) +{ + /* Nothing to do if this isn't an outer join with an assigned relid. */ + if (sjinfo == NULL || sjinfo->ojrelid == 0) + return input_relids; + + /* + * If it's not a left join, we have no rules that would permit executing + * it in non-syntactic order, so just form the syntactic relid set. (This + * is just a quick-exit test; we'd come to the same conclusion anyway, + * since its commute_below and commute_above_l sets must be empty.) + */ + if (sjinfo->jointype != JOIN_LEFT) + return bms_add_member(input_relids, sjinfo->ojrelid); + + /* + * Add the OJ relid unless this join has been pushed into the RHS of a + * syntactically-lower left join per OJ identity 3. (If it has, then we + * cannot claim that its outputs represent the final state of its RHS.) + */ + if (commute_below_l_is_subset(sjinfo, input_relids)) + input_relids = bms_add_member(input_relids, sjinfo->ojrelid); + + /* + * Contrariwise, if we are now forming the final result of such a commuted + * pair of OJs, it's time to add the relid(s) of the pushed-down join(s). + * We can skip this if this join was never a candidate to be pushed up. + */ + if (sjinfo->commute_above_l) + { + ListCell *lc; + + /* + * The current join could complete the nulling of more than one + * pushed-down join, so we have to examine all the SpecialJoinInfos. + * Because join_info_list was built in bottom-up order, it's + * sufficient to traverse it once: an ojrelid we add in one loop + * iteration would not have affected decisions of earlier iterations. + */ + foreach(lc, root->join_info_list) + { + SpecialJoinInfo *othersj = (SpecialJoinInfo *) lfirst(lc); + + if (othersj == sjinfo || + othersj->ojrelid == 0 || othersj->jointype != JOIN_LEFT) + continue; /* definitely not interesting */ + + if (othersj->commute_below == NULL) + continue; /* was never a candidate to be pushed down */ + + /* Add it if not already present but conditions now satisfied */ + if (!bms_is_member(othersj->ojrelid, input_relids) && + bms_is_subset(othersj->min_lefthand, input_relids) && + bms_is_subset(othersj->min_righthand, input_relids) && + commute_below_l_is_subset(othersj, input_relids)) + input_relids = bms_add_member(input_relids, othersj->ojrelid); + } + } + + return input_relids; +} + /* * populate_joinrel_with_paths * Add paths to the given joinrel for given pair of joining relations. The @@ -1531,9 +1626,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, /* Build correct join relids for child join */ child_joinrelids = bms_union(child_rel1->relids, child_rel2->relids); - if (child_sjinfo->ojrelid != 0) - child_joinrelids = bms_add_member(child_joinrelids, - child_sjinfo->ojrelid); + child_joinrelids = add_outer_joins_to_relids(root, child_joinrelids, + child_sjinfo); /* Find the AppendRelInfo structures */ appinfos = find_appinfos_by_relids(root, child_joinrelids, &nappinfos); diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index 98cf3494e6..e4bb0847ce 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -88,8 +88,11 @@ restart: */ innerrelid = bms_singleton_member(sjinfo->min_righthand); - /* Compute the relid set for the join we are considering */ - joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand); + /* + * Compute the relid set for the join we are considering. We can + * assume things are done in syntactic order. + */ + joinrelids = bms_union(sjinfo->syn_lefthand, sjinfo->syn_righthand); if (sjinfo->ojrelid != 0) joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid); @@ -201,8 +204,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo) if (!rel_supports_distinctness(root, innerrel)) return false; - /* Compute the relid set for the join we are considering */ - inputrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand); + /* Compute the syntactic relid set for the join we are considering */ + inputrelids = bms_union(sjinfo->syn_lefthand, sjinfo->syn_righthand); Assert(sjinfo->ojrelid != 0); joinrelids = bms_copy(inputrelids); joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid); @@ -670,7 +673,7 @@ reduce_unique_semijoins(PlannerInfo *root) joinrelids, sjinfo->min_lefthand, innerrel, - 0), + NULL), innerrel->joininfo); /* Test whether the innerrel is unique for those clauses. */ diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 9ad44a0508..13e568b414 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -872,8 +872,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel, joinrel->reloptkind = RELOPT_OTHER_JOINREL; joinrel->relids = bms_union(outer_rel->relids, inner_rel->relids); - if (sjinfo->ojrelid != 0) - joinrel->relids = bms_add_member(joinrel->relids, sjinfo->ojrelid); + joinrel->relids = add_outer_joins_to_relids(root, joinrel->relids, sjinfo); joinrel->rows = 0; /* cheap startup cost is interesting iff not all tuples to be retrieved */ joinrel->consider_startup = (root->tuple_fraction > 0); @@ -1266,7 +1265,7 @@ build_joinrel_restrictlist(PlannerInfo *root, joinrel->relids, outer_rel->relids, inner_rel, - sjinfo->ojrelid)); + sjinfo)); return result; } @@ -1550,7 +1549,7 @@ get_baserel_parampathinfo(PlannerInfo *root, RelOptInfo *baserel, joinrelids, required_outer, baserel, - 0)); + NULL)); /* Compute set of serial numbers of the enforced clauses */ pserials = NULL; @@ -1673,7 +1672,7 @@ get_joinrel_parampathinfo(PlannerInfo *root, RelOptInfo *joinrel, join_and_req, required_outer, joinrel, - 0); + NULL); /* We only want ones that aren't movable to lower levels */ dropped_ecs = NIL; foreach(lc, eclauses) diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index 5b9db7733d..d9e1623274 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -104,6 +104,8 @@ extern void add_paths_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel, extern void join_search_one_level(PlannerInfo *root, int level); extern RelOptInfo *make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2); +extern Relids add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids, + SpecialJoinInfo *sjinfo); extern bool have_join_order_restriction(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2); extern bool have_dangerous_phv(PlannerInfo *root, @@ -150,7 +152,7 @@ extern List *generate_join_implied_equalities(PlannerInfo *root, Relids join_relids, Relids outer_relids, RelOptInfo *inner_rel, - Index ojrelid); + SpecialJoinInfo *sjinfo); extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root, List *eclasses, Relids join_relids, diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index e293de03c0..73edfc4bd0 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -2357,6 +2357,40 @@ where b.f1 = t.thousand and a.f1 = b.f1 and (a.f1+b.f1+999) = t.tenthous; ----+----+----------+---------- (0 rows) +-- +-- check for correct placement of non-strict qual in a multiway outer join +-- +explain (costs off) +select t1.f1 +from int4_tbl t1, int4_tbl t2 + left join int4_tbl t3 on t3.f1 > 0 + left join int4_tbl t4 on t3.f1 > 1 +where t4.f1 is null; + QUERY PLAN +------------------------------------------------------- + Nested Loop + -> Nested Loop Left Join + Filter: (t4.f1 IS NULL) + -> Seq Scan on int4_tbl t2 + -> Materialize + -> Nested Loop Left Join + Join Filter: (t3.f1 > 1) + -> Seq Scan on int4_tbl t3 + Filter: (f1 > 0) + -> Materialize + -> Seq Scan on int4_tbl t4 + -> Seq Scan on int4_tbl t1 +(12 rows) + +select t1.f1 +from int4_tbl t1, int4_tbl t2 + left join int4_tbl t3 on t3.f1 > 0 + left join int4_tbl t4 on t3.f1 > 1 +where t4.f1 is null; + f1 +---- +(0 rows) + -- -- check a case where we formerly got confused by conflicting sort orders -- in redundant merge join path keys diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql index 5c0328ed76..bbb1b7d68f 100644 --- a/src/test/regress/sql/join.sql +++ b/src/test/regress/sql/join.sql @@ -441,6 +441,22 @@ select a.f1, b.f1, t.thousand, t.tenthous from (select sum(f1) as f1 from int4_tbl i4b) b where b.f1 = t.thousand and a.f1 = b.f1 and (a.f1+b.f1+999) = t.tenthous; +-- +-- check for correct placement of non-strict qual in a multiway outer join +-- +explain (costs off) +select t1.f1 +from int4_tbl t1, int4_tbl t2 + left join int4_tbl t3 on t3.f1 > 0 + left join int4_tbl t4 on t3.f1 > 1 +where t4.f1 is null; + +select t1.f1 +from int4_tbl t1, int4_tbl t2 + left join int4_tbl t3 on t3.f1 > 0 + left join int4_tbl t4 on t3.f1 > 1 +where t4.f1 is null; + -- -- check a case where we formerly got confused by conflicting sort orders -- in redundant merge join path keys
pgsql-bugs by date: