Re: Ignored join clause - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: Ignored join clause |
Date | |
Msg-id | 27781.1524241975@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Ignored join clause (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
I wrote: > So the only practical answer seems to be to teach > extract_actual_join_clauses to check the clause's syntactic level > along with is_pushed_down, as per the attached patch. After further thought about this, I decided that that patch didn't go nearly far enough: in reality, just about every single place we test RestrictInfo.is_pushed_down is potentially wrong in the same way that extract_actual_join_clauses was, and needs to be taught to check the clause relids in the same way. Hence the attached follow-on patch. There might be a few of these places that don't really need the change because they can never be reached while considering a parameterized join path ... but I don't care to bet that that's true and will stay true. There are a couple of places in analyzejoins.c that were already examining required_relids, but were using bms_equal, which now seems overly strict; this patch makes them use bms_subset like the other places. Thoughts about that? Also, I wondered in commit 306d6e59f whether changing the signature of extract_actual_join_clauses in the back branches was really a good idea. While it's still unpleasant, I'm inclined to leave it that way, because any code that is using that function is almost certainly broken due to this issue and needs to be fixed anyway. I see that the reason postgres_fdw doesn't use that function as of v10 is commit 28b047875, which probably should have been back-patched to 9.6 anyhow. The patch as attached adds a couple of bms_union() steps to calculate join relids in places that didn't have them handy. In both places this could be avoided by passing down the join relids from a caller, but it'd require API changes to globally-visible routines, so I thought eating some cycles would be a safer solution in the back branches. We could clean that up in HEAD though. regards, tom lane diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 30e5726..a46160d 100644 *** a/contrib/postgres_fdw/postgres_fdw.c --- b/contrib/postgres_fdw/postgres_fdw.c *************** foreign_join_ok(PlannerInfo *root, RelOp *** 4705,4711 **** bool is_remote_clause = is_foreign_expr(root, joinrel, rinfo->clause); ! if (IS_OUTER_JOIN(jointype) && !rinfo->is_pushed_down) { if (!is_remote_clause) return false; --- 4705,4712 ---- bool is_remote_clause = is_foreign_expr(root, joinrel, rinfo->clause); ! if (IS_OUTER_JOIN(jointype) && ! !RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids)) { if (!is_remote_clause) return false; diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index 47729de..10d4114 100644 *** a/src/backend/optimizer/path/costsize.c --- b/src/backend/optimizer/path/costsize.c *************** static bool has_indexed_join_quals(NestP *** 159,164 **** --- 159,165 ---- static double approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals); static double calc_joinrel_size_estimate(PlannerInfo *root, + RelOptInfo *joinrel, RelOptInfo *outer_rel, RelOptInfo *inner_rel, double outer_rows, *************** compute_semi_anti_join_factors(PlannerIn *** 4055,4066 **** */ if (IS_OUTER_JOIN(jointype)) { joinquals = NIL; foreach(l, restrictlist) { RestrictInfo *rinfo = lfirst_node(RestrictInfo, l); ! if (!rinfo->is_pushed_down) joinquals = lappend(joinquals, rinfo); } } --- 4056,4069 ---- */ if (IS_OUTER_JOIN(jointype)) { + Relids joinrelids = bms_union(outerrel->relids, innerrel->relids); + joinquals = NIL; foreach(l, restrictlist) { RestrictInfo *rinfo = lfirst_node(RestrictInfo, l); ! if (!RINFO_IS_PUSHED_DOWN(rinfo, joinrelids)) joinquals = lappend(joinquals, rinfo); } } *************** set_joinrel_size_estimates(PlannerInfo * *** 4375,4380 **** --- 4378,4384 ---- List *restrictlist) { rel->rows = calc_joinrel_size_estimate(root, + rel, outer_rel, inner_rel, outer_rel->rows, *************** get_parameterized_joinrel_size(PlannerIn *** 4417,4422 **** --- 4421,4427 ---- * estimate for any pair with the same parameterization. */ nrows = calc_joinrel_size_estimate(root, + rel, outer_path->parent, inner_path->parent, outer_path->rows, *************** get_parameterized_joinrel_size(PlannerIn *** 4440,4445 **** --- 4445,4451 ---- */ static double calc_joinrel_size_estimate(PlannerInfo *root, + RelOptInfo *joinrel, RelOptInfo *outer_rel, RelOptInfo *inner_rel, double outer_rows, *************** calc_joinrel_size_estimate(PlannerInfo * *** 4492,4498 **** { RestrictInfo *rinfo = lfirst_node(RestrictInfo, l); ! if (rinfo->is_pushed_down) pushedquals = lappend(pushedquals, rinfo); else joinquals = lappend(joinquals, rinfo); --- 4498,4504 ---- { RestrictInfo *rinfo = lfirst_node(RestrictInfo, l); ! if (RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids)) pushedquals = lappend(pushedquals, rinfo); else joinquals = lappend(joinquals, rinfo); diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 3fd3cc7..f47dd81 100644 *** a/src/backend/optimizer/path/joinpath.c --- b/src/backend/optimizer/path/joinpath.c *************** hash_inner_and_outer(PlannerInfo *root, *** 1700,1706 **** * If processing an outer join, only use its own join clauses for * hashing. For inner joins we need not be so picky. */ ! if (isouterjoin && restrictinfo->is_pushed_down) continue; if (!restrictinfo->can_join || --- 1700,1706 ---- * If processing an outer join, only use its own join clauses for * hashing. For inner joins we need not be so picky. */ ! if (isouterjoin && RINFO_IS_PUSHED_DOWN(restrictinfo, joinrel->relids)) continue; if (!restrictinfo->can_join || *************** select_mergejoin_clauses(PlannerInfo *ro *** 1947,1953 **** * we don't set have_nonmergeable_joinclause here because pushed-down * clauses will become otherquals not joinquals.) */ ! if (isouterjoin && restrictinfo->is_pushed_down) continue; /* Check that clause is a mergeable operator clause */ --- 1947,1953 ---- * we don't set have_nonmergeable_joinclause here because pushed-down * clauses will become otherquals not joinquals.) */ ! if (isouterjoin && RINFO_IS_PUSHED_DOWN(restrictinfo, joinrel->relids)) continue; /* Check that clause is a mergeable operator clause */ diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 5d66cfb..6d41d30 100644 *** a/src/backend/optimizer/path/joinrels.c --- b/src/backend/optimizer/path/joinrels.c *************** static bool has_join_restriction(Planner *** 35,40 **** --- 35,41 ---- static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel); static bool is_dummy_rel(RelOptInfo *rel); static bool restriction_is_constant_false(List *restrictlist, + RelOptInfo *joinrel, bool only_pushed_down); static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, RelOptInfo *joinrel, *************** populate_joinrel_with_paths(PlannerInfo *** 780,786 **** { case JOIN_INNER: if (is_dummy_rel(rel1) || is_dummy_rel(rel2) || ! restriction_is_constant_false(restrictlist, false)) { mark_dummy_rel(joinrel); break; --- 781,787 ---- { case JOIN_INNER: if (is_dummy_rel(rel1) || is_dummy_rel(rel2) || ! restriction_is_constant_false(restrictlist, joinrel, false)) { mark_dummy_rel(joinrel); break; *************** populate_joinrel_with_paths(PlannerInfo *** 794,805 **** break; case JOIN_LEFT: if (is_dummy_rel(rel1) || ! restriction_is_constant_false(restrictlist, true)) { mark_dummy_rel(joinrel); break; } ! if (restriction_is_constant_false(restrictlist, false) && bms_is_subset(rel2->relids, sjinfo->syn_righthand)) mark_dummy_rel(rel2); add_paths_to_joinrel(root, joinrel, rel1, rel2, --- 795,806 ---- break; case JOIN_LEFT: if (is_dummy_rel(rel1) || ! restriction_is_constant_false(restrictlist, joinrel, true)) { mark_dummy_rel(joinrel); break; } ! if (restriction_is_constant_false(restrictlist, joinrel, false) && bms_is_subset(rel2->relids, sjinfo->syn_righthand)) mark_dummy_rel(rel2); add_paths_to_joinrel(root, joinrel, rel1, rel2, *************** populate_joinrel_with_paths(PlannerInfo *** 811,817 **** break; case JOIN_FULL: if ((is_dummy_rel(rel1) && is_dummy_rel(rel2)) || ! restriction_is_constant_false(restrictlist, true)) { mark_dummy_rel(joinrel); break; --- 812,818 ---- break; case JOIN_FULL: if ((is_dummy_rel(rel1) && is_dummy_rel(rel2)) || ! restriction_is_constant_false(restrictlist, joinrel, true)) { mark_dummy_rel(joinrel); break; *************** populate_joinrel_with_paths(PlannerInfo *** 847,853 **** bms_is_subset(sjinfo->min_righthand, rel2->relids)) { if (is_dummy_rel(rel1) || is_dummy_rel(rel2) || ! restriction_is_constant_false(restrictlist, false)) { mark_dummy_rel(joinrel); break; --- 848,854 ---- bms_is_subset(sjinfo->min_righthand, rel2->relids)) { if (is_dummy_rel(rel1) || is_dummy_rel(rel2) || ! restriction_is_constant_false(restrictlist, joinrel, false)) { mark_dummy_rel(joinrel); break; *************** populate_joinrel_with_paths(PlannerInfo *** 870,876 **** sjinfo) != NULL) { if (is_dummy_rel(rel1) || is_dummy_rel(rel2) || ! restriction_is_constant_false(restrictlist, false)) { mark_dummy_rel(joinrel); break; --- 871,877 ---- sjinfo) != NULL) { if (is_dummy_rel(rel1) || is_dummy_rel(rel2) || ! restriction_is_constant_false(restrictlist, joinrel, false)) { mark_dummy_rel(joinrel); break; *************** populate_joinrel_with_paths(PlannerInfo *** 885,896 **** break; case JOIN_ANTI: if (is_dummy_rel(rel1) || ! restriction_is_constant_false(restrictlist, true)) { mark_dummy_rel(joinrel); break; } ! if (restriction_is_constant_false(restrictlist, false) && bms_is_subset(rel2->relids, sjinfo->syn_righthand)) mark_dummy_rel(rel2); add_paths_to_joinrel(root, joinrel, rel1, rel2, --- 886,897 ---- break; case JOIN_ANTI: if (is_dummy_rel(rel1) || ! restriction_is_constant_false(restrictlist, joinrel, true)) { mark_dummy_rel(joinrel); break; } ! if (restriction_is_constant_false(restrictlist, joinrel, false) && bms_is_subset(rel2->relids, sjinfo->syn_righthand)) mark_dummy_rel(rel2); add_paths_to_joinrel(root, joinrel, rel1, rel2, *************** mark_dummy_rel(RelOptInfo *rel) *** 1249,1258 **** * decide there's no match for an outer row, which is pretty stupid. So, * we need to detect the case. * ! * If only_pushed_down is true, then consider only pushed-down quals. */ static bool ! restriction_is_constant_false(List *restrictlist, bool only_pushed_down) { ListCell *lc; --- 1250,1262 ---- * decide there's no match for an outer row, which is pretty stupid. So, * we need to detect the case. * ! * If only_pushed_down is true, then consider only quals that are pushed-down ! * from the point of view of the joinrel. */ static bool ! restriction_is_constant_false(List *restrictlist, ! RelOptInfo *joinrel, ! bool only_pushed_down) { ListCell *lc; *************** restriction_is_constant_false(List *rest *** 1266,1272 **** { RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc); ! if (only_pushed_down && !rinfo->is_pushed_down) continue; if (rinfo->clause && IsA(rinfo->clause, Const)) --- 1270,1276 ---- { RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc); ! if (only_pushed_down && !RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids)) continue; if (rinfo->clause && IsA(rinfo->clause, Const)) *************** try_partitionwise_join(PlannerInfo *root *** 1411,1418 **** * partition keys from given relations being joined. */ bool ! have_partkey_equi_join(RelOptInfo *rel1, RelOptInfo *rel2, JoinType jointype, ! List *restrictlist) { PartitionScheme part_scheme = rel1->part_scheme; ListCell *lc; --- 1415,1423 ---- * partition keys from given relations being joined. */ bool ! have_partkey_equi_join(RelOptInfo *joinrel, ! RelOptInfo *rel1, RelOptInfo *rel2, ! JoinType jointype, List *restrictlist) { PartitionScheme part_scheme = rel1->part_scheme; ListCell *lc; *************** have_partkey_equi_join(RelOptInfo *rel1, *** 1438,1444 **** int ipk2; /* If processing an outer join, only use its own join clauses. */ ! if (IS_OUTER_JOIN(jointype) && rinfo->is_pushed_down) continue; /* Skip clauses which can not be used for a join. */ --- 1443,1450 ---- int ipk2; /* If processing an outer join, only use its own join clauses. */ ! if (IS_OUTER_JOIN(jointype) && ! RINFO_IS_PUSHED_DOWN(rinfo, joinrel->relids)) continue; /* Skip clauses which can not be used for a join. */ diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index ef25fef..c5c4362 100644 *** a/src/backend/optimizer/plan/analyzejoins.c --- b/src/backend/optimizer/plan/analyzejoins.c *************** join_is_removable(PlannerInfo *root, Spe *** 253,260 **** * above the outer join, even if it references no other rels (it might * be from WHERE, for example). */ ! if (restrictinfo->is_pushed_down || ! !bms_equal(restrictinfo->required_relids, joinrelids)) { /* * If such a clause actually references the inner rel then join --- 253,259 ---- * above the outer join, even if it references no other rels (it might * be from WHERE, for example). */ ! if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)) { /* * If such a clause actually references the inner rel then join *************** remove_rel_from_query(PlannerInfo *root, *** 422,429 **** remove_join_clause_from_rels(root, rinfo, rinfo->required_relids); ! if (rinfo->is_pushed_down || ! !bms_equal(rinfo->required_relids, joinrelids)) { /* Recheck that qual doesn't actually reference the target rel */ Assert(!bms_is_member(relid, rinfo->clause_relids)); --- 421,427 ---- remove_join_clause_from_rels(root, rinfo, rinfo->required_relids); ! if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids)) { /* Recheck that qual doesn't actually reference the target rel */ Assert(!bms_is_member(relid, rinfo->clause_relids)); *************** is_innerrel_unique_for(PlannerInfo *root *** 1080,1085 **** --- 1078,1084 ---- JoinType jointype, List *restrictlist) { + Relids joinrelids = bms_union(outerrelids, innerrel->relids); List *clause_list = NIL; ListCell *lc; *************** is_innerrel_unique_for(PlannerInfo *root *** 1098,1104 **** * As noted above, if it's a pushed-down clause and we're at an outer * join, we can't use it. */ ! if (restrictinfo->is_pushed_down && IS_OUTER_JOIN(jointype)) continue; /* Ignore if it's not a mergejoinable clause */ --- 1097,1104 ---- * As noted above, if it's a pushed-down clause and we're at an outer * join, we can't use it. */ ! if (IS_OUTER_JOIN(jointype) && ! RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids)) continue; /* Ignore if it's not a mergejoinable clause */ diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index a436b53..01335db 100644 *** a/src/backend/optimizer/plan/initsplan.c --- b/src/backend/optimizer/plan/initsplan.c *************** distribute_qual_to_rels(PlannerInfo *roo *** 1775,1780 **** --- 1775,1785 ---- * attach quals to the lowest level where they can be evaluated. But * if we were ever to re-introduce a mechanism for delaying evaluation * of "expensive" quals, this area would need work. + * + * Note: generally, use of is_pushed_down has to go through the macro + * RINFO_IS_PUSHED_DOWN, because that flag alone is not always sufficient + * to tell whether a clause must be treated as pushed-down in context. + * This seems like another reason why it should perhaps be rethought. *---------- */ if (is_deduced) diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index fc19f89..82b7842 100644 *** a/src/backend/optimizer/util/relnode.c --- b/src/backend/optimizer/util/relnode.c *************** build_joinrel_partition_info(RelOptInfo *** 1629,1635 **** */ if (!IS_PARTITIONED_REL(outer_rel) || !IS_PARTITIONED_REL(inner_rel) || outer_rel->part_scheme != inner_rel->part_scheme || ! !have_partkey_equi_join(outer_rel, inner_rel, jointype, restrictlist)) { Assert(!IS_PARTITIONED_REL(joinrel)); return; --- 1629,1636 ---- */ if (!IS_PARTITIONED_REL(outer_rel) || !IS_PARTITIONED_REL(inner_rel) || outer_rel->part_scheme != inner_rel->part_scheme || ! !have_partkey_equi_join(joinrel, outer_rel, inner_rel, ! jointype, restrictlist)) { Assert(!IS_PARTITIONED_REL(joinrel)); return; diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c index 65c1abc..edf5a48 100644 *** a/src/backend/optimizer/util/restrictinfo.c --- b/src/backend/optimizer/util/restrictinfo.c *************** extract_actual_clauses(List *restrictinf *** 373,379 **** * extract_actual_join_clauses * * Extract bare clauses from 'restrictinfo_list', separating those that ! * syntactically match the join level from those that were pushed down. * Pseudoconstant clauses are excluded from the results. * * This is only used at outer joins, since for plain joins we don't care --- 373,379 ---- * extract_actual_join_clauses * * Extract bare clauses from 'restrictinfo_list', separating those that ! * semantically match the join level from those that were pushed down. * Pseudoconstant clauses are excluded from the results. * * This is only used at outer joins, since for plain joins we don't care *************** extract_actual_join_clauses(List *restri *** 394,408 **** { RestrictInfo *rinfo = lfirst_node(RestrictInfo, l); ! /* ! * We must check both is_pushed_down and required_relids, since an ! * outer-join clause that's been pushed down to some lower join level ! * via path parameterization will not be marked is_pushed_down; ! * nonetheless, it must be treated as a filter clause not a join ! * clause so far as the lower join level is concerned. ! */ ! if (rinfo->is_pushed_down || ! !bms_is_subset(rinfo->required_relids, joinrelids)) { if (!rinfo->pseudoconstant) *otherquals = lappend(*otherquals, rinfo->clause); --- 394,400 ---- { RestrictInfo *rinfo = lfirst_node(RestrictInfo, l); ! if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids)) { if (!rinfo->pseudoconstant) *otherquals = lappend(*otherquals, rinfo->clause); diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index d6dbf15..2108b6a 100644 *** a/src/include/nodes/relation.h --- b/src/include/nodes/relation.h *************** typedef struct LimitPath *** 1789,1795 **** * if we decide that it can be pushed down into the nullable side of the join. * In that case it acts as a plain filter qual for wherever it gets evaluated. * (In short, is_pushed_down is only false for non-degenerate outer join ! * conditions. Possibly we should rename it to reflect that meaning?) * * RestrictInfo nodes also contain an outerjoin_delayed flag, which is true * if the clause's applicability must be delayed due to any outer joins --- 1789,1796 ---- * if we decide that it can be pushed down into the nullable side of the join. * In that case it acts as a plain filter qual for wherever it gets evaluated. * (In short, is_pushed_down is only false for non-degenerate outer join ! * conditions. Possibly we should rename it to reflect that meaning? But ! * see also the comments for RINFO_IS_PUSHED_DOWN, below.) * * RestrictInfo nodes also contain an outerjoin_delayed flag, which is true * if the clause's applicability must be delayed due to any outer joins *************** typedef struct RestrictInfo *** 1932,1937 **** --- 1933,1952 ---- } RestrictInfo; /* + * This macro embodies the correct way to test whether a RestrictInfo is + * "pushed down" to a given outer join, that is, should be treated as a filter + * clause rather than a join clause at that outer join. This is certainly so + * if is_pushed_down is true; but examining that is not sufficient anymore, + * because outer-join clauses will get pushed down to lower outer joins when + * we generate a path for the lower outer join that is parameterized by the + * LHS of the upper one. We can detect such a clause by noting that its + * required_relids exceed the scope of the join. + */ + #define RINFO_IS_PUSHED_DOWN(rinfo, joinrelids) \ + ((rinfo)->is_pushed_down || \ + !bms_is_subset((rinfo)->required_relids, joinrelids)) + + /* * Since mergejoinscansel() is a relatively expensive function, and would * otherwise be invoked many times while planning a large join tree, * we go out of our way to cache its results. Each mergejoinable diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index 50e180c..f181586 100644 *** a/src/include/optimizer/paths.h --- b/src/include/optimizer/paths.h *************** extern bool have_join_order_restriction( *** 115,121 **** extern bool have_dangerous_phv(PlannerInfo *root, Relids outer_relids, Relids inner_params); extern void mark_dummy_rel(RelOptInfo *rel); ! extern bool have_partkey_equi_join(RelOptInfo *rel1, RelOptInfo *rel2, JoinType jointype, List *restrictlist); /* --- 115,122 ---- extern bool have_dangerous_phv(PlannerInfo *root, Relids outer_relids, Relids inner_params); extern void mark_dummy_rel(RelOptInfo *rel); ! extern bool have_partkey_equi_join(RelOptInfo *joinrel, ! RelOptInfo *rel1, RelOptInfo *rel2, JoinType jointype, List *restrictlist); /*
pgsql-bugs by date: