Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware) - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)
Date
Msg-id 999171.1677611407@sss.pgh.pa.us
Whole thread Raw
In response to Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)  (Richard Guo <guofenglinux@gmail.com>)
Re: Clause accidentally pushed down ( Possible bug in Making Vars outer-join aware)  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-bugs
I wrote:
> I have a WIP patch that does it like this, and it fixes the presented
> case; but it's not complete so it's still failing some existing
> regression tests.  More later.

Here's said patch.  Although this fixes the described problem and
passes check-world, I'm not totally happy with it yet: it feels
like the new add_outer_joins_to_relids() function is too expensive
to be doing every time we construct a join relation.  I wonder if
there is a way we can add more info to the SpecialJoinInfo data
structure to make it cheaper.  An obvious improvement is to store
commute_below_l explicitly instead of recomputing it over and over,
but that isn't going to move the needle all that far.  Is there a
way to not have to scan all the SpecialJoinInfos?

I may be worrying over nothing --- it's likely that the path
construction work that will happen after we make the join relation
RelOptInfo will swamp this cost anyway.  But it feels ugly.

Another thing I'm wondering about is whether this'd let us get
rid of RestrictInfo.is_pushed_down and RINFO_IS_PUSHED_DOWN.
I tried really hard to remove those in favor of checking to see
whether a qual clause's required_relids include the outer join
being formed, which seems like a much more principled way of
deciding whether it's a filter or join clause.  I couldn't get
that to work, but now I wonder if what I was running into was
really bugs associated with not understanding that we haven't
fully formed the lower outer join when we apply identity 3.

            regards, tom lane

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 227278eb6c..d017f3555a 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -410,6 +410,8 @@ joins.  However, when building Vars representing the outputs of join
 relations, we need to ensure that their varnullingrels are set to
 values consistent with the syntactic join order, so that they will
 appear equal() to pre-existing Vars in the upper part of the query.
+(We also have to be careful about where we place qual clauses using
+such Vars, as described below.)

 Outer joins also complicate handling of subquery pull-up.  Consider

@@ -507,6 +509,25 @@ problem for join relation identification either, since whether a semijoin
 has been completed is again implicit in the set of base relations
 included in the join.

+As usual, outer join identity 3 complicates matters.  If we start with
+    (A leftjoin B on (Pab)) leftjoin C on (Pbc)
+then the parser will have marked any C Vars appearing above these joins
+with the RT index of the B/C join.  If we now transform to
+    A leftjoin (B leftjoin C on (Pbc)) on (Pab)
+then it would appear that a clause using only such Vars could be pushed
+down and applied as a filter clause (not a join clause) at the lower
+B/C join.  But *this might not give the right answer* since the clause
+might see a non-null value for the C Var that will be replaced by null
+once the A/B join is performed.  We handle this by saying that the
+pushed-down join hasn't completely performed the work of the B/C join
+and hence is not entitled to include that outer join relid in its
+relid set.  When we form the A/B join, both outer joins' relids will
+be added to its relid set, and then the upper clause will be applied
+at the correct join level.  (Note there is no problem when identity 3
+is applied in the other direction: if we started with the second form
+then upper C Vars are marked with both outer join relids, so they
+cannot drop below whichever join is applied second.)
+
 There is one additional complication for qual clause placement, which
 occurs when we have made multiple versions of an outer-join clause as
 described previously (that is, we have both "Pbc" and "Pb*c" forms of
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 9949d5b1d3..ecb1343d1a 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1366,19 +1366,20 @@ generate_base_implied_equalities_broken(PlannerInfo *root,
  * commutative duplicates, i.e. if the algorithm selects "a.x = b.y" but
  * we already have "b.y = a.x", we return the existing clause.
  *
- * If we are considering an outer join, ojrelid is the associated OJ relid,
- * otherwise it's zero.
+ * If we are considering an outer join, sjinfo is the associated OJ info,
+ * otherwise it can be NULL.
  *
  * join_relids should always equal bms_union(outer_relids, inner_rel->relids)
- * plus ojrelid if that's not zero.  We could simplify this function's API by
- * computing it internally, but most callers have the value at hand anyway.
+ * plus whatever add_outer_joins_to_relids() would add.  We could simplify
+ * this function's API by computing it internally, but most callers have the
+ * value at hand anyway.
  */
 List *
 generate_join_implied_equalities(PlannerInfo *root,
                                  Relids join_relids,
                                  Relids outer_relids,
                                  RelOptInfo *inner_rel,
-                                 Index ojrelid)
+                                 SpecialJoinInfo *sjinfo)
 {
     List       *result = NIL;
     Relids        inner_relids = inner_rel->relids;
@@ -1396,8 +1397,9 @@ generate_join_implied_equalities(PlannerInfo *root,
         nominal_inner_relids = inner_rel->top_parent_relids;
         /* ECs will be marked with the parent's relid, not the child's */
         nominal_join_relids = bms_union(outer_relids, nominal_inner_relids);
-        if (ojrelid != 0)
-            nominal_join_relids = bms_add_member(nominal_join_relids, ojrelid);
+        nominal_join_relids = add_outer_joins_to_relids(root,
+                                                        nominal_join_relids,
+                                                        sjinfo);
     }
     else
     {
@@ -1418,7 +1420,7 @@ generate_join_implied_equalities(PlannerInfo *root,
      * At inner joins, we can be smarter: only consider eclasses mentioning
      * both input rels.
      */
-    if (ojrelid != 0)
+    if (sjinfo && sjinfo->ojrelid != 0)
         matching_ecs = get_eclass_indexes_for_relids(root, nominal_join_relids);
     else
         matching_ecs = get_common_eclass_indexes(root, nominal_inner_relids,
@@ -1467,7 +1469,7 @@ generate_join_implied_equalities(PlannerInfo *root,
  * generate_join_implied_equalities_for_ecs
  *      As above, but consider only the listed ECs.
  *
- * For the sole current caller, we can assume ojrelid == 0, that is we are
+ * For the sole current caller, we can assume sjinfo == NULL, that is we are
  * not interested in outer-join filter clauses.  This might need to change
  * in future.
  */
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 011a0337da..64dfddea2a 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3367,7 +3367,7 @@ check_index_predicates(PlannerInfo *root, RelOptInfo *rel)
                                                                    otherrels),
                                                          otherrels,
                                                          rel,
-                                                         0));
+                                                         NULL));

     /*
      * Normally we remove quals that are implied by a partial index's
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index d7cb11c851..be7865cc0a 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -710,9 +710,8 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
         return NULL;
     }

-    /* If we have an outer join, add its RTI to form the canonical relids. */
-    if (sjinfo && sjinfo->ojrelid != 0)
-        joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid);
+    /* Add outer join relid(s) to form the canonical relids. */
+    joinrelids = add_outer_joins_to_relids(root, joinrelids, sjinfo);

     /* Swap rels if needed to match the join info. */
     if (reversed)
@@ -775,6 +774,102 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
     return joinrel;
 }

+/*
+ * XXX hack, if we need this we should probably be retaining commute_below_l
+ * as a field...
+ */
+static bool
+commute_below_l_is_subset(SpecialJoinInfo *sjinfo, Relids relids)
+{
+    if (sjinfo->commute_below == NULL)    /* easy case */
+        return true;
+    else
+    {
+        Relids        commute_below_l = bms_intersect(sjinfo->commute_below,
+                                                    sjinfo->syn_lefthand);
+        bool        result;
+
+        result = bms_is_subset(commute_below_l, relids);
+        bms_free(commute_below_l);
+        return result;
+    }
+}
+
+/*
+ * add_outer_joins_to_relids
+ *      Add relids to input_relids to represent any outer joins that will be
+ *      calculated at this join.
+ *
+ * input_relids is the union of the relid sets of the two input relations.
+ * Note that we modify this in-place and return it; caller must bms_copy()
+ * it first, if a separate value is desired.
+ *
+ * sjinfo represents the join being performed.
+ */
+Relids
+add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
+                          SpecialJoinInfo *sjinfo)
+{
+    /* Nothing to do if this isn't an outer join with an assigned relid. */
+    if (sjinfo == NULL || sjinfo->ojrelid == 0)
+        return input_relids;
+
+    /*
+     * If it's not a left join, we have no rules that would permit executing
+     * it in non-syntactic order, so just form the syntactic relid set.  (This
+     * is just a quick-exit test; we'd come to the same conclusion anyway,
+     * since its commute_below and commute_above_l sets must be empty.)
+     */
+    if (sjinfo->jointype != JOIN_LEFT)
+        return bms_add_member(input_relids, sjinfo->ojrelid);
+
+    /*
+     * Add the OJ relid unless this join has been pushed into the RHS of a
+     * syntactically-lower left join per OJ identity 3.  (If it has, then we
+     * cannot claim that its outputs represent the final state of its RHS.)
+     */
+    if (commute_below_l_is_subset(sjinfo, input_relids))
+        input_relids = bms_add_member(input_relids, sjinfo->ojrelid);
+
+    /*
+     * Contrariwise, if we are now forming the final result of such a commuted
+     * pair of OJs, it's time to add the relid(s) of the pushed-down join(s).
+     * We can skip this if this join was never a candidate to be pushed up.
+     */
+    if (sjinfo->commute_above_l)
+    {
+        ListCell   *lc;
+
+        /*
+         * The current join could complete the nulling of more than one
+         * pushed-down join, so we have to examine all the SpecialJoinInfos.
+         * Because join_info_list was built in bottom-up order, it's
+         * sufficient to traverse it once: an ojrelid we add in one loop
+         * iteration would not have affected decisions of earlier iterations.
+         */
+        foreach(lc, root->join_info_list)
+        {
+            SpecialJoinInfo *othersj = (SpecialJoinInfo *) lfirst(lc);
+
+            if (othersj == sjinfo ||
+                othersj->ojrelid == 0 || othersj->jointype != JOIN_LEFT)
+                continue;        /* definitely not interesting */
+
+            if (othersj->commute_below == NULL)
+                continue;        /* was never a candidate to be pushed down */
+
+            /* Add it if not already present but conditions now satisfied */
+            if (!bms_is_member(othersj->ojrelid, input_relids) &&
+                bms_is_subset(othersj->min_lefthand, input_relids) &&
+                bms_is_subset(othersj->min_righthand, input_relids) &&
+                commute_below_l_is_subset(othersj, input_relids))
+                input_relids = bms_add_member(input_relids, othersj->ojrelid);
+        }
+    }
+
+    return input_relids;
+}
+
 /*
  * populate_joinrel_with_paths
  *      Add paths to the given joinrel for given pair of joining relations. The
@@ -1531,9 +1626,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,

         /* Build correct join relids for child join */
         child_joinrelids = bms_union(child_rel1->relids, child_rel2->relids);
-        if (child_sjinfo->ojrelid != 0)
-            child_joinrelids = bms_add_member(child_joinrelids,
-                                              child_sjinfo->ojrelid);
+        child_joinrelids = add_outer_joins_to_relids(root, child_joinrelids,
+                                                     child_sjinfo);

         /* Find the AppendRelInfo structures */
         appinfos = find_appinfos_by_relids(root, child_joinrelids, &nappinfos);
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 98cf3494e6..e4bb0847ce 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -88,8 +88,11 @@ restart:
          */
         innerrelid = bms_singleton_member(sjinfo->min_righthand);

-        /* Compute the relid set for the join we are considering */
-        joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+        /*
+         * Compute the relid set for the join we are considering.  We can
+         * assume things are done in syntactic order.
+         */
+        joinrelids = bms_union(sjinfo->syn_lefthand, sjinfo->syn_righthand);
         if (sjinfo->ojrelid != 0)
             joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid);

@@ -201,8 +204,8 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
     if (!rel_supports_distinctness(root, innerrel))
         return false;

-    /* Compute the relid set for the join we are considering */
-    inputrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
+    /* Compute the syntactic relid set for the join we are considering */
+    inputrelids = bms_union(sjinfo->syn_lefthand, sjinfo->syn_righthand);
     Assert(sjinfo->ojrelid != 0);
     joinrelids = bms_copy(inputrelids);
     joinrelids = bms_add_member(joinrelids, sjinfo->ojrelid);
@@ -670,7 +673,7 @@ reduce_unique_semijoins(PlannerInfo *root)
                                                          joinrelids,
                                                          sjinfo->min_lefthand,
                                                          innerrel,
-                                                         0),
+                                                         NULL),
                         innerrel->joininfo);

         /* Test whether the innerrel is unique for those clauses. */
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 9ad44a0508..13e568b414 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -872,8 +872,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,

     joinrel->reloptkind = RELOPT_OTHER_JOINREL;
     joinrel->relids = bms_union(outer_rel->relids, inner_rel->relids);
-    if (sjinfo->ojrelid != 0)
-        joinrel->relids = bms_add_member(joinrel->relids, sjinfo->ojrelid);
+    joinrel->relids = add_outer_joins_to_relids(root, joinrel->relids, sjinfo);
     joinrel->rows = 0;
     /* cheap startup cost is interesting iff not all tuples to be retrieved */
     joinrel->consider_startup = (root->tuple_fraction > 0);
@@ -1266,7 +1265,7 @@ build_joinrel_restrictlist(PlannerInfo *root,
                                                           joinrel->relids,
                                                           outer_rel->relids,
                                                           inner_rel,
-                                                          sjinfo->ojrelid));
+                                                          sjinfo));

     return result;
 }
@@ -1550,7 +1549,7 @@ get_baserel_parampathinfo(PlannerInfo *root, RelOptInfo *baserel,
                                                             joinrelids,
                                                             required_outer,
                                                             baserel,
-                                                            0));
+                                                            NULL));

     /* Compute set of serial numbers of the enforced clauses */
     pserials = NULL;
@@ -1673,7 +1672,7 @@ get_joinrel_parampathinfo(PlannerInfo *root, RelOptInfo *joinrel,
                                                 join_and_req,
                                                 required_outer,
                                                 joinrel,
-                                                0);
+                                                NULL);
     /* We only want ones that aren't movable to lower levels */
     dropped_ecs = NIL;
     foreach(lc, eclauses)
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 5b9db7733d..d9e1623274 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -104,6 +104,8 @@ extern void add_paths_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
 extern void join_search_one_level(PlannerInfo *root, int level);
 extern RelOptInfo *make_join_rel(PlannerInfo *root,
                                  RelOptInfo *rel1, RelOptInfo *rel2);
+extern Relids add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
+                                        SpecialJoinInfo *sjinfo);
 extern bool have_join_order_restriction(PlannerInfo *root,
                                         RelOptInfo *rel1, RelOptInfo *rel2);
 extern bool have_dangerous_phv(PlannerInfo *root,
@@ -150,7 +152,7 @@ extern List *generate_join_implied_equalities(PlannerInfo *root,
                                               Relids join_relids,
                                               Relids outer_relids,
                                               RelOptInfo *inner_rel,
-                                              Index ojrelid);
+                                              SpecialJoinInfo *sjinfo);
 extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
                                                       List *eclasses,
                                                       Relids join_relids,
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index e293de03c0..73edfc4bd0 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2357,6 +2357,40 @@ where b.f1 = t.thousand and a.f1 = b.f1 and (a.f1+b.f1+999) = t.tenthous;
 ----+----+----------+----------
 (0 rows)

+--
+-- check for correct placement of non-strict qual in a multiway outer join
+--
+explain (costs off)
+select t1.f1
+from int4_tbl t1, int4_tbl t2
+  left join int4_tbl t3 on t3.f1 > 0
+  left join int4_tbl t4 on t3.f1 > 1
+where t4.f1 is null;
+                      QUERY PLAN
+-------------------------------------------------------
+ Nested Loop
+   ->  Nested Loop Left Join
+         Filter: (t4.f1 IS NULL)
+         ->  Seq Scan on int4_tbl t2
+         ->  Materialize
+               ->  Nested Loop Left Join
+                     Join Filter: (t3.f1 > 1)
+                     ->  Seq Scan on int4_tbl t3
+                           Filter: (f1 > 0)
+                     ->  Materialize
+                           ->  Seq Scan on int4_tbl t4
+   ->  Seq Scan on int4_tbl t1
+(12 rows)
+
+select t1.f1
+from int4_tbl t1, int4_tbl t2
+  left join int4_tbl t3 on t3.f1 > 0
+  left join int4_tbl t4 on t3.f1 > 1
+where t4.f1 is null;
+ f1
+----
+(0 rows)
+
 --
 -- check a case where we formerly got confused by conflicting sort orders
 -- in redundant merge join path keys
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 5c0328ed76..bbb1b7d68f 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -441,6 +441,22 @@ select a.f1, b.f1, t.thousand, t.tenthous from
   (select sum(f1) as f1 from int4_tbl i4b) b
 where b.f1 = t.thousand and a.f1 = b.f1 and (a.f1+b.f1+999) = t.tenthous;

+--
+-- check for correct placement of non-strict qual in a multiway outer join
+--
+explain (costs off)
+select t1.f1
+from int4_tbl t1, int4_tbl t2
+  left join int4_tbl t3 on t3.f1 > 0
+  left join int4_tbl t4 on t3.f1 > 1
+where t4.f1 is null;
+
+select t1.f1
+from int4_tbl t1, int4_tbl t2
+  left join int4_tbl t3 on t3.f1 > 0
+  left join int4_tbl t4 on t3.f1 > 1
+where t4.f1 is null;
+
 --
 -- check a case where we formerly got confused by conflicting sort orders
 -- in redundant merge join path keys

pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #17814: Detail info on kerbos authentication
Next
From: Mats Kindahl
Date:
Subject: Re: Crash during backend start when low on memory