Re: A problem about partitionwise join - Mailing list pgsql-hackers

From Tom Lane
Subject Re: A problem about partitionwise join
Date
Msg-id 502.1586032678@sss.pgh.pa.us
Whole thread Raw
In response to Re: A problem about partitionwise join  (Richard Guo <riguo@pivotal.io>)
Responses Re: A problem about partitionwise join  (Richard Guo <guofenglinux@gmail.com>)
List pgsql-hackers
Richard Guo <riguo@pivotal.io> writes:
> Rebased the patch with latest master and also addressed the test case
> failure reported by PostgreSQL Patch Tester.

I looked this patch over, but I don't like it too much: it seems very
brute-force (and badly under-commented).  Creating all those extra
RestrictInfos isn't too cheap in itself, plus they'll jam up the
equivalence-class machinery for future tests.

There is already something in equivclass.c that would almost do what
we want here: exprs_known_equal() would tell us whether the partkeys
can be found in the same eclass, without having to generate data
structures along the way.  The current implementation is not watertight
because it doesn't check opclass semantics, but that consideration
can be bolted on readily enough.  So that leads me to something like
the attached.

One argument that could be made against this approach is that if there
are a lot of partkey expressions, this requires O(N^2) calls to
exprs_known_equal, something that's already not too cheap.  I think
that that's not a big problem because the number of partkey expressions
would only be equal to the join degree (ie it doesn't scale with the
number of partitions of the baserels) ... but maybe I'm wrong about
that?  I also wonder if it's really necessary to check every pair
of partkey expressions.  It seems at least plausible that in the
cases we care about, all the partkeys on each side would be in the same
eclasses anyway, so that comparing the first members of each list would
be sufficient.  But I haven't beat on that point.

            regards, tom lane

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 4ef1254..7c21692 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2074,15 +2074,17 @@ reconsider_full_join_clause(PlannerInfo *root, RestrictInfo *rinfo)
  *      Detect whether two expressions are known equal due to equivalence
  *      relationships.
  *
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with).  If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
  *
  * Note: does not bother to check for "equal(item1, item2)"; caller must
  * check that case if it's possible to pass identical items.
  */
 bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
 {
     ListCell   *lc1;

@@ -2097,6 +2099,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
         if (ec->ec_has_volatile)
             continue;

+        /*
+         * It's okay to consider ec_broken ECs here.  Brokenness just means we
+         * couldn't derive all the implied clauses we'd have liked to; it does
+         * not invalidate our knowledge that the members are equal.
+         */
+
+        /* Ignore if this EC doesn't use specified opfamily */
+        if (OidIsValid(opfamily) &&
+            !list_member_oid(ec->ec_opfamilies, opfamily))
+            continue;
+
         foreach(lc2, ec->ec_members)
         {
             EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2125,8 +2138,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
  * (In principle there might be more than one matching eclass if multiple
  * collations are involved, but since collation doesn't matter for equality,
  * we ignore that fine point here.)  This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
  */
 EquivalenceClass *
 match_eclasses_to_foreign_key_col(PlannerInfo *root,
@@ -2163,7 +2175,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
         /* Never match to a volatile EC */
         if (ec->ec_has_volatile)
             continue;
-        /* Note: it seems okay to match to "broken" eclasses here */
+        /* It's okay to consider "broken" ECs here, see exprs_known_equal */

         foreach(lc2, ec->ec_members)
         {
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index af1fb48..18474d8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -56,10 +56,10 @@ static List *subbuild_joinrel_joinlist(RelOptInfo *joinrel,
 static void set_foreign_rel_properties(RelOptInfo *joinrel,
                                        RelOptInfo *outer_rel, RelOptInfo *inner_rel);
 static void add_join_rel(PlannerInfo *root, RelOptInfo *joinrel);
-static void build_joinrel_partition_info(RelOptInfo *joinrel,
+static void build_joinrel_partition_info(PlannerInfo *root, RelOptInfo *joinrel,
                                          RelOptInfo *outer_rel, RelOptInfo *inner_rel,
                                          List *restrictlist, JoinType jointype);
-static bool have_partkey_equi_join(RelOptInfo *joinrel,
+static bool have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
                                    RelOptInfo *rel1, RelOptInfo *rel2,
                                    JoinType jointype, List *restrictlist);
 static int    match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
@@ -715,8 +715,8 @@ build_join_rel(PlannerInfo *root,
     joinrel->has_eclass_joins = has_relevant_eclass_joinclause(root, joinrel);

     /* Store the partition information. */
-    build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
-                                 sjinfo->jointype);
+    build_joinrel_partition_info(root, joinrel, outer_rel, inner_rel,
+                                 restrictlist, sjinfo->jointype);

     /*
      * Set estimates of the joinrel's size.
@@ -879,8 +879,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
     joinrel->has_eclass_joins = parent_joinrel->has_eclass_joins;

     /* Is the join between partitions itself partitioned? */
-    build_joinrel_partition_info(joinrel, outer_rel, inner_rel, restrictlist,
-                                 jointype);
+    build_joinrel_partition_info(root, joinrel, outer_rel, inner_rel,
+                                 restrictlist, jointype);

     /* Child joinrel is parallel safe if parent is parallel safe. */
     joinrel->consider_parallel = parent_joinrel->consider_parallel;
@@ -1621,9 +1621,9 @@ find_param_path_info(RelOptInfo *rel, Relids required_outer)
  *        partitioned join relation.
  */
 static void
-build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
-                             RelOptInfo *inner_rel, List *restrictlist,
-                             JoinType jointype)
+build_joinrel_partition_info(PlannerInfo *root, RelOptInfo *joinrel,
+                             RelOptInfo *outer_rel, RelOptInfo *inner_rel,
+                             List *restrictlist, JoinType jointype)
 {
     PartitionScheme part_scheme;

@@ -1649,7 +1649,7 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
         !outer_rel->consider_partitionwise_join ||
         !inner_rel->consider_partitionwise_join ||
         outer_rel->part_scheme != inner_rel->part_scheme ||
-        !have_partkey_equi_join(joinrel, outer_rel, inner_rel,
+        !have_partkey_equi_join(root, joinrel, outer_rel, inner_rel,
                                 jointype, restrictlist))
     {
         Assert(!IS_PARTITIONED_REL(joinrel));
@@ -1713,15 +1713,14 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
  * partition keys.
  */
 static bool
-have_partkey_equi_join(RelOptInfo *joinrel,
+have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
                        RelOptInfo *rel1, RelOptInfo *rel2,
                        JoinType jointype, List *restrictlist)
 {
     PartitionScheme part_scheme = rel1->part_scheme;
-    ListCell   *lc;
-    int            cnt_pks;
     bool        pk_has_clause[PARTITION_MAX_KEYS];
-    bool        strict_op;
+    int            pks_known_equal;
+    ListCell   *lc;

     /*
      * This function must only be called when the joined relations have same
@@ -1730,13 +1729,19 @@ have_partkey_equi_join(RelOptInfo *joinrel,
     Assert(rel1->part_scheme == rel2->part_scheme);
     Assert(part_scheme);

+    /* We use a bool array to track which partkey columns are known equal */
     memset(pk_has_clause, 0, sizeof(pk_has_clause));
+    /* ... as well as a count of how many are known equal */
+    pks_known_equal = 0;
+
+    /* First, look through the join's restriction clauses */
     foreach(lc, restrictlist)
     {
         RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
         OpExpr       *opexpr;
         Expr       *expr1;
         Expr       *expr2;
+        bool        strict_op;
         int            ipk1;
         int            ipk2;

@@ -1796,11 +1801,15 @@ have_partkey_equi_join(RelOptInfo *joinrel,
         if (ipk1 != ipk2)
             continue;

+        /* Ignore clause if we already proved these keys equal. */
+        if (pk_has_clause[ipk1])
+            continue;
+
         /*
          * The clause allows partitionwise join only if it uses the same
          * operator family as that specified by the partition key.
          */
-        if (rel1->part_scheme->strategy == PARTITION_STRATEGY_HASH)
+        if (part_scheme->strategy == PARTITION_STRATEGY_HASH)
         {
             if (!OidIsValid(rinfo->hashjoinoperator) ||
                 !op_in_opfamily(rinfo->hashjoinoperator,
@@ -1813,16 +1822,88 @@ have_partkey_equi_join(RelOptInfo *joinrel,

         /* Mark the partition key as having an equi-join clause. */
         pk_has_clause[ipk1] = true;
+
+        /* We can stop examining clauses once we prove all keys equal. */
+        if (++pks_known_equal == part_scheme->partnatts)
+            return true;
     }

-    /* Check whether every partition key has an equi-join condition. */
-    for (cnt_pks = 0; cnt_pks < part_scheme->partnatts; cnt_pks++)
+    /*
+     * Also check to see if any keys are known equal by equivclass.c.  In most
+     * cases there would have been a join restriction clause generated from
+     * any EC that had such knowledge, but there might be no such clause, or
+     * it might happen to constrain other members of the ECs than the ones we
+     * are looking for.
+     */
+    for (int ipk = 0; ipk < part_scheme->partnatts; ipk++)
     {
-        if (!pk_has_clause[cnt_pks])
-            return false;
+        Oid            btree_opfamily;
+
+        /* Ignore if we already proved these keys equal. */
+        if (pk_has_clause[ipk])
+            continue;
+
+        /*
+         * We need a btree opfamily to ask equivclass.c about.  If the
+         * partopfamily is a hash opfamily, look up its equality operator, and
+         * select some btree opfamily that that operator is part of.  (Any
+         * such opfamily should be good enough, since equivclass.c will track
+         * multiple opfamilies as appropriate.)
+         */
+        if (part_scheme->strategy == PARTITION_STRATEGY_HASH)
+        {
+            Oid            eq_op;
+            List       *eq_opfamilies;
+
+            eq_op = get_opfamily_member(part_scheme->partopfamily[ipk],
+                                        part_scheme->partopcintype[ipk],
+                                        part_scheme->partopcintype[ipk],
+                                        HTEqualStrategyNumber);
+            if (!OidIsValid(eq_op))
+                break;            /* we're not going to succeed */
+            eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+            if (eq_opfamilies == NIL)
+                break;            /* we're not going to succeed */
+            btree_opfamily = linitial_oid(eq_opfamilies);
+        }
+        else
+            btree_opfamily = part_scheme->partopfamily[ipk];
+
+        /*
+         * We consider only non-nullable partition keys here; nullable ones
+         * would not be treated as part of the same equivalence classes as
+         * non-nullable ones.
+         */
+        foreach(lc, rel1->partexprs[ipk])
+        {
+            Node       *expr1 = (Node *) lfirst(lc);
+            ListCell   *lc2;
+
+            foreach(lc2, rel2->partexprs[ipk])
+            {
+                Node       *expr2 = (Node *) lfirst(lc2);
+
+                if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
+                {
+                    pk_has_clause[ipk] = true;
+                    break;
+                }
+            }
+            if (pk_has_clause[ipk])
+                break;
+        }
+
+        if (pk_has_clause[ipk])
+        {
+            /* We can stop examining keys once we prove all keys equal. */
+            if (++pks_known_equal == part_scheme->partnatts)
+                return true;
+        }
+        else
+            break;                /* no chance to succeed, give up */
     }

-    return true;
+    return false;
 }

 /*
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 4fdcb07..5878a14 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3109,10 +3109,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,

         /*
          * Drop known-equal vars, but only if they belong to different
-         * relations (see comments for estimate_num_groups)
+         * relations (see comments for estimate_num_groups).  We aren't too
+         * fussy about the semantics of "equal" here.
          */
         if (vardata->rel != varinfo->rel &&
-            exprs_known_equal(root, var, varinfo->var))
+            exprs_known_equal(root, var, varinfo->var, InvalidOid))
         {
             if (varinfo->ndistinct <= ndistinct)
             {
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index c689fe8..3756a44 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -142,7 +142,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
                                                       Relids join_relids,
                                                       Relids outer_relids,
                                                       RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+                              Oid opfamily);
 extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
                                                            ForeignKeyOptInfo *fkinfo,
                                                            int colno);
diff --git a/src/test/regress/expected/partition_join.out b/src/test/regress/expected/partition_join.out
index b3fbe47..9e8eeaf 100644
--- a/src/test/regress/expected/partition_join.out
+++ b/src/test/regress/expected/partition_join.out
@@ -62,6 +62,45 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b =
  450 | 0450 | 450 | 0450
 (4 rows)

+-- inner join with partially-redundant join clauses
+EXPLAIN (COSTS OFF)
+SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a = t2.b ORDER BY t1.a, t2.b;
+                          QUERY PLAN
+---------------------------------------------------------------
+ Sort
+   Sort Key: t1.a
+   ->  Append
+         ->  Merge Join
+               Merge Cond: (t1_1.a = t2_1.a)
+               ->  Index Scan using iprt1_p1_a on prt1_p1 t1_1
+               ->  Sort
+                     Sort Key: t2_1.b
+                     ->  Seq Scan on prt2_p1 t2_1
+                           Filter: (a = b)
+         ->  Hash Join
+               Hash Cond: (t1_2.a = t2_2.a)
+               ->  Seq Scan on prt1_p2 t1_2
+               ->  Hash
+                     ->  Seq Scan on prt2_p2 t2_2
+                           Filter: (a = b)
+         ->  Hash Join
+               Hash Cond: (t1_3.a = t2_3.a)
+               ->  Seq Scan on prt1_p3 t1_3
+               ->  Hash
+                     ->  Seq Scan on prt2_p3 t2_3
+                           Filter: (a = b)
+(22 rows)
+
+SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a = t2.b ORDER BY t1.a, t2.b;
+ a  |  c   | b  |  c
+----+------+----+------
+  0 | 0000 |  0 | 0000
+  6 | 0006 |  6 | 0006
+ 12 | 0012 | 12 | 0012
+ 18 | 0018 | 18 | 0018
+ 24 | 0024 | 24 | 0024
+(5 rows)
+
 -- left outer join, with whole-row reference; partitionwise join does not apply
 EXPLAIN (COSTS OFF)
 SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;
diff --git a/src/test/regress/sql/partition_join.sql b/src/test/regress/sql/partition_join.sql
index 575ba7b..1f218c0 100644
--- a/src/test/regress/sql/partition_join.sql
+++ b/src/test/regress/sql/partition_join.sql
@@ -34,6 +34,11 @@ EXPLAIN (COSTS OFF)
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
 SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;

+-- inner join with partially-redundant join clauses
+EXPLAIN (COSTS OFF)
+SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a = t2.b ORDER BY t1.a, t2.b;
+SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a = t2.b ORDER BY t1.a, t2.b;
+
 -- left outer join, with whole-row reference; partitionwise join does not apply
 EXPLAIN (COSTS OFF)
 SELECT t1, t2 FROM prt1 t1 LEFT JOIN prt2 t2 ON t1.a = t2.b WHERE t1.b = 0 ORDER BY t1.a, t2.b;

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: idea: reduce logical slot induced bloat when multiple databases areused
Next
From: Jaime Casanova
Date:
Subject: Failed Assertion about PolymorphicType