Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Date
Msg-id CA+TgmoZyns3FYX5CJVtZWXXn_nLkaV1NhUNG9DvhCVwKQN81jg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
On Tue, Mar 14, 2017 at 8:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Of course, that supposes that 0009 can manage to postpone creating
> non-sampled child joinrels until create_partition_join_plan(), which
> it currently doesn't.  In fact, unless I'm missing something, 0009
> hasn't been even slightly adapted to take advantage of the
> infrastructure in 0001; it doesn't seem to reset the path_cxt or
> anything.  That seems like a fairly major omission.

Some other comments on 0009:

Documentation changes for the new GUCs are missing.

+between the partition keys of the joining tables. The equi-join between
+partition keys implies that for a given row in a given partition of a given
+partitioned table, its joining row, if exists, should exist only in the
+matching partition of the other partitioned table; no row from non-matching

There could be more than one.   I'd write: The equi-join between
partition keys implies that all join partners for a given row in one
partitioned table must be in the corresponding partition of the other
partitioned table.

+#include "miscadmin.h"#include <limits.h>#include <math.h>

Added in wrong place.

+                                * System attributes do not need
translation. In such a case,
+                                * the attribute numbers of the parent
and the child should
+                                * start from the same minimum attribute.

I would delete the second sentence and add an Assert() to that effect instead.

+               /* Pass top parent's relids down the inheritance hierarchy. */

Why?

+                       for (attno = rel->min_attr; attno <=
rel->max_attr; attno++)

Add add a comment explaining why we need to do this.

-       add_paths_to_append_rel(root, rel, live_childrels);
+       add_paths_to_append_rel(root, rel, live_childrels, false);}

-

No need to remove blank line.

+ * When called on partitioned join relation with partition_join_path = true, it
+ * adds PartitionJoinPath instead of Merge/Append path. This path is costed
+ * based on the costs of sampled child-join and is expanded later into
+ * Merge/Append plan.

I'm not a big fan of the Merge/Append terminology here.  If somebody
adds another kind of append-path someday, then all of these comments
will have to be updated.  I think this can be phrased more
generically.
       /*
+        * While creating PartitionJoinPath, we sample paths from only
a few child
+        * relations. Even if all of sampled children have partial
paths, it's not
+        * guaranteed that all the unsampled children will have partial paths.
+        * Hence we do not create partial PartitionJoinPaths.
+        */

Very sad.  I guess if we had parallel append available, we could maybe
dodge this problem, but for now I suppose we're stuck with it.

+       /*
+        * Partitioning scheme in join relation indicates a possibility that the
+        * join may be partitioned, but it's not necessary that every pair of
+        * joining relations can use partition-wise join technique. If one of
+        * joining relations turns out to be unpartitioned, this pair of joining
+        * relations can not use partition-wise join technique.
+        */
+       if (!rel1->part_scheme || !rel2->part_scheme)
+               return;

How can this happen?  If rel->part_scheme != NULL, doesn't that imply
that every rel covered by the joinrel is partitioned that way, and
therefore this condition must necessarily hold?

In general, I think it's better style to write explicit tests against
NULL or NIL than to just write if (blahptr).

+       partitioned_join->sjinfo = copyObject(parent_sjinfo);

Why do we need to copy it?

+       /*
+        * Remove the relabel decoration. We can assume that there is
at most one
+        * RelabelType node; eval_const_expressions() simplifies multiple
+        * RelabelType nodes into one.
+        */
+       if (IsA(expr, RelabelType))
+               expr = (Expr *) ((RelabelType *) expr)->arg;

Still, instead of assuming this, you could just s/if/while/, and then
you wouldn't need the assumption any more.  Also, consider castNode().

partition_wise_plan_weight may be useful for testing, but I don't
think it should be present in the final patch.

This is not a full review; I ran out of mental energy before I got to
the end.  (Sorry.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Next
From: Stephen Frost
Date:
Subject: Re: [HACKERS] GSOC Introduction / Eliminate O(N^2) scaling fromrw-conflict tracking in serializable transactions