Re: [HACKERS] advanced partition matching algorithm forpartition-wise join - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: [HACKERS] advanced partition matching algorithm forpartition-wise join
Date
Msg-id CAPmGK16ik+90SN443o7OwE+Rs+i9vh8v2FiJ+61ipE1E7pbigQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] advanced partition matching algorithm forpartition-wise join  (Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>)
Responses Re: [HACKERS] advanced partition matching algorithm forpartition-wise join  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Re: [HACKERS] advanced partition matching algorithm forpartition-wise join  (Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>)
List pgsql-hackers
Hi,

On Thu, Apr 2, 2020 at 2:12 AM Ashutosh Bapat
<ashutosh.bapat@2ndquadrant.com> wrote:
> On Thu, 26 Mar 2020 at 00:35, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> I've started reviewing the patch a couple days ago. I haven't done any
>> extensive testing, but I do have a bunch of initial comments that I can
>> share now.
>>
>> 1) I wonder if this needs to update src/backend/optimizer/README, which
>> does have a section about partitionwise joins. It seems formulated in a
>> way that that probably covers even this more advanced algorithm, but
>> maybe it should mention handling of default partitions etc.?

> Done. Please check the wording. It might need some word smithy.

You heavily changed the existing documentation about PWJ, but I don't
think we really need to do so.  Also, IMO I think the description
about default partitions you added is needed in README.  I think it
would be better to put such a description in source files.  How about
something like the attached, instead?  I wrote part of this based on
the commit message in the original versions of the patch you posted.

>> There certainly needs to be some description of the algorithm somewhere,
>> either in a README or before a suitable function. It doesn't have to be
>> particularly detailed, a rough outline of the matching would be enough,
>> so that readers don't have to rebuild the knowledge from pieces
>> scattered around various comments.

> The algorithm for list and range partitioned tables is slightly different. So, I have added separate prologue to each
list_merge_bounds()and range_merge_bounds(). Please check if that serves the purpose. 

Too detailed to me.  In this:

+ * If there are multiple partitions from one side matching a given partition on
+ * the other side, the algorithm bails out since we do not have machinary for
+ * joining one partition with mulitple partitions. It might happen that any of
+ * the list items of a partition from the outer relation do not appear in the
+ * inner relation and there is no default partition in the inner relation. Such
+ * a partition from the outer side will have no matching partition on the inner
+ * side. The algorithm will bail out in such a case since we do not have a
+ * mechanism to perform a join with a non-existing relation.

I don't think the last comment is correct; that would apply to the old
versions of this function IIRC, but not to the latest version.  How
about something much simpler like the attached, instead?

>> 2) Do we need another GUC enabling this more complex algorithm? In PG11
>> the partitionwise join is disabled by default, on the grounds that it's
>> expensive and not worth it by default. How much more expensive is this?
>> Maybe it makes sense to allow enabling only the "simple" approach?

> We have reduced the complexity of merging bounds quite a bit so this shouldn't be costly. Further more we handle the
usualcase of equal bounds quickly using the merge flag so most of the cases should be fine. It's only when two
partitionedtables with same partition scheme are joined but do not have merge-able bounds that this algorithm would not
yielduseful result - but that would be rare in the field. enable_partitionwise_join = false should take care of such
scenarioseasily. I am not in favour of adding another GUC which we set to false by default and then take another few
releasesto make it true by default. 

I agree with Ashutosh.

>> 3) This comment in try_partitionwise_join is now incorrect, because the
>> condition may be true even for partitioned tables with (nparts == 0).
>>
>>      /* Nothing to do, if the join relation is not partitioned. */
>>      if (joinrel->part_scheme == NULL || joinrel->nparts == 0)
>>          return;

> If part_scheme = NULL, npart should be 0, fixed that in build_joinrel_partition_info(). If partscheme != NULL but
boundscan not be merged, nparts = 0. So this condition is correct. Encapsulated in a macro
IS_JOINREL_NOT_PARTITITIONED().and added comments for the macro. Given that the macro is used exactly at one place, it
maynot be necessary to define it but it looks *nice*. 

I don't think it would be a good idea to add such a macro for only one place.

>> Moreover, the condition used to be
>>
>>      if (!IS_PARTITIONED_REL(joinrel))
>>          return;
>>
>> which is way more readable. I think it's net negative to replace these
>> "nice" macros with clear meaning with complex conditions. If needed, we
>> can invent new macros. There are many other places where the patch
>> replaces macros with less readable conditions.

> The only other place where we have replaced a *nice* macro is in build_joinrel_partition_info(). But I think it's a
legitreplacement. I have added a comment there. 

IIUC, this is the existing issue, so I think it would be better to
leave this for another improvement patch.

>> 6) The try_partitionwise_join function is getting a bit too long and
>> harder to understand. The whole block in
>>
>>      if (joinrel->nparts == -1)
>>      {
>>          ...
>>      }
>>
>> seems rather well isolated, so I propose to move it to a separate
>> function responsible only for the merging. We can simply call it on the
>> joinrel, and make it return right away if (joinrel->nparts == -1).

That's a good idea, so +1.

>> 7) I'd suggest not to reference exact functions in comments unless
>> abolutely necessary, because it's harder to maintain and it does not
>> really explain purpose of the struct/code. E.g. consider this:
>>
>>      /* Per-partitioned-relation data for merge_list_bounds()/merge_range_bounds() */
>>      typedef struct PartitionMap
>>      { ... }
>>
>> Why does it matter where is the struct used? That's pretty trivial to
>> find using 'git grep' or something. Instead the comment should explain
>> the purpose of the struct.

> Adjusted names and comments a bit.

I modified the comments a bit further.  I don't think we need to
change the name of a variable, so I kept it as is.

On Thu, Apr 2, 2020 at 1:51 AM Ashutosh Bapat
<ashutosh.bapat@2ndquadrant.com> wrote:
> On Thu, 26 Mar 2020 at 05:47, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> three more comments after eye-balling the code for a bit longer.
>>
>> 1) The patch probably needs to tweak config.sgml which says this about
>> the enable_partitionwise_join GUC:
>>
>>    .. Partitionwise join currently applies only when the join conditions
>>    include all the partition keys, which must be of the same data type
>>    and have exactly matching sets of child partitions. ..

> Done. Actually this wasn't updated when partition pruning was introduced, which could cause a partitionwise join to
benot used even when those conditions were met. Similarly when a query involved whole row reference. It's hard to
explainall the conditions under which partitionwise join technique will be used. But I have tried to keep it easy to
understand.

IMO I think your words "there is exactly one pair of matching
partitions." is a bit misleading, because that sounds like that PWJ
doesn't allow multiply-segmented join.  How about s/exact
matching/one-to-one matching/ in the existing documentation, instead?

>> 2) Do we really need the 'merged' flag in try_partitionwise_join? Can't
>> we simply use the joinrel->merged flag directly? ISTM the we always
>> start with joinrel->merged=false, and then only ever set it to true in
>> some cases. I've tried doing that, per the attached 0002 patch. The
>> regression tests seem to work fine.

I introduced the flag "merged", because I thought it would be verbose
to write "joiners->merged" in many places, but I think removing that
would make the code simpler, so +1 for that change.

> Thanks. I just added a small prologue to compute_partition_bounds().

I tweaked that a bit.

>> I noticed this because I've tried moving part of the function into a
>> separate function, and removing the variable makes that simpler.
>>
>> The patch also does a couple additional minor tweaks.

    /*
     * Currently, this function is called only from
try_partitionwise_join(),
     * so the join type should be INNER, LEFT, FULL, SEMI, or ANTI.
+    *
+    * XXX Maybe an assert would be more appropriate? Or maybe just
+    * bail out by returning NULL? Not sure.
     */
    if (jointype != JOIN_INNER && jointype != JOIN_LEFT &&
        jointype != JOIN_FULL && jointype != JOIN_SEMI &&
        jointype != JOIN_ANTI)

I kept this as originally proposed, but I agree that an assertion
would be more appropriate, because it would save cycles a bit in a
non-assertion-enabled build.  I modified this as such.  I modified the
test below this as well.

>> 3) I think the for nparts comment is somewhat misleading:
>>
>>    int nparts;  /* number of partitions; 0 = not partitioned;
>>                  * -1 = not yet set */
>>
>> which says that nparts=0 means not partitioned. But then there are
>> conditions like this:
>>
>>      /* Nothing to do, if the join relation is not partitioned. */
>>      if (joinrel->part_scheme == NULL || joinrel->nparts == 0)
>>          return;
>>
>> which (ignoring the obsolete comment) seems to say we can have nparts==0
>> even for partitioned tables, no?

Yeah, I think I was a bit hasty.  I fixed this.

Attached is the original patch (0001) and one patch (0002) with
changes including those by Tomas and Ashutosh.  In the 0002 patch I
fixed my many typos as well.  :-(  Many of them were found by Justin
Pryzby.  One of them was found by Ashutosh.  Thank you!

>> Anyway, attached is the original patch (0001) and two patches with
>> proposed changes. 0002 removes the "merged" flag as explained in (2),
>> 0003 splits the try_partitionwise_join() function into two parts.

Thanks for the patches, Tomas!

> I have added 0005 with the changes I described in this as well as the previous mail. 0004 is just some white space
fixes.

Thanks for the patches, Ashutosh!  I fixed the white space issues.

Best regards,
Etsuro Fujita

Attachment

pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: adding partitioned tables to publications
Next
From: Mark Dilger
Date:
Subject: Re: Should we add xid_current() or a int8->xid cast?