Re: [HACKERS] advanced partition matching algorithm forpartition-wise join - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] advanced partition matching algorithm forpartition-wise join
Date
Msg-id CAG-ACPVfhe9TRjPVx95BQrY5svbw07ZCbgy7u7uy_=skHiKTOQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] advanced partition matching algorithm forpartition-wise join  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Responses Re: [HACKERS] advanced partition matching algorithm forpartition-wise join  (Etsuro Fujita <etsuro.fujita@gmail.com>)
List pgsql-hackers


On Fri, 3 Apr 2020 at 20:45, Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
Hi,

On Thu, Apr 2, 2020 at 2:12 AM Ashutosh Bapat
<ashutosh.bapat@2ndquadrant.com> wrote:
> On Thu, 26 Mar 2020 at 00:35, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> I've started reviewing the patch a couple days ago. I haven't done any
>> extensive testing, but I do have a bunch of initial comments that I can
>> share now.
>>
>> 1) I wonder if this needs to update src/backend/optimizer/README, which
>> does have a section about partitionwise joins. It seems formulated in a
>> way that that probably covers even this more advanced algorithm, but
>> maybe it should mention handling of default partitions etc.?

> Done. Please check the wording. It might need some word smithy.

You heavily changed the existing documentation about PWJ, but I don't
think we really need to do so.  Also, IMO I think the description
about default partitions you added is needed in README.  I think it
would be better to put such a description in source files.  How about
something like the attached, instead?  I wrote part of this based on
the commit message in the original versions of the patch you posted.

I corrected some grammar, typos. Broke longer sentences into smaller ones so that its easy to read and understand. As is the concept is hard to understand with all its limitations. Thanks for the example. Retained it.
 
You seem to have removed few comments that explained the algorithm in detail from build_joinrel_partition_info(). It would have been good to have those there. But I am ok not having them either.

But it will be good to have following addition I suggested in my patches to make sure nparts is set to 0 for an unpartitioned relation as per the comment on nparts in RelOptInfo.
@@ -1653,6 +1663,8 @@ build_joinrel_partition_info(RelOptInfo *joinrel, RelOptInfo *outer_rel,
                                jointype, restrictlist))
    {  
        Assert(!IS_PARTITIONED_REL(joinrel));
+       /* Join is not partitioned. */
+       joinrel->nparts = 0;
        return;
    }   
 
>> There certainly needs to be some description of the algorithm somewhere,
>> either in a README or before a suitable function. It doesn't have to be
>> particularly detailed, a rough outline of the matching would be enough,
>> so that readers don't have to rebuild the knowledge from pieces
>> scattered around various comments.

> The algorithm for list and range partitioned tables is slightly different. So, I have added separate prologue to each list_merge_bounds() and range_merge_bounds(). Please check if that serves the purpose.

Too detailed to me.  In this:

+ * If there are multiple partitions from one side matching a given partition on
+ * the other side, the algorithm bails out since we do not have machinary for
+ * joining one partition with mulitple partitions. It might happen that any of
+ * the list items of a partition from the outer relation do not appear in the
+ * inner relation and there is no default partition in the inner relation. Such
+ * a partition from the outer side will have no matching partition on the inner
+ * side. The algorithm will bail out in such a case since we do not have a
+ * mechanism to perform a join with a non-existing relation.

I don't think the last comment is correct; that would apply to the old
versions of this function IIRC, but not to the latest version.  How
about something much simpler like the attached, instead?

I know that algorithm pretty well by now, so it suffices for me to say we use something similar to merge join, but may be for someone without that background a detailed explanation is useful. But this looks fine at the moment.
 
> Done. Actually this wasn't updated when partition pruning was introduced, which could cause a partitionwise join to be not used even when those conditions were met. Similarly when a query involved whole row reference. It's hard to explain all the conditions under which partitionwise join technique will be used. But I have tried to keep it easy to understand.

IMO I think your words "there is exactly one pair of matching
partitions." is a bit misleading, because that sounds like that PWJ
doesn't allow multiply-segmented join.  How about s/exact
matching/one-to-one matching/ in the existing documentation, instead?

Good catch. That was really misleading. Looks good to me.


>> 3) I think the for nparts comment is somewhat misleading:
>>
>>    int nparts;  /* number of partitions; 0 = not partitioned;
>>                  * -1 = not yet set */
>>
>> which says that nparts=0 means not partitioned. But then there are
>> conditions like this:
>>
>>      /* Nothing to do, if the join relation is not partitioned. */
>>      if (joinrel->part_scheme == NULL || joinrel->nparts == 0)
>>          return;
>>
>> which (ignoring the obsolete comment) seems to say we can have nparts==0
>> even for partitioned tables, no?

Yeah, I think I was a bit hasty.  I fixed this.

For a non-join relation, nparts = 0 and nparts = -1 both have the same meaning. Although we never set nparts = 0 for a non-join relation? Otherwise, the comment looks good now.

--
Best Wishes,
Ashutosh

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Make MemoryContextMemAllocated() more precise
Next
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] advanced partition matching algorithm forpartition-wise join