Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
Date
Msg-id CAFjFpRe-8P5pzfET4YZRH0Vawd0_o2TK4_zo+AjZ04mfCB3O0A@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables  (Antonin Houska <ah@cybertec.at>)
Responses Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
List pgsql-hackers
On Fri, Sep 1, 2017 at 6:05 PM, Antonin Houska <ah@cybertec.at> wrote:
> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>
>> I originally thought to provide it along-with the changes to
>> expand_inherited_rtentry(), but that thread is taking longer. Jeevan
>> Chalke needs rebased patches for his work on aggregate pushdown and
>> Thomas might need them for further review. So, here they are.
>
> Since I have related patch in the current commitfest
> (https://commitfest.postgresql.org/14/1247/), I spent some time reviewing your
> patch:
>
> * generate_partition_wise_join_paths()
>
> Right parenthesis is missing in the prologue.

Thanks for pointing that out. Fixed.

>
>
> * get_partitioned_child_rels_for_join()
>
> I think the Assert() statement is easier to understand inside the loop, see
> the assert.diff attachment.

The assert at the end of function also checks that we have got
child_rels lists for all the parents passed in. That is not checked by
your version. Furthermore, we would checked that each child_rels has
at least one element while buildings paths for base relations.
Checking the same again for joins doesn't add any value.

>
>
> * have_partkey_equi_join()
>
> As the function handles generic join, this comment doesn't seem to me
> relevant:
>
>     /*
>      * The equi-join between partition keys is strict if equi-join between
>      * at least one partition key is using a strict operator. See
>      * explanation about outer join reordering identity 3 in
>      * optimizer/README
>      */
>     strict_op = op_strict(opexpr->opno);

What in that comment is not exactly relevant?

>
> And I think the function can return true even if strict_op is false for all
> the operators evaluated in the loop.

I think it does that. Do you have a case where it doesn't?

>
>
> * match_expr_to_partition_keys()
>
> I'm not sure this comment is clear enough:
>
>     /*
>      * If it's a strict equi-join a NULL partition key on one side will
>      * not join a NULL partition key on the other side. So, rows with NULL
>      * partition key from a partition on one side can not join with those
>      * from a non-matching partition on the other side. So, search the
>      * nullable partition keys as well.
>      */
>     if (!strict_op)
>             continue;
>
> My understanding of the problem of NULL values generated by outer join is:
> these NULL values --- if evaluated by non-strict expression --- can make row
> of N-th partition on one side of the join match row(s) of *other than* N-th
> partition(s) on the other side. Thus the nullable input expressions may only
> be evaluated by strict operators. I think it'd be clearer if you stressed that
> (undesired) *match* of partition keys can be a problem, rather than mismatch

Sorry, I am not able to understand this. To me it looks like my
wording conveys what you are saying.

>
> If you insist on your wording, then I think you should at least move the
> comment below to the part that only deals with strict operators.

Done.

>
>
> * There are several places where lfirst_node() macro should be used. For
>   example
>
> rel = lfirst_node(RelOptInfo, lc);
>
> instead of
>
> rel = (RelOptInfo *) lfirst(lc);

Thanks for that.

>
>
> * map_and_merge_partitions()
>
> Besides a few changes proposed in map_and_merge_partitions.diff (a few of them
> to suppress compiler warnings) I think that this part needs more thought:
>
>     {
>             Assert(mergemap1[index1] != mergemap2[index2] &&
>                        mergemap1[index1] >= 0 && mergemap2[index2] >= 0);
>
>             /*
>              * Both the partitions map to different merged partitions. This
>              * means that multiple partitions from one relation matches to one
>              * partition from the other relation. Partition-wise join does not
>              * handle this case right now, since it requires ganging multiple
>              * partitions together (into one RelOptInfo).
>              */
>             merged_index = -1;
>     }
>
> I could hit this path with the following test:
>
> CREATE TABLE a(i int) PARTITION BY LIST(i);
> CREATE TABLE a_0 PARTITION OF a FOR VALUES IN (0, 2);
> CREATE TABLE b(j int) PARTITION BY LIST(j);
> CREATE TABLE b_0 PARTITION OF b FOR VALUES IN (1, 2);
>
> SET enable_partition_wise_join TO on;
>
> SELECT  *
> FROM    a
>         FULL JOIN
>         b ON i = j;
>
> I don't think there's a reason not to join a_0 partition to b_0, is there?

With the latest patchset I am seeing that partition-wise join is used
in this case. I have started a new thread [1] for advanced partition
matching patches. Please post review comments about the last two
patches on that thread.

[1] https://www.postgresql.org/message-id/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com

Attached patchset with above comments addressed.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Sokolov Yura
Date:
Subject: Re: [HACKERS] Walsender timeouts and large transactions
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] Release Note changes