Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables - Mailing list pgsql-hackers
From | Antonin Houska |
---|---|
Subject | Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables |
Date | |
Msg-id | 11888.1504782175@localhost Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables
|
List | pgsql-hackers |
Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > On Fri, Sep 1, 2017 at 6:05 PM, Antonin Houska <ah@cybertec.at> wrote: > > > > > > > > * get_partitioned_child_rels_for_join() > > > > I think the Assert() statement is easier to understand inside the loop, see > > the assert.diff attachment. > The assert at the end of function also checks that we have got > child_rels lists for all the parents passed in. Really? I can imagine that some instances of PartitionedChildRelInfo have the child_rels list empty, while other ones have these lists long enough to compensate for the empty lists. > > > > > > * have_partkey_equi_join() > > > > As the function handles generic join, this comment doesn't seem to me > > relevant: > > > > /* > > * The equi-join between partition keys is strict if equi-join between > > * at least one partition key is using a strict operator. See > > * explanation about outer join reordering identity 3 in > > * optimizer/README > > */ > > strict_op = op_strict(opexpr->opno); > > What in that comment is not exactly relevant? Basically I don't understand why you mention join reordering here. The join ordering questions must all have been resolved by the time have_partkey_equi_join() is called. > > > > And I think the function can return true even if strict_op is false for all > > the operators evaluated in the loop. > > I think it does that. Do you have a case where it doesn't? Here I refer to this part of the comment above: "... if equi-join between at least one partition key is using a strict operator." My understanding of the code (especially match_expr_to_partition_keys) is that no operator actually needs to be strict as long as each operator involved in the join matches at least one non-nullable expression on both sides of the join. > > * match_expr_to_partition_keys() > > > > I'm not sure this comment is clear enough: > > > > /* > > * If it's a strict equi-join a NULL partition key on one side will > > * not join a NULL partition key on the other side. So, rows with NULL > > * partition key from a partition on one side can not join with those > > * from a non-matching partition on the other side. So, search the > > * nullable partition keys as well. > > */ > > if (!strict_op) > > continue; > > > > My understanding of the problem of NULL values generated by outer join is: > > these NULL values --- if evaluated by non-strict expression --- can make row > > of N-th partition on one side of the join match row(s) of *other than* N-th > > partition(s) on the other side. Thus the nullable input expressions may only > > be evaluated by strict operators. I think it'd be clearer if you stressed that > > (undesired) *match* of partition keys can be a problem, rather than mismatch > > Sorry, I am not able to understand this. To me it looks like my > wording conveys what you are saying. I just tried to expreess the idea in a way that is clearer to me. I think we both mean the same. Not sure I should spend more effort on another version of the comment. > > If you insist on your wording, then I think you should at least move the > > comment below to the part that only deals with strict operators. > > Done. o.k. > > > > * map_and_merge_partitions() > > > > Besides a few changes proposed in map_and_merge_partitions.diff (a few of them > > to suppress compiler warnings) I think that this part needs more thought: > > > > { > > Assert(mergemap1[index1] != mergemap2[index2] && > > mergemap1[index1] >= 0 && mergemap2[index2] >= 0); > > > > /* > > * Both the partitions map to different merged partitions. This > > * means that multiple partitions from one relation matches to one > > * partition from the other relation. Partition-wise join does not > > * handle this case right now, since it requires ganging multiple > > * partitions together (into one RelOptInfo). > > */ > > merged_index = -1; > > } > > > > I could hit this path with the following test: > > > > CREATE TABLE a(i int) PARTITION BY LIST(i); > > CREATE TABLE a_0 PARTITION OF a FOR VALUES IN (0, 2); > > CREATE TABLE b(j int) PARTITION BY LIST(j); > > CREATE TABLE b_0 PARTITION OF b FOR VALUES IN (1, 2); > > > > SET enable_partition_wise_join TO on; > > > > SELECT * > > FROM a > > FULL JOIN > > b ON i = j; > > > > I don't think there's a reason not to join a_0 partition to b_0, is there? > > With the latest patchset I am seeing that partition-wise join is used > in this case. I have started a new thread [1] for advanced partition > matching patches. What plan do you get, with the patches from https://www.postgresql.org/message-id/CAFjFpRfdXpuSu0pxON3dKcr8WndJkaXLzHUVax_Laod0Tgc6UQ@mail.gmail.com I still see the join above Append, not below: QUERY PLAN -------------------------------------------------------------------------Merge Full Join (cost=359.57..860.00 rows=32512width=8) Merge Cond: (a_0.i = b_0.j) -> Sort (cost=179.78..186.16 rows=2550 width=4) Sort Key: a_0.i -> Append (cost=0.00..35.50 rows=2550 width=4) -> Seq Scan on a_0 (cost=0.00..35.50 rows=2550width=4) -> Sort (cost=179.78..186.16 rows=2550 width=4) Sort Key: b_0.j -> Append (cost=0.00..35.50rows=2550 width=4) -> Seq Scan on b_0 (cost=0.00..35.50 rows=2550 width=4) > Please post review comments about the last two patches on that thread. ok, I'll do if I find any problem. > [1] https://www.postgresql.org/message-id/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
pgsql-hackers by date: