Home > mailing lists

[HACKERS] advanced partition matching algorithm for partition-wise join - Mailing list pgsql-hackers

From	Ashutosh Bapat
Subject	[HACKERS] advanced partition matching algorithm for partition-wise join
Date	August 21, 2017 10:13:16
Msg-id	CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com Whole thread
Responses	Re: [HACKERS] advanced partition matching algorithm forpartition-wise join
List	pgsql-hackers

Tree view

The patch-set in [1] supports partition-wise join when the partition bounds and
partition keys of the joining tables exactly match. The last two patches in the
last few patch-sets in that thread implement more
advanced partition matching code. In order to avoid mixing reviews for advanced
partition matching and the basic partition-wise join implementation, I am
starting a new thread to discuss the same. I am attaching the last two
patches from that patch set here.

The new partition matching algorithm handles following cases when a
given partition
on one side has at most one matching partition matching on the other side.

1. When the ranges of the joining tables do not match exactly E.g. partition
table t1 has partitions t1p1 (0 - 100), t1p2 (150 - 200) and partition table t2
has partitions t2p1 (0 - 50), t2p2 (100 - 175). In this case (t1p1, t2p1) and
(t1p2, t2p2) form the matching partition pairs, which can be joined. While
matching the pairs, we also compute the partition bounds for the resulting
join. An INNER join between t1 and t2 will have ranges (0 - 50) since no row
with 50 <= key < 100 from t1p1 is going to find a matching row in t2p1 and (150
- 175) since no row with 100 <= key < 150 from t2p2 is going to find a matching
row in t1p2 and no row with 175 <= key < 200 in t1p2 is going to find a
matching row in t1p2. A t1 LEFT join t2 on the other hand will have ranges same
as the outer relation i.e. t1, (0 - 100), (150 - 200) since all rows from t1
will be part of the join. Thus depending upon the type of join the partition
bounds of the resultant join relation change. Similarly for list partitioned
table, when the lists do not match exactly, the algorithm finds matching pairs
of partitions and the lists of resultant join relation. E.g. t1 has
partitions t1p1 ('a',
'b', 'c'), t1p2 ('e', 'f') and t2 has partitions t2p1 ('a', 'b'), t2p2 ('d',
'e', 'f'). In this case (t1p1, t2p1) and (t2p1, t2p2) form the matching
pairs which are joined. Inner join will have bounds ('a','b'), ('e', 'f') and
t1 LEFT JOIN t2 will have bounds same as t1.

2. When one or both side have at least one partition that does not have
matching partition on the other side. E.g. t1 has partitions t1p1 ('a','b'),
t1p2 ('c','d') and t2 has only one partition t2p1 ('a','b') OR t1 has
partitions t1p1 (0 - 100), t1p2 (100 - 200) and t2 has only one partition t2p1
(0 - 100). In this case as well different types of joins will have different
partition bounds for the result using similar rules described above.

3. A combination of 1 and 2 e.g. t1 has partitions t1p1('a','b','c'),
t1p2('d','e','f') and t2 has a single partition t2p1 ('a','b', 'z').

Algorithm
---------
The pairs of matching partitions and the partition bounds of the join are
calculated by an algorithm similar to merge join.

In such a join, it can be observed that every partition on either side,
contributes to at most one partition of the resultant join relation. Thus for
every partition on either side, we keep track of the partition of resultant
join (if any), which it contributes to. If multiple partitions from any of the
joining relations map to a single partition of the resultant join, we need to
gang those partitions together before joining the partition/s from the other
side. Since we do not have infrastructure for ganging multiple arbitrary
RelOptInfos together in a parent RelOptInfo, we do not support such a
partitionw-wise join right now. We stop merging the bounds immediately when we
detect such a case.

For list partitioned tables, we compare list values from both the sides,
starting with the lowest. If the two list values being compared match,
corresponding partitions from both sides form a pair of partitions to be
joined. We record this mapping and also include the list value in join bounds.
If the two list values do not match and the lower of those two comes from the
outer side of the join, we include it in the join bounds. We advance to the
next list value on side with the lower list value continuing the process of
merging till list values on at least one side are exhausted. If the remaining
values are from the outer side, we include those in the join partition bounds.
Every list value included in the join bounds, and its originating partition/s
are associated with appropriate partition of the resultant join. For more
details please see partition_list_bounds_merge() in the attached patch.

In case of range partitioned tables, we compare the ranges of the partitions in
increasing order of their bounds. If two ranges being compared overlap,
corresponding partitions from both sides form a pair of partitions to be
joined. We record this mapping and also include the merged range in the bounds
of resultant join. The overlapping ranges are merged based on the type of join
as described above. If either of the ranges completely precedes the other, and
it's on the outer side, we include that range in the bounds of resultant join.
We advance to the next range on the side with lower upper bound till ranges on
at least one side are exhausted. If the remaining ranges are from the outer
side, we include those in the partition bounds of resultant join. While
including a range in the partition bounds of the resultant join if its lower
bound precedes the upper bound of the last included range, it indicates that
multiple partitions on that side map to one partition on the other side, so we
bail out. Notice that in this method, we always include the ranges in the
partition bounds of the resultant join in the increasing order of their bounds.
Every range included in the join's partition bounds and it's corresponding
partition/s from joining relations are associated with appropriate partition of
the resultant join. For more details please see partition_range_bounds_merge()
in the attached patch.

The partitions from both sides (one partition from each side) which map to the
same partition of the resultant join are joined to form child-joins. The case
when an outer partition may not have a matching partition from the inner side
will be discussed in the next section. Except for the above algorithm to find
the pairs of matching partitions and calculating bounds of the resultant join,
the rest of the partition-wise join algorithm remains the same.

Unsupported case: When a partition from outer side doesn't have matching
partition on the inner side.
--------------------------------------------------------------------------
Consider a join t1 LEFT JOIN t2 where t1 has partitions t1p1 (0 - 100), t1p2
(100 - 200) and t2 has a single partition t2p1(0 - 100). The rows in t1p2 won't
have a matching row in t2 since there is no partition matching t1p2. The result
of the join will have rows in t1p2 with columns from t2 NULLed. In order to
execute this join as a partition-wise join, we need a dummy relation in place
of the missing partition, which we can join with t1p2. We need this placeholder
dummy relation (its targetlist, relids etc.), so that rest of the planner can
work with the resulting child-join.

We notice the missing partitions only while planning the join (during the
execution of make_one_rel()), by which time we have frozen the number of base
relations. Introducing a base relation during join planning is not supported
in current planner. Similarly, a partition can be missing from a partitioned
join relation, in which case we have to add a dummy join relation. This might
need adding corresponding base relations as well. I have not spent time looking
for what it takes to support these cases. For now the patch does not support
partition-wise join in such cases.

TODOs
-----------
1. Add tests for advanced partition matching algorithm
2. Improve code quality, commenting, function names etc.

[1] https://www.postgresql.org/message-id/CAFjFpRd9Vqh_=-Ldv-XqWY006d07TJ+VXuhXCbdj=P1jukYBrw@mail.gmail.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

From: Ashutosh Bapat
Date: 21 August 2017, 10:03:49
Subject: Re: [HACKERS] Partition-wise join for join between (declaratively)partitioned tables

From: Haribabu Kommi
Date: 21 August 2017, 10:28:59
Subject: Re: [HACKERS] Pluggable storage

[HACKERS] advanced partition matching algorithm for partition-wise join - Mailing list pgsql-hackers

Attachment

Previous

Next