Home > mailing lists

Re: Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

From	Ashutosh Bapat
Subject	Re: Partition-wise join for join between (declaratively)partitioned tables
Date	April 4, 2017 17:22:01
Msg-id	CAFjFpRcPdAcUghMNFZuwKpD3Uq-Kp8QNLe9u-912DJqHbyK04A@mail.gmail.com Whole thread
In response to	Partition-wise join for join between (declaratively) partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses	Re: Partition-wise join for join between (declaratively)partitioned tables Re: Partition-wise join for join between (declaratively)partitioned tables
List	pgsql-hackers

Tree view

On Tue, Apr 4, 2017 at 2:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Mar 30, 2017 at 1:14 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> Done.

Ashutosh and I spent several hours discussing this patch set today.
I'm starting to become concerned about the fact that 0004 makes the
partition bounds part of the PartitionScheme, because that means you
can't do a partition-wise join between two tables that have any
difference at all in the partition bounds. It might be possible in
the future to introduce a notion of a compatible partition scheme, so
that you could say, OK, well, these two partition schemes are not
quite the same, but they are compatible, and we'll make a new
partition scheme for whatever results from reconciling them.

What I think *may* be better is to consider the partition bound
information as a property of the RelOptInfo rather than the
PartitionScheme. For example, suppose we're joining A with partitions
A1, A2, and A4 against B with partitions B1, B2, and B3 and C with
partitions C1, C2, and C5. With the current approach, we end up with
a PartitionScheme for each baserel and, not in this patch set but
maybe eventually, a separate PartitionScheme for each of (A B), (A C),
(B C), and (A B C). That seems pretty unsatisfying. If we consider
the PartitionScheme to only include the question of whether we're
doing a join on the partition keys, then if the join includes WHERE
a.key = b.key AND b.key = c.key, we can say that they all have the
same PartitionScheme up front. Then, each RelOptInfo can have a
separate list of bounds, like this:

A: 1, 2, 4
B: 1, 2, 3
C: 1, 2, 5
A B: 1, 2, 3, 4
A C: 1, 2, 4, 5
B C: 1, 2, 3, 5
A B C: 1, 2, 3, 4, 5

Or if it's an inner join, then instead of taking the union at each
level, we can take the intersection, because any partition without a
match on the other side of the join, then that partition can't produce
any rows and doesn't need to be scanned. In that case, the
RelOptInfos for (A B), (A C), (B, C), and (A, B, C) will all end up
with a bound list of 1, 2.

I have separated partition bounds from partition scheme. The patch adds build_joinrel_partition_bounds() to calculate the bounds of the join relation and the pairs of matching partitions from the joining relation. For now the function just check whether both the relations have same bounds and returns the bounds of the first one. But in future, we will expand this function to merge partition bounds from the joining relation and return pairs of matching partitions which when joined form the partitions of the join according to the merged partition bounds.

Also, moved the code to collect partition RelOptInfos from set_append_rel_size() to build_simple_rel(), so everything related to partitioning gets set in that function for a base relation.

I think, we should rename partition scheme as PartitionKeyOptInfo and club partition bounds, nparts and part_rels as PartitionDescOptInfo. But I haven't done that in this patch yet.

A related question (that I did not discuss with Ashutosh, but occurred
to me later) is whether the PartitionScheme ought to worry about
cross-type joins. For instance, if A is partitioned on an int4 column
and B is partitioned on an int8 column, and they are joined on their
respective partitioning columns, can't we still do a partition-wise
join? We do need to require that the operator family of the operator
actually used in the query, the operator family used to partition the
inner table, and the operator family used to partition the other table
all match; and the collation used for the comparison in the query, the
collation used to partition the outer table, and the collation used to
partition the inner table must all match. But it doesn't seem
necessary to require an exact type or typmod match. In many ways this
seems a whole lot like the what we test when building equivalence
classes (cf. process_equivalence) although I'm not sure that we can
leverage that in any useful way.

Yes, I agree. For an inner join, the partition key types need to "shrink" and for outer join they need to be "widened". I don't know if there is a way to know "wider" or "shorter" of two given types. We might have to implement a method to merge partition keys to produce partition key of the join, which may be different from either of the partition keys. So, after-all we may have to abandon the idea of canonical partition scheme. I haven't included this change in the attached set of patches.

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

pg_dp_join_patches_v17.zip

pgsql-hackers by date:

From: Joe Conway
Date: 04 April 2017, 17:19:43
Subject: Re: partitioned tables and contrib/sepgsql

From: David Steele
Date: 04 April 2017, 17:22:44
Subject: Re: tuplesort_gettuple_common() and *should_free argument

Re: Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

Attachment

Previous

Next