Re: Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Partition-wise join for join between (declaratively)partitioned tables
Date
Msg-id CAFjFpRfBipiEzP739PFDrzxrWG-UKThHj3tyHkwCeE2hDwP78A@mail.gmail.com
Whole thread Raw
In response to Re: Partition-wise join for join between (declaratively)partitioned tables  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
Somehow I sent the old patch set again. Here's the real v17.

On Tue, Apr 4, 2017 at 7:52 PM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>
>
> On Tue, Apr 4, 2017 at 2:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Thu, Mar 30, 2017 at 1:14 AM, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>> > Done.
>>
>> Ashutosh and I spent several hours discussing this patch set today.
>> I'm starting to become concerned about the fact that 0004 makes the
>> partition bounds part of the PartitionScheme, because that means you
>> can't do a partition-wise join between two tables that have any
>> difference at all in the partition bounds.  It might be possible in
>> the future to introduce a notion of a compatible partition scheme, so
>> that you could say, OK, well, these two partition schemes are not
>> quite the same, but they are compatible, and we'll make a new
>> partition scheme for whatever results from reconciling them.
>>
>> What I think *may* be better is to consider the partition bound
>> information as a property of the RelOptInfo rather than the
>> PartitionScheme.  For example, suppose we're joining A with partitions
>> A1, A2, and A4 against B with partitions B1, B2, and B3 and C with
>> partitions C1, C2, and C5.  With the current approach, we end up with
>> a PartitionScheme for each baserel and, not in this patch set but
>> maybe eventually, a separate PartitionScheme for each of (A B), (A C),
>> (B C), and (A B C).  That seems pretty unsatisfying.  If we consider
>> the PartitionScheme to only include the question of whether we're
>> doing a join on the partition keys, then if the join includes WHERE
>> a.key = b.key AND b.key = c.key, we can say that they all have the
>> same PartitionScheme up front.  Then, each RelOptInfo can have a
>> separate list of bounds, like this:
>>
>> A: 1, 2, 4
>> B: 1, 2, 3
>> C: 1, 2, 5
>> A B: 1, 2, 3, 4
>> A C: 1, 2, 4, 5
>> B C: 1, 2, 3, 5
>> A B C: 1, 2, 3, 4, 5
>>
>> Or if it's an inner join, then instead of taking the union at each
>> level, we can take the intersection, because any partition without a
>> match on the other side of the join, then that partition can't produce
>> any rows and doesn't need to be scanned.  In that case, the
>> RelOptInfos for (A B), (A C), (B, C), and (A, B, C) will all end up
>> with a bound list of 1, 2.
>
>
> I have separated partition bounds from partition scheme. The patch adds
> build_joinrel_partition_bounds() to calculate the bounds of the join
> relation and the pairs of matching partitions from the joining relation. For
> now the function just check whether both the relations have same bounds and
> returns the bounds of the first one. But in future, we will expand this
> function to merge partition bounds from the joining relation and return
> pairs of matching partitions which when joined form the partitions of the
> join according to the merged partition bounds.
>
> Also, moved the code to collect partition RelOptInfos from
> set_append_rel_size() to build_simple_rel(), so everything related to
> partitioning gets set in that function for a base relation.
>
> I think, we should rename partition scheme as PartitionKeyOptInfo and club
> partition bounds, nparts and part_rels as PartitionDescOptInfo. But I
> haven't done that in this patch yet.
>
>>
>>
>> A related question (that I did not discuss with Ashutosh, but occurred
>> to me later) is whether the PartitionScheme ought to worry about
>> cross-type joins.  For instance, if A is partitioned on an int4 column
>> and B is partitioned on an int8 column, and they are joined on their
>> respective partitioning columns, can't we still do a partition-wise
>> join?  We do need to require that the operator family of the operator
>> actually used in the query, the operator family used to partition the
>> inner table, and the operator family used to partition the other table
>> all match; and the collation used for the comparison in the query, the
>> collation used to partition the outer table, and the collation used to
>> partition the inner table must all match.  But it doesn't seem
>> necessary to require an exact type or typmod match.  In many ways this
>> seems a whole lot like the what we test when building equivalence
>> classes (cf. process_equivalence) although I'm not sure that we can
>> leverage that in any useful way.
>>
>
> Yes, I agree. For an inner join, the partition key types need to "shrink"
> and for outer join they need to be "widened". I don't know if there is a way
> to know "wider" or "shorter" of two given types. We might have to implement
> a method to merge partition keys to produce partition key of the join, which
> may be different from either of the partition keys. So, after-all we may
> have to abandon the idea of canonical partition scheme. I haven't included
> this change in the attached set of patches.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: increasing the default WAL segment size
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: multivariate statistics (v25)