Re: Partition-wise join for join between (declaratively)partitioned tables - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Partition-wise join for join between (declaratively)partitioned tables |
Date | |
Msg-id | CAFjFpRfBipiEzP739PFDrzxrWG-UKThHj3tyHkwCeE2hDwP78A@mail.gmail.com Whole thread Raw |
In response to | Re: Partition-wise join for join between (declaratively)partitioned tables (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
List | pgsql-hackers |
Somehow I sent the old patch set again. Here's the real v17. On Tue, Apr 4, 2017 at 7:52 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > > > On Tue, Apr 4, 2017 at 2:37 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> >> On Thu, Mar 30, 2017 at 1:14 AM, Ashutosh Bapat >> <ashutosh.bapat@enterprisedb.com> wrote: >> > Done. >> >> Ashutosh and I spent several hours discussing this patch set today. >> I'm starting to become concerned about the fact that 0004 makes the >> partition bounds part of the PartitionScheme, because that means you >> can't do a partition-wise join between two tables that have any >> difference at all in the partition bounds. It might be possible in >> the future to introduce a notion of a compatible partition scheme, so >> that you could say, OK, well, these two partition schemes are not >> quite the same, but they are compatible, and we'll make a new >> partition scheme for whatever results from reconciling them. >> >> What I think *may* be better is to consider the partition bound >> information as a property of the RelOptInfo rather than the >> PartitionScheme. For example, suppose we're joining A with partitions >> A1, A2, and A4 against B with partitions B1, B2, and B3 and C with >> partitions C1, C2, and C5. With the current approach, we end up with >> a PartitionScheme for each baserel and, not in this patch set but >> maybe eventually, a separate PartitionScheme for each of (A B), (A C), >> (B C), and (A B C). That seems pretty unsatisfying. If we consider >> the PartitionScheme to only include the question of whether we're >> doing a join on the partition keys, then if the join includes WHERE >> a.key = b.key AND b.key = c.key, we can say that they all have the >> same PartitionScheme up front. Then, each RelOptInfo can have a >> separate list of bounds, like this: >> >> A: 1, 2, 4 >> B: 1, 2, 3 >> C: 1, 2, 5 >> A B: 1, 2, 3, 4 >> A C: 1, 2, 4, 5 >> B C: 1, 2, 3, 5 >> A B C: 1, 2, 3, 4, 5 >> >> Or if it's an inner join, then instead of taking the union at each >> level, we can take the intersection, because any partition without a >> match on the other side of the join, then that partition can't produce >> any rows and doesn't need to be scanned. In that case, the >> RelOptInfos for (A B), (A C), (B, C), and (A, B, C) will all end up >> with a bound list of 1, 2. > > > I have separated partition bounds from partition scheme. The patch adds > build_joinrel_partition_bounds() to calculate the bounds of the join > relation and the pairs of matching partitions from the joining relation. For > now the function just check whether both the relations have same bounds and > returns the bounds of the first one. But in future, we will expand this > function to merge partition bounds from the joining relation and return > pairs of matching partitions which when joined form the partitions of the > join according to the merged partition bounds. > > Also, moved the code to collect partition RelOptInfos from > set_append_rel_size() to build_simple_rel(), so everything related to > partitioning gets set in that function for a base relation. > > I think, we should rename partition scheme as PartitionKeyOptInfo and club > partition bounds, nparts and part_rels as PartitionDescOptInfo. But I > haven't done that in this patch yet. > >> >> >> A related question (that I did not discuss with Ashutosh, but occurred >> to me later) is whether the PartitionScheme ought to worry about >> cross-type joins. For instance, if A is partitioned on an int4 column >> and B is partitioned on an int8 column, and they are joined on their >> respective partitioning columns, can't we still do a partition-wise >> join? We do need to require that the operator family of the operator >> actually used in the query, the operator family used to partition the >> inner table, and the operator family used to partition the other table >> all match; and the collation used for the comparison in the query, the >> collation used to partition the outer table, and the collation used to >> partition the inner table must all match. But it doesn't seem >> necessary to require an exact type or typmod match. In many ways this >> seems a whole lot like the what we test when building equivalence >> classes (cf. process_equivalence) although I'm not sure that we can >> leverage that in any useful way. >> > > Yes, I agree. For an inner join, the partition key types need to "shrink" > and for outer join they need to be "widened". I don't know if there is a way > to know "wider" or "shorter" of two given types. We might have to implement > a method to merge partition keys to produce partition key of the join, which > may be different from either of the partition keys. So, after-all we may > have to abandon the idea of canonical partition scheme. I haven't included > this change in the attached set of patches. > > -- > Best Wishes, > Ashutosh Bapat > EnterpriseDB Corporation > The Postgres Database Company -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Attachment
pgsql-hackers by date: