Re: [HACKERS] advanced partition matching algorithm forpartition-wise join - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: [HACKERS] advanced partition matching algorithm forpartition-wise join |
Date | |
Msg-id | CAPmGK14WHKckT1P6UJV2B63TZAxPyMn8iZJ99XF=AZuNhG6vow@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] advanced partition matching algorithm forpartition-wise join (amul sul <sulamul@gmail.com>) |
Responses |
Re: [HACKERS] advanced partition matching algorithm forpartition-wise join
|
List | pgsql-hackers |
Hi Amul, On Mon, Sep 2, 2019 at 2:08 PM amul sul <sulamul@gmail.com> wrote: > Please find my comments inline below: Thank you for the review! > On Wed, Aug 28, 2019 at 3:52 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: >> On Fri, Aug 16, 2019 at 10:25 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On thinking further, a thought hits to me why we can't join two hash partitioned > table which has the same modulus and partition key specification, but some of > the partitions are missing from either partitioned table. > > For e.g. here is a smaller case where foo has two partitions and bar has only one. > > CREATE TABLE foo(a int) PARTITION BY HASH(a); > CREATE TABLE foo_p0 PARTITION OF foo FOR VALUES WITH(modulus 2, remainder 0); > CREATE TABLE foo_p1 PARTITION OF foo FOR VALUES WITH(modulus 2, remainder 1); > > CREATE TABLE bar(a int) PARTITION BY HASH(a); <-- missing partitions for REMAINDER 1 > CREATE TABLE bar_p0 PARTITION OF bar FOR VALUES WITH(modulus 2, remainder 0); > > Explain: > postgres=# explain select * from foo p1, bar p2 where p1.a = p2.a; > QUERY PLAN > --------------------------------------------------------------------------------- > Merge Join (cost=590.35..1578.47 rows=65025 width=8) > Merge Cond: (p2.a = p1.a) > -> Sort (cost=179.78..186.16 rows=2550 width=4) > Sort Key: p2.a > -> Seq Scan on bar_p0 p2 (cost=0.00..35.50 rows=2550 width=4) > -> Sort (cost=410.57..423.32 rows=5100 width=4) > Sort Key: p1.a > -> Append (cost=0.00..96.50 rows=5100 width=4) > -> Seq Scan on foo_p0 p1 (cost=0.00..35.50 rows=2550 width=4) > -> Seq Scan on foo_p1 p1_1 (cost=0.00..35.50 rows=2550 width=4) > (10 rows) > > The partitions-wise join will be performed only if we fill the partition hole that > exists for the joining table i.e. adding partitions to bar table. > > postgres=# CREATE TABLE bar_p1 PARTITION OF bar FOR VALUES WITH(modulus 2, remainder 1); > CREATE TABLE > postgres=# explain select * from foo p1, bar p2 where p1.a = p2.a; > QUERY PLAN > --------------------------------------------------------------------------------- > Append (cost=359.57..2045.11 rows=65024 width=8) > -> Merge Join (cost=359.57..860.00 rows=32512 width=8) > Merge Cond: (p1.a = p2.a) > -> Sort (cost=179.78..186.16 rows=2550 width=4) > Sort Key: p1.a > -> Seq Scan on foo_p0 p1 (cost=0.00..35.50 rows=2550 width=4) > -> Sort (cost=179.78..186.16 rows=2550 width=4) > Sort Key: p2.a > -> Seq Scan on bar_p0 p2 (cost=0.00..35.50 rows=2550 width=4) > -> Merge Join (cost=359.57..860.00 rows=32512 width=8) > Merge Cond: (p1_1.a = p2_1.a) > -> Sort (cost=179.78..186.16 rows=2550 width=4) > Sort Key: p1_1.a > -> Seq Scan on foo_p1 p1_1 (cost=0.00..35.50 rows=2550 width=4) > -> Sort (cost=179.78..186.16 rows=2550 width=4) > Sort Key: p2_1.a > -> Seq Scan on bar_p1 p2_1 (cost=0.00..35.50 rows=2550 width=4) > (17 rows) > > It would have been nice if we could support this case, as we do allow partitions > hole for the other partition scheme, but there wouldn't be much objection if > we don't want to add this support for now since there will be a lesser chance > that hash partitioned table has the hole, IMO. I agree with you on that point. >> If there are no objections, I'll merge the attached with the base patch in [1]. > The proposed enhancement in the patch is too good and the patch is pretty much > reasonable to merge into the main patch. Done. Attached is a merged version of the patch. > Here are the few cosmetic fixes for this path I think is needed. Feel free to > ignore if some of them do not make sense or too obvious. > > Note: left side number represents code line number of the patch. > > 118 + } > 119 + > 120 + /* > 121 + * Try to create the partition bounds along with join pairs. > 122 + */ > 123 + if (boundinfo == NULL) > 124 + { > > Can we add this block as else part of previous if condition checking equal partitions bound? Done. > 133 + Assert(list_length(parts1) == list_length(parts2)); > 134 + if (boundinfo == NULL) > 135 + { > 136 + joinrel->nparts = 0; > 137 + return; > 138 + } > 139 + nparts = list_length(parts1); > And the question is do we need to do that assert check since partition_bounds_merge() > does that just before returning, thoughts? You are right; that assertion would be redundant, so I removed that. > 204 + RelOptInfo *child_rel1 = merged ? (RelOptInfo *) lfirst(lcr1) : rel1->part_rels[cnt_parts]; > 205 + RelOptInfo *child_rel2 = merged ? (RelOptInfo *) lfirst(lcr2) : rel2->part_rels[cnt_parts]; > > How about using lfirst_node instead of lfirst & casting explicitly? > > Also, these lines crossing 80 column length which I think we need to fix. How about > doing the assignment as follow, just after the variable declaration part: > > if (merged) > { > child_rel1 = lfirst_node(lcr1, RelOptInfo); > child_rel2 = lfirst_node(lcr2, RelOptInfo); > lcr1 = lnext(parts1, lcr1); > lcr2 = lnext(parts2, lcr2); > } > else > { > child_rel1 = rel1->part_rels[cnt_parts]; > child_rel2 = rel2->part_rels[cnt_parts] > } > > rel1_empty = (child_rel1 == NULL || IS_DUMMY_REL(child_rel1)); > rel2_empty = (child_rel2 == NULL || IS_DUMMY_REL(child_rel2)); Done. (I don't think the order of arguments for first_node() above is correct; the first argument for that should be RelOptInfo.) > 266 + * get_matching_part_pairs > 267 + * Generate join pairs of partitions for the two inputs for the join rel > > Can we rewrite this description as " Generate join pairs of partitions for the > join rel from the two inputs." OR "Generate join pairs of partitions for the > two inputs" Done. > 310 + Assert(bms_num_members(child_relids1) == bms_num_members(rel1->relids)); > 311 + /* > > 335 + Assert(bms_num_members(child_relids2) == bms_num_members(rel2->relids)); > 336 + /* > > Need newline after assert statements. Done. Other change: I guess this in partbounds.c would be leftovers from an older version of the patch, so I removed it. +/* + * Allocate and initialize partition maps. We maintain four maps, two maps + * for each joining relation. pmap[i] gives the partition from the other + * relation which would join with ith partition of the given relation. + * Partition i from the given relation will join with partition pmap[i] + * from the other relation to produce partition mmap[i] of the join (merged + * partition). + * + * pmap[i] = -1 indicates that ith partition of a given relation does not + * have a matching partition from the other relation. + * + * mmap[i] = -1 indicates that ith partition of a given relation does not + * contribute to the join result. That can happen only when the given + * relation is the inner relation and it doesn't have a matching partition + * from the outer relation, hence pmap[i] should be -1. + * + * In case of an outer join, every partition of the outer join will appear + * in the join result, and thus has mmap[i] set for all i. But it's not + * necessary that every partition on the outer side will have a matching + * partition on the inner side. In such a case, we end up with pmap[i] = -1 + * and mmap[i] != -1. + */ I will continue to review the rest of the patch. I am sorry for the long delay. Best regards, Etsuro Fujita
Attachment
pgsql-hackers by date: