Re: [HACKERS] advanced partition matching algorithm forpartition-wise join - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: [HACKERS] advanced partition matching algorithm forpartition-wise join |
Date | |
Msg-id | CAPmGK17nCL1bk2tLgk6umj19Foj91qS0Z3=YfBy2qUCjU6FWTA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] advanced partition matching algorithm forpartition-wise join (amul sul <sulamul@gmail.com>) |
Responses |
Re: [HACKERS] advanced partition matching algorithm forpartition-wise join
|
List | pgsql-hackers |
Hi Amul, On Wed, Nov 6, 2019 at 3:12 PM amul sul <sulamul@gmail.com> wrote: > Attached is the rebase atop the latest master head(a9056cc637f). Thanks for that! > On Tue, Nov 5, 2019 at 6:44 PM amul sul <sulamul@gmail.com> wrote: >> On Fri, Nov 1, 2019 at 3:58 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: >>> Done. Attached is a new version of the patch. >> A query and comments for v25: >> >> 583 + * The function returns NULL if we can not find the matching pair of >> 584 + * partitions. This happens if 1. multiple partitions on one side match with >> 585 + * one partition on the other side. 2. a given partition on the outer side >> 586 + * doesn't have a matching partition on the inner side. We can not support the >> 587 + * first case since we don't have a way to represent multiple partitions as a >> 588 + * single relation (RelOptInfo) and then perform join using the ganged >> 589 + * relation. We can not support the second case since the missing inner >> 590 + * partition needs to be represented as an empty relation and we don't have a >> 591 + * way to introduce empty relation during join planning after creating paths >> 592 + * for all the base relations. >> 593 + */ >> 594 +PartitionBoundInfo >> 595 +partition_bounds_merge(int partnatts, >> >> I think the second condition mention for partition_bounds_merge() looks >> outdated, do you think we should remove that or am I missing something here? Yeah, I think so. In addition to that, the output arguments for that function *outer_pars and *inner_parts don't store partition indexes anymore, so I updated comments for that function totally. Does that make sense? >> 1768 + >> 1769 + /* >> 1770 + * If this is an outer join, the merged partition would act as the >> 1771 + * default partition of the join; record the index in *default_index >> 1772 + * if not done yet. >> 1773 + */ >> 1774 + if (jointype == JOIN_LEFT || jointype == JOIN_ANTI || >> 1775 + jointype == JOIN_FULL) >> 1776 + { >> >> As decided need to replace this list by IS_OUTER_JOIN(jointype). Done. >> 2020 + if (jointype == JOIN_LEFT || jointype == JOIN_FULL || >> 2021 + jointype == JOIN_ANTI) >> 2022 + { >> >> Same as the previous. Done. >> I tried a coverage testing and tried to adjust & add a few tests to improved the >> code coverage for the v25 patch. Please have a look at the attached 0002 & also >> attach the coverage output with & without this patch, TWIMW. Good idea! I merged that to the main patch after modifying that a bit so to reduce the outputs of SELECTs. Attached is an updated version of the main patch. Thanks for reviewing! I looked at partition_range_bounds_merge() more closely. Here are my comments: * When merging partition bounds from the outer/inner sides in the while loop of that function, we only moves to the next partition on one side when both partitions overlap (ie, overlap=true) and the upper bounds of partitions are not the same (ie, ub_cmpval<0 or ub_cmpval>0); but I think that's inefficient. Let me explain using an example: create table prt1 (a int, b int) partition by range (a); create table prt1_p1 partition of prt1 for values from (0) to (50); create table prt1_p2 partition of prt1 for values from (100) to (150); create table prt2 (a int, b int) partition by range (a); create table prt2_p1 partition of prt2 for values from (0) to (100); create table prt2_p2 partition of prt2 for values from (100) to (200); insert into prt1 select i, i from generate_series(0, 49) i; insert into prt1 select i, i from generate_series(100, 149) i; insert into prt2 select i, i from generate_series(0, 49) i; insert into prt2 select i, i from generate_series(100, 149) i; analyze prt1; analyze prt2; set enable_partitionwise_join to on; explain verbose select * from prt1 t1 full join prt2 t2 on (t1.a = t2.a); QUERY PLAN -------------------------------------------------------------------------------------- Append (cost=2.12..9.12 rows=100 width=16) -> Hash Full Join (cost=2.12..4.31 rows=50 width=16) Output: t1.a, t1.b, t2.a, t2.b Hash Cond: (t1.a = t2.a) -> Seq Scan on public.prt1_p1 t1 (cost=0.00..1.50 rows=50 width=8) Output: t1.a, t1.b -> Hash (cost=1.50..1.50 rows=50 width=8) Output: t2.a, t2.b -> Seq Scan on public.prt2_p1 t2 (cost=0.00..1.50 rows=50 width=8) Output: t2.a, t2.b -> Hash Full Join (cost=2.12..4.31 rows=50 width=16) Output: t1_1.a, t1_1.b, t2_1.a, t2_1.b Hash Cond: (t1_1.a = t2_1.a) -> Seq Scan on public.prt1_p2 t1_1 (cost=0.00..1.50 rows=50 width=8) Output: t1_1.a, t1_1.b -> Hash (cost=1.50..1.50 rows=50 width=8) Output: t2_1.a, t2_1.b -> Seq Scan on public.prt2_p2 t2_1 (cost=0.00..1.50 rows=50 width=8) Output: t2_1.a, t2_1.b (19 rows) For this query, the merging would proceed as follow: 1) In the first iteration of the while loop, we have overlap=true, so we merge prt1_p1 and prt2_p1 and move to the next partition on *only the outer side* (prt1_p2) as we have ub_cmpval<0. 2) In the second iteration, we have overlap=false and ub_cmpval>0, so we perform process_inner_partition() to prt2_p1 and move to the next partition on the inner side (prt2_p2). 3) In the third iteration, we have overlap=true, so we merge prt1_p2 and prt2_p2 and move to the next partition on *only the outer side* as we have ub_cmpval<0 (but can't as the outer side is exhausted). 4) In the forth iteration, we have overlap=false and ub_cmpval>0, so we perform process_inner_partition() to prt2_p2 and move to the next partition on the inner side (but can't as the inner side is exhausted). And we are done. We do this in the process_inner_partition() call in the second and forth iterations: /* * In range partitioning, if the given inner partition is already * merged (eg, because we found an overlapping range earlier), we know * where it fits in the join result; nothing to do in that case. Else * create a new merged partition. */ if (inner_map->merged_indexes[inner_index] >= 0) { if (strategy == PARTITION_STRATEGY_LIST) *merged_index = inner_map->merged_indexes[inner_index]; else { Assert(strategy == PARTITION_STRATEGY_RANGE); *merged_index = -1; } } What we do in that function is actually a no-op, so the second and forth iterations are done only to move to the next partition on the inner side, which I think is inefficient. To avoid that, why not move to *the next pair of partitions* in the first and third iterations like the attached? This needs an extra check to see if we can safely move to the next pair of partitions in the first and third iterations, but doesn't need meaningless iterations like the second and fourth iterations in the example. This change also makes the code in process_inner_partition() shown above unnecessary, so I removed it from that function (and process_outer_partition()). * I don't like this: + if ((lb_cmpval < 0 && inner_has_default) || + /* Non-overlapping range on the lower side of outer range. */ + (lb_cmpval > 0 && outer_has_default) || + /* Non-overlapping range on the lower side of inner range. */ + (ub_cmpval < 0 && outer_has_default) || + /* Non-overlapping range on the upper side of inner range. */ + (ub_cmpval > 0 && inner_has_default)) + /* Non-overlapping range on the upper side of outer range. */ + return NULL; because it looks like that eg, the first comment "Non-overlapping range on the lower side of outer range." explains the second condition "(lb_cmpval > 0 && outer_has_default)", which isn't intended, I think. Also, it seems a bit verbose to me to add a comment to each of the four cases. How about simplifying the comments and moving them to above the if block like the attached? * In partition_range_merge_next_lb(), do we really need this bit? + /* + * The lower bound is lower than the last upper bound, thus does not fit + * the bounds created so far and hence can not be merged with the existing + * bounds. + */ + if (cmpval < 0) + return false; I think that if we have such a lower bound, we would error out before calling that function. Also, I think it would be easier to understand to add the upper bound as well within that function. So I modified that function that way (and renamed it). I also added a bit of comments to that function. * Since this is called many times, I changed it to a macro: +static int32 +partition_range_bound_cmp(int partnatts, FmgrInfo *partsupfunc, + Oid *partcollations, PartitionRangeBound *bound1, + PartitionRangeBound *bound2) +{ + return partition_rbound_cmp(partnatts, partsupfunc, partcollations, + bound1->datums, bound1->kind, bound1->lower, + bound2); +} * This is just my taste, but how about renaming partition_range_bound_cmp() and partition_range_cmp() to more common? or readable? names eg, compare_range_bounds() and compare_range_partitions(), respectively? Also, how about renaming partition_range_merge() to eg, get_merged_range_bounds()? Please find attached the delta patch for that. Any feedback welcome! Best regards, Etsuro Fujita
Attachment
pgsql-hackers by date: