Thread: [HACKERS] paths in partitions of a dummy partitioned table
If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the partition relations dummy and thus doesn't set any (dummy) paths in the partition relations. The lack of paths in the partitions means that we can not use partition-wise join while joining this table with some other similarly partitioned table as the partitions do not have any paths that child-joins can use. This means that we can not create partition relations for such a join and thus can not consider the join to be partitioned. This doesn't matter much when the dummy relation is the outer relation, since the resultant join is also dummy. But when the dummy relation is inner relation, the resultant join is not dummy and can be considered to be partitioned with same partitioning scheme as the outer relation to be joined with other similarly partitioned table. Not having paths in the partitions deprives us of this future optimization. Without partition-wise join, not setting dummy paths in the partition relations makes sense, since those do not have any use. But with partition-wise join that changes. If we always mark the partitions dummy, that effort may get wasted if the partitioned table doesn't participate in the partition-wise join. A possible solution would be to set the partitions dummy when only the partitioned table participates in the join, but that happens during make_join_rel(), much after we have set up the simple relations. So, I am not sure, whether that's a good option. But I am open to suggestions. What if we always mark the partition relations of a dummy partitioned table dummy? I tried attached path on a thousand partition table, the planning time increased by 3-5%. That doesn't look that bad for a thousand partitions. Any other options/comments? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hello, At Thu, 6 Jul 2017 21:05:21 +0530, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote in <CAFjFpRd5+zroxY7UMGTR2M=rjBV4aBOCxQg3+1rBmTPLK5mpDg@mail.gmail.com> > If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the > partition relations dummy and thus doesn't set any (dummy) paths in the > partition relations. The lack of paths in the partitions means that we can A parent won't be proven dummy directly but be a dummy rel (means IS_DUMMY_REL(rel) returns true) if no child is attached as the result of constarint exlusion. > not use partition-wise join while joining this table with some other similarly > partitioned table as the partitions do not have any paths that child-joins can > use. This means that we can not create partition relations for such a join and > thus can not consider the join to be partitioned. This doesn't matter much when > the dummy relation is the outer relation, since the resultant join is also > dummy. But when the dummy relation is inner relation, the resultant join is not > dummy and can be considered to be partitioned with same partitioning scheme as > the outer relation to be joined with other similarly partitioned table. Not > having paths in the partitions deprives us of this future optimization. > Without partition-wise join, not setting dummy paths in the partition relations > makes sense, since those do not have any use. But with partition-wise join that > changes. Of course for inner-joins and even for outer-joins, there's no point in considering the children of a dummy parent as a side of a join. For what reason, or in what case does partition-wise join need to consider children of a proven-dummy parent? > If we always mark the partitions dummy, that effort may get wasted if the > partitioned table doesn't participate in the partition-wise join. A possible > solution would be to set the partitions dummy when only the partitioned table > participates in the join, but that happens during make_join_rel(), much after > we have set up the simple relations. So, I am not sure, whether that's a good > option. But I am open to suggestions. > > What if we always mark the partition relations of a dummy partitioned table > dummy? I tried attached path on a thousand partition table, the planning time > increased by 3-5%. That doesn't look that bad for a thousand partitions. > > Any other options/comments? Since I don't understand the meaning of the patch, the comments below are just from a micro-viewpoint. - if (IS_DUMMY_REL(rel)) + if (IS_DUMMY_REL(rel) && !rte->inh) In set_rel_pathlist, if want to exlude the inh==true case from the first if, just inverting the first and second if caluses would be enough, like this. | if (rte->inh) | set_append_rel_pathlist(root, rel, rti, rte); | else if (IS_DUMMY_REL(rel)) | /* We already proved the relation empty, so nothing more to do */ + if (IS_DUMMY_REL(rel)) + set_dummy_rel_pathlist(childrel); This is equivalent of just returning before looking over append_rel_list when rel is a dummy. It doesn't seem to me to add marked-as-dummy children to a dummy parent. (I understand that is the objective of this patch.) # Maybe I'm looking a different version of the source code, or # I'm terriblly misunderstanding something.. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Jul 10, 2017 at 2:59 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Hello, Thanks for the review. >> If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the >> partition relations dummy and thus doesn't set any (dummy) paths in the >> partition relations. The lack of paths in the partitions means that we can > > A parent won't be proven dummy directly but be a dummy rel (means > IS_DUMMY_REL(rel) returns true) if no child is attached as the > result of constarint exlusion. In a query like SELECT * FROM parent WHERE 1 = 2; the parent is marked dummy in set_rel_size() 319 set_rel_size(PlannerInfo *root, RelOptInfo *rel, 320 Index rti, RangeTblEntry *rte) 321 { 322 if (rel->reloptkind == RELOPT_BASEREL && 323 relation_excluded_by_constraints(root, rel, rte)) 324 { 325 /* 326 * We proved we don't need to scan the rel via constraint exclusion, 327 * so set up a single dummy path for it. Here we only check this for 328 * regular baserels; if it's an otherrel, CE was already checked in 329 * set_append_rel_size(). 330 * 331 * In this case, we go ahead and set up the relation's path right away 332 * instead of leaving it for set_rel_pathlist to do. This is because 333 * we don't have a convention for marking a rel as dummy except by 334 * assigning a dummy path to it. 335 */ 336 set_dummy_rel_pathlist(rel); 337 } It's in such cases, that a parent is proven dummy without looking at the child relations. > >> not use partition-wise join while joining this table with some other similarly >> partitioned table as the partitions do not have any paths that child-joins can >> use. This means that we can not create partition relations for such a join and >> thus can not consider the join to be partitioned. This doesn't matter much when >> the dummy relation is the outer relation, since the resultant join is also >> dummy. But when the dummy relation is inner relation, the resultant join is not >> dummy and can be considered to be partitioned with same partitioning scheme as >> the outer relation to be joined with other similarly partitioned table. Not >> having paths in the partitions deprives us of this future optimization. >> Without partition-wise join, not setting dummy paths in the partition relations >> makes sense, since those do not have any use. But with partition-wise join that >> changes. > > Of course for inner-joins and even for outer-joins, there's no > point in considering the children of a dummy parent as a side of > a join. For what reason, or in what case does partition-wise join > need to consider children of a proven-dummy parent? Consider a join A LEFT JOIN B LEFT JOIN C where B is proven dummy and A, B and C are all partitioned with the same partitioning scheme. The result of A LEFT JOIN B is same as appending Ak LEFT JOIN Bk where Ak and Bk are matching partitions of A and B resp. When we join this with C, the result would be same as appending Ak LEFT JOIN Bk LEFT JOIN Ck where Ck is partition matching Ak and Bk. So, even though B is proven dummy, we can execute A LEFT JOIN B LEFT JOIN C as a partition-wise join, if we can execute A LEFT JOIN B as a partition-wise join. Does that answer your question? > >> If we always mark the partitions dummy, that effort may get wasted if the >> partitioned table doesn't participate in the partition-wise join. A possible >> solution would be to set the partitions dummy when only the partitioned table >> participates in the join, but that happens during make_join_rel(), much after >> we have set up the simple relations. So, I am not sure, whether that's a good >> option. But I am open to suggestions. >> >> What if we always mark the partition relations of a dummy partitioned table >> dummy? I tried attached path on a thousand partition table, the planning time >> increased by 3-5%. That doesn't look that bad for a thousand partitions. >> >> Any other options/comments? > > Since I don't understand the meaning of the patch, the comments > below are just from a micro-viewpoint. > > - if (IS_DUMMY_REL(rel)) > + if (IS_DUMMY_REL(rel) && !rte->inh) > > In set_rel_pathlist, if want to exlude the inh==true case from > the first if, just inverting the first and second if caluses > would be enough, like this. > > | if (rte->inh) > | set_append_rel_pathlist(root, rel, rti, rte); > | else if (IS_DUMMY_REL(rel)) > | /* We already proved the relation empty, so nothing more to do * Right. Done in the attached patch. > > + if (IS_DUMMY_REL(rel)) > + set_dummy_rel_pathlist(childrel); > > This is equivalent of just returning before looking over > append_rel_list when rel is a dummy. It doesn't seem to me to add > marked-as-dummy children to a dummy parent. (I understand that is > the objective of this patch.) Not really. It sets the dummy path list in child, which won't be done if we return without traversing append_rel_list, esp. when the parent is marked dummy without marking any of its children dummy as explained above. But we could avoid marking a child dummy twice, in case the parent was marked dummy because all of its children were marked dummy. I have fixed this case in the attached patch. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Jul 6, 2017 at 11:35 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the > partition relations dummy and thus doesn't set any (dummy) paths in the > partition relations. The lack of paths in the partitions means that we can > not use partition-wise join while joining this table with some other similarly > partitioned table as the partitions do not have any paths that child-joins can > use. This means that we can not create partition relations for such a join and > thus can not consider the join to be partitioned. This doesn't matter much when > the dummy relation is the outer relation, since the resultant join is also > dummy. But when the dummy relation is inner relation, the resultant join is not > dummy and can be considered to be partitioned with same partitioning scheme as > the outer relation to be joined with other similarly partitioned table. Not > having paths in the partitions deprives us of this future optimization. I think it's wrong for any code to be examining the path list for a rel marked dummy, so I would suggest approaching this from a different direction. Given A LEFT JOIN B where Bk is dummy, I suggest constructing the path for (AB)k by taking a path from Ak and applying an appropriate PathTarget. You don't really need a join path at all; a path for the non-dummy input is fine - and, in fact, better, since it will be cheaper to execute. One problem is that it may not produce the correct output columns. (AB) may drop some columns that were being generated by A because they were only needed to perform the join, and it may add additional columns taken from B. But I think that if you take the default PathTarget for (AB) and replace references to columns of B with NULL constants, you should be able to apply the resulting PathTarget to a path for Ak and get a valid path for (AB)k. Maybe there is some reason why that won't work exactly, but I think it is probably more promising to attack the problem from this angle than to do what you propose. Sticking dummy joins into the query plan that are really just projecting out NULLs is not appealing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Aug 25, 2017 at 10:46 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Jul 6, 2017 at 11:35 AM, Ashutosh Bapat > <ashutosh.bapat@enterprisedb.com> wrote: >> If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the >> partition relations dummy and thus doesn't set any (dummy) paths in the >> partition relations. The lack of paths in the partitions means that we can >> not use partition-wise join while joining this table with some other similarly >> partitioned table as the partitions do not have any paths that child-joins can >> use. This means that we can not create partition relations for such a join and >> thus can not consider the join to be partitioned. This doesn't matter much when >> the dummy relation is the outer relation, since the resultant join is also >> dummy. But when the dummy relation is inner relation, the resultant join is not >> dummy and can be considered to be partitioned with same partitioning scheme as >> the outer relation to be joined with other similarly partitioned table. Not >> having paths in the partitions deprives us of this future optimization. > > I think it's wrong for any code to be examining the path list for a > rel marked dummy, so I would suggest approaching this from a different > direction. Me and Robert had an offline discussion about this. I am summarizing it here for the sake of completeness. A dummy relation is identified by the only dummy path that exists in its pathlist. There is no flag in RelOptInfo which tells whether a given relation is dummy or not, it's the dummy path which tells that. A dummy path is an Append path with no subpaths. Planner doesn't treat dummy relations any different from other relations when it comes to using paths. When a dummy relation participates in a join, the dummy path is used as one of the joining paths and converted to a Result plan at the time of planning. So, for a partition-wise join where one of the joining relations is dummy, its every child must have dummy path which can be used to construct child-join paths. But we don't need to mark partition relations dummy (if their parent is dummy) even when it's not going to participate in partition-wise join. The partition relations will be marked dummy when we know that they will be required for partition-wise join. I was worried that we might mark base relation dummy during join planning this way, but we already have a precedence for that in add_paths_to_join_rel(). So, shouldn't be a problem. So, I have now added a patch in partition-wise join set to mark partition relations dummy when their parent is dummy. > Given A LEFT JOIN B where Bk is dummy, I suggest > constructing the path for (AB)k by taking a path from Ak and applying > an appropriate PathTarget. You don't really need a join path at all; > a path for the non-dummy input is fine - and, in fact, better, since > it will be cheaper to execute. One problem is that it may not produce > the correct output columns. (AB) may drop some columns that were > being generated by A because they were only needed to perform the > join, and it may add additional columns taken from B. But I think > that if you take the default PathTarget for (AB) and replace > references to columns of B with NULL constants, you should be able to > apply the resulting PathTarget to a path for Ak and get a valid path > for (AB)k. Maybe there is some reason why that won't work exactly, > but I think it is probably more promising to attack the problem from > this angle than to do what you propose. Sticking dummy joins into the > query plan that are really just projecting out NULLs is not appealing. > This might help in the cases when the RelOptInfo itself is missing e.g. missing partitions in partition matching as discussed in [1]. I will discuss this approach on that thread. [1] https://www.postgresql.org/message-id/CAFjFpRdjQvaUEV5DJX3TW6pU5eq54NCkadtxHX2JiJG_GvbrCA@mail.gmail.com -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company