Re: [Proposal] Table partition + join pushdown - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: [Proposal] Table partition + join pushdown |
Date | |
Msg-id | 20151126.141924.193346332.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: [Proposal] Table partition + join pushdown (Taiki Kondo <tai-kondo@yk.jp.nec.com>) |
List | pgsql-hackers |
Hello, sorry for late response and thank you for the new patch. At Fri, 20 Nov 2015 12:05:38 +0000, Taiki Kondo <tai-kondo@yk.jp.nec.com> wrote in <12A9442FBAE80D4E8953883E0B84E08863F115@BPXM01GP.gisp.nec.co.jp> > > I created v3 patch for this feature, and v1 patch for regression tests. > Please find attached. I think I understood what you intend to do by substitute_node_with_join_cond. It replaces *all* vars in check constraint by corresponding expressions containing inner vars. If it is correct, it is predictable whether the check condition can be successfully transformed. Addition to that, substitute_node_with_join_cond uses any side of join clases that matches the target var but it is somewhat different from what exactly should be done, even if it works correctly. For those conditions, substitute_node_with_join_cond and create_rinfo_from_check_constr could be simpler and clearer as following. Also this refactored code determines clearly what the function does, I believe. ==== create_rinfo_from_check_constr(...) { pull_varattnos(check_constr, outer_rel->relid, &chkattnos); replacements = extract_replacements(joininfo, outer_rel->relid,&joinattnos); /* * exit if the join clauses cannot replace all vars in the check * constraint */ if (!bms_is_subset(chkattnos, joinattnos)) return NULL; foreach(lc, check_constr) { result = lappend(result, expression_tree_mutator(...); } ==== The attached patch does this. What do you think about this refactoring? > Reply for your comments is below. > > > Overall comments > > ---------------- > > * I think the enhancement in copyfuncs.c shall be in the separate > > patch; it is more graceful manner. At this moment, here is less > > than 20 Path delivered type definition. It is much easier works > > than entire Plan node support as we did recently. > > (How about other folk's opinion?) > > I also would like to wait for other fork's opinion. > So I don't divide this part from this patch yet. Other fork? It's Me? _copyPathFields is independent from all other part of this patch and it looks to be a generic function. I prefer that such indenpendent features to be separate patches, too. > > At this moment, here is less > > than 20 Path delivered type definition. It is much easier works > > than entire Plan node support as we did recently. > > (How about other folk's opinion?) It should be doable but I don't think we should provide all possible _copyPath*. Curently it looks to be enough to provide the function for at least the Paths listed in try_append_pullup_across_join as shown below and others should not be added if it won't be used for now. T_SeqScan, T_SampleScan, T_IndexScan, T_IndexOnlyScan, T_BitmapHeapScan, T_TidScan, T_Gather I doubt that tid partitioning is used but there's no reason no refuse to support it. By the way, would you add regressions for these other types of path? > > * Can you integrate the attached test cases as regression test? > > It is more generic way, and allows people to detect problems > > if relevant feature gets troubled in the future updates. > > Ok, done. Please find attached. > > > * Naming of "join pushdown" is a bit misleading because other > > component also uses this term, but different purpose. > > I'd like to suggest try_pullup_append_across_join. > > Any ideas from native English speaker? > > Thank you for your suggestion. > > I change its name to "try_append_pullup_accross_join", > which is matched with the order of the previous name. > > However, this change is just temporary. > I also would like to wait for other fork's opinion > for the naming. > > > > Patch review > > ------------ > > > > At try_join_pushdown: > > + /* When specified outer path is not an AppendPath, nothing to do here. */ > > + if (!IsA(outer_rel->cheapest_total_path, AppendPath)) > > + { > > + elog(DEBUG1, "Outer path is not an AppendPath. Do nothing."); > > + return; > > + } > > It checks whether the cheapest_total_path is AppendPath at the head > > of this function. It ought to be a loop to walk on the pathlist of > > RelOptInfo, because multiple path-nodes might be still alive > > but also might not be cheapest_total_path. > > > Ok, done. > > > + switch (inner_rel->cheapest_total_path->pathtype) > > + > > Also, we can construct the new Append node if one of the path-node > > within pathlist of inner_rel are at least supported. > > Done. > But, this change will create nested loop between inner_rel's pathlist > and outer_rel's pathlist. It means that planning time is increased more. > > I think it is adequate to check only for cheapest_total_path > because checking only for cheapest_total_path is implemented in other > parts, like make_join_rel(). > > How about your (and also other people's) opinion? > > > > + if (list_length(inner_rel->ppilist) > 0) > > + { > > + elog(DEBUG1, "ParamPathInfo is already set in inner_rel. Can't pushdown."); > > + return; > > + } > > + > > You may need to introduce why this feature uses ParamPathInfos here. > > It seems to me a good hack to attach additional qualifiers on > > the underlying inner scan node, even if it is not a direct child of > > inner relation. > > However, people may have different opinion. > > Ok, added comment in source. > Please find from attached patch. I suppose that the term 'parameter' used here is strictly defined as condition and information about 'parameterized' paths, which related to restrictions involving another relation. In contrast, the PPI added here contains totally defferent from parameter defined here, since it refers only to the relation, in other words, not a join condition. I think such kind of restrictions should be added to baserestrictinfo in RelOptInfo. The condition derived from constraints can be simply added to it. It doesn't need any additional explanation to do so. Any opinions ? > > +static List * > > +convert_parent_joinclauses_to_child(PlannerInfo *root, List *join_clauses, > > + RelOptInfo *outer_rel) { > > + Index parent_relid = > > + find_childrel_appendrelinfo(root, outer_rel)->parent_relid; > > + List *clauses_parent = get_actual_clauses(join_clauses); > > + List *clauses_child = NIL; > > + ListCell *lc; > > + > > + foreach(lc, clauses_parent) > > + { > > + Node *one_clause_child = (Node *) copyObject(lfirst(lc)); > > + > > + ChangeVarNodes(one_clause_child, parent_relid, outer_rel->relid, 0); > > + clauses_child = lappend(clauses_child, one_clause_child); > > + } > > + > > + return make_restrictinfos_from_actual_clauses(root, clauses_child); > > +} > > > > Is ChangeVarNodes() right routine to replace var-node of parent relation > > by relevant var-node of child relation? > > It may look sufficient, however, nobody can ensure varattno of child > > relation is identical with parent relation's one. > > For example, which attribute number shall be assigned on 'z' here? > > CREATE TABLE tbl_parent(x int); > > CREATE TABLE tbl_child(y int) INHERITS(tbl_parent); > > ALTER TABLE tbl_parent ADD COLUMN z int; > > Maybe you're right, so I agree with you. > I use adjust_appendrel_attrs() instead of ChangeVarNodes() > for this purpose. > > > > --- a/src/backend/optimizer/plan/createplan.c > > +++ b/src/backend/optimizer/plan/createplan.c > > @@ -4230,8 +4230,14 @@ prepare_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, List *pathkeys, > > /* > > * Ignore child members unless they match the rel being > > * sorted. > > + * > > + * If this is called from make_sort_from_pathkeys(), > > + * relids may be NULL. In this case, we must not ignore child > > + * members because inner/outer plan of pushed-down merge join is > > + * always child table. > > */ > > - if (em->em_is_child && > > + if (relids != NULL && > > + em->em_is_child && > > !bms_equal(em->em_relids, relids)) > > continue; > > > > It is a little bit hard to understand why this modification is needed. > > Could you add source code comment that focus on the reason why. > > Ok, added comment in source. > Please find from attached patch. Could you show me an example to execute there with relids == NULL? I don't know why prepare_sort_from_pathkeys is called with such condition with this patch but the modification apparently changes the behavior of this function. I guess we should find another way to get the same result or more general explanation for the validity to do so. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: