Thread: Optimizer hook
Attached is the patch for the optimizer hook as discussed in
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00362.php
The hook declaration and definition to the same places as the planner_hook
is located since the optimizer_hook relates to the part of the planner and
I find it good to have the planner related hooks on one place.
The optimizer_hook is called in make_one_rel and the plugin replaces
the function 'make_rel_from_joinlist'. I have created a macro
standard_optimizer just for better readability of the code in plugins
and I did not want to change function names.
I also tried to dig out the dynamic programming optimizer and to compile
it and use it as a plugin just to be sure that all the functions necessary
to write the optimizer as a plugin are non-static. I placed all the code
in joinrels.c, make_one_rel_by_joins and make_rel_from_joinlist functions.
Of cause, I renamed all non-static symbols.
Sure, I ran 'make check' without failures.
Is there anything else I should take care? Thanks.
Regards
Julo Stroffek
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00362.php
The hook declaration and definition to the same places as the planner_hook
is located since the optimizer_hook relates to the part of the planner and
I find it good to have the planner related hooks on one place.
The optimizer_hook is called in make_one_rel and the plugin replaces
the function 'make_rel_from_joinlist'. I have created a macro
standard_optimizer just for better readability of the code in plugins
and I did not want to change function names.
I also tried to dig out the dynamic programming optimizer and to compile
it and use it as a plugin just to be sure that all the functions necessary
to write the optimizer as a plugin are non-static. I placed all the code
in joinrels.c, make_one_rel_by_joins and make_rel_from_joinlist functions.
Of cause, I renamed all non-static symbols.
Sure, I ran 'make check' without failures.
Is there anything else I should take care? Thanks.
Regards
Julo Stroffek
I rewrote a patch a bit.
I made function make_one_rel_by_joins also non-static,
so that the optimizer in plugin can reuse the dynamic
optimizer.
I also reorganized the code and put all the changes
to paths.h and allpaths.c files only. It seems to me more
appropriate that the hook should belong to this place only.
Thanks
Julo
Julius Stroffek wrote: Attached is the patch for the optimizer hook as discussed in
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00362.php
The hook declaration and definition to the same places as the planner_hook
is located since the optimizer_hook relates to the part of the planner and
I find it good to have the planner related hooks on one place.
The optimizer_hook is called in make_one_rel and the plugin replaces
the function 'make_rel_from_joinlist'. I have created a macro
standard_optimizer just for better readability of the code in plugins
and I did not want to change function names.
I also tried to dig out the dynamic programming optimizer and to compile
it and use it as a plugin just to be sure that all the functions necessary
to write the optimizer as a plugin are non-static. I placed all the code
in joinrels.c, make_one_rel_by_joins and make_rel_from_joinlist functions.
Of cause, I renamed all non-static symbols.
Sure, I ran 'make check' without failures.
Is there anything else I should take care? Thanks.
Regards
Julo Stroffek
I made function make_one_rel_by_joins also non-static,
so that the optimizer in plugin can reuse the dynamic
optimizer.
I also reorganized the code and put all the changes
to paths.h and allpaths.c files only. It seems to me more
appropriate that the hook should belong to this place only.
Thanks
Julo
Julius Stroffek wrote: Attached is the patch for the optimizer hook as discussed in
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00362.php
The hook declaration and definition to the same places as the planner_hook
is located since the optimizer_hook relates to the part of the planner and
I find it good to have the planner related hooks on one place.
The optimizer_hook is called in make_one_rel and the plugin replaces
the function 'make_rel_from_joinlist'. I have created a macro
standard_optimizer just for better readability of the code in plugins
and I did not want to change function names.
I also tried to dig out the dynamic programming optimizer and to compile
it and use it as a plugin just to be sure that all the functions necessary
to write the optimizer as a plugin are non-static. I placed all the code
in joinrels.c, make_one_rel_by_joins and make_rel_from_joinlist functions.
Of cause, I renamed all non-static symbols.
Sure, I ran 'make check' without failures.
Is there anything else I should take care? Thanks.
Regards
Julo Stroffek
*** ./src/backend/optimizer/path/allpaths.c.orig Sat May 26 20:23:01 2007 --- ./src/backend/optimizer/path/allpaths.c Tue Aug 14 17:26:39 2007 *************** *** 52,58 **** RangeTblEntry *rte); static void set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); ! static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist); static RelOptInfo *make_one_rel_by_joins(PlannerInfo *root, int levels_needed, List *initial_rels); static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery, --- 52,58 ---- RangeTblEntry *rte); static void set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); ! RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist); static RelOptInfo *make_one_rel_by_joins(PlannerInfo *root, int levels_needed, List *initial_rels); static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery, *************** *** 87,93 **** /* * Generate access paths for the entire join tree. */ ! rel = make_rel_from_joinlist(root, joinlist); /* * The result should join all and only the query's base rels. --- 87,96 ---- /* * Generate access paths for the entire join tree. */ ! if (optimizer_hook) ! rel = optimizer_hook(root, joinlist); ! else ! rel = make_rel_from_joinlist(root, joinlist); /* * The result should join all and only the query's base rels. *************** *** 612,618 **** * See comments for deconstruct_jointree() for definition of the joinlist * data structure. */ ! static RelOptInfo * make_rel_from_joinlist(PlannerInfo *root, List *joinlist) { int levels_needed; --- 615,621 ---- * See comments for deconstruct_jointree() for definition of the joinlist * data structure. */ ! RelOptInfo * make_rel_from_joinlist(PlannerInfo *root, List *joinlist) { int levels_needed; *** ./src/backend/optimizer/plan/planner.c.orig Sat May 26 20:23:01 2007 --- ./src/backend/optimizer/plan/planner.c Wed Aug 15 15:31:21 2007 *************** *** 45,50 **** --- 45,52 ---- /* Hook for plugins to get control in planner() */ planner_hook_type planner_hook = NULL; + /* Hook for plugins to get control in make_one_rel() */ + optimizer_hook_type optimizer_hook = NULL; /* Expression kind codes for preprocess_expression */ #define EXPRKIND_QUAL 0 *** ./src/include/optimizer/planner.h.orig Wed Jul 25 14:22:53 2007 --- ./src/include/optimizer/planner.h Wed Aug 15 11:17:00 2007 *************** *** 17,22 **** --- 17,24 ---- #include "nodes/plannodes.h" #include "nodes/relation.h" + /* A macro pointing to the standard optimizer function. */ + #define standard_optimizer make_rel_from_joinlist /* Hook for plugins to get control in planner() */ typedef PlannedStmt * (*planner_hook_type) (Query *parse, *************** *** 24,29 **** --- 26,35 ---- ParamListInfo boundParams); extern PGDLLIMPORT planner_hook_type planner_hook; + /* Hook for plugins to get control in make_one_rel() */ + typedef RelOptInfo * (*optimizer_hook_type) (PlannerInfo * root, + List * joinlist); + extern PGDLLIMPORT optimizer_hook_type optimizer_hook; extern PlannedStmt *planner(Query *parse, int cursorOptions, ParamListInfo boundParams);
---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Julius Stroffek <Julius.Stroffek@Sun.COM> writes: > I rewrote a patch a bit. This hook seems very strangely defined to me. Why did you not put the hook at the point where the current geqo-vs-regular decision is made? I do not see the value of duplicating the joinlist-expansion logic, which is what you're going to have to do if you use a hook that replaces make_rel_from_joinlist. The only things you could do differently than it does would be (1) be smarter about full outer joins, which was not part of the agenda that I heard of, or (2) willfully disregard the user's from_collapse_limit and join_collapse_limit settings, which doesn't seem like an amazingly good idea either. Also, "optimizer_hook" seems nearly content-free as a name for use in this area; I see no reason why the particular sub-section of the planner we're discussing here has more title to that name than other parts. Something like "join_order_search_hook" might be more appropriate. regards, tom lane
> This hook seems very strangely defined to me. Why did you not put the > hook at the point where the current geqo-vs-regular decision is made? > The reason is that I thought about gaining a control over the algorithm used to solve individual subproblems in make_rel_from_joinlist. If we would have a couple of algorithms to be tested we can implement a plugin using join_order_search_hook which will compare the results of those algorithms. Doing the same at the place where geqo/regular code is called might make thinks a bit more difficult. Later on, we could also implement a code trying to run some "very fast" algorithm at first (for the master problem and all subproblems) to get some estimates and decide whether it makes sense to try to find a better plan in longer time. > Also, "optimizer_hook" seems nearly content-free as a name for use > in this area; I see no reason why the particular sub-section of the > planner we're discussing here has more title to that name than other > parts. Something like "join_order_search_hook" might be more > appropriate. > I completely agree and I have renamed the hook. The new version V3 of the patch is attached. Thanks for the comments. Regards Julius Stroffek *** ./src/backend/optimizer/path/allpaths.c.orig Sat May 26 20:23:01 2007 --- ./src/backend/optimizer/path/allpaths.c Tue Sep 25 11:37:19 2007 *************** *** 37,43 **** --- 37,45 ---- bool enable_geqo = false; /* just in case GUC doesn't set it */ int geqo_threshold; + join_order_search_hook_type join_order_search_hook = NULL; + static void set_base_rel_pathlists(PlannerInfo *root); static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); *************** *** 52,60 **** RangeTblEntry *rte); static void set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); - static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist); - static RelOptInfo *make_one_rel_by_joins(PlannerInfo *root, int levels_needed, - List *initial_rels); static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery, bool *differentTypes); static bool recurse_pushdown_safe(Node *setOp, Query *topquery, --- 54,59 ---- *************** *** 87,93 **** /* * Generate access paths for the entire join tree. */ ! rel = make_rel_from_joinlist(root, joinlist); /* * The result should join all and only the query's base rels. --- 86,95 ---- /* * Generate access paths for the entire join tree. */ ! if (join_order_search_hook) ! rel = join_order_search_hook(root, joinlist); ! else ! rel = make_rel_from_joinlist(root, joinlist); /* * The result should join all and only the query's base rels. *************** *** 612,618 **** * See comments for deconstruct_jointree() for definition of the joinlist * data structure. */ ! static RelOptInfo * make_rel_from_joinlist(PlannerInfo *root, List *joinlist) { int levels_needed; --- 614,620 ---- * See comments for deconstruct_jointree() for definition of the joinlist * data structure. */ ! RelOptInfo * make_rel_from_joinlist(PlannerInfo *root, List *joinlist) { int levels_needed; *************** *** 695,701 **** * Returns the final level of join relations, i.e., the relation that is * the result of joining all the original relations together. */ ! static RelOptInfo * make_one_rel_by_joins(PlannerInfo *root, int levels_needed, List *initial_rels) { List **joinitems; --- 697,703 ---- * Returns the final level of join relations, i.e., the relation that is * the result of joining all the original relations together. */ ! RelOptInfo * make_one_rel_by_joins(PlannerInfo *root, int levels_needed, List *initial_rels) { List **joinitems; *** ./src/include/optimizer/paths.h.orig Tue May 22 03:40:33 2007 --- ./src/include/optimizer/paths.h Tue Sep 25 11:32:34 2007 *************** *** 23,30 **** extern bool enable_geqo; extern int geqo_threshold; ! extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist); #ifdef OPTIMIZER_DEBUG extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel); #endif --- 23,46 ---- extern bool enable_geqo; extern int geqo_threshold; ! /* A macro pointing to the standard joinorder search function. */ ! #define standard_join_order_search make_rel_from_joinlist + /* Hook for plugins to get control in make_one_rel() */ + typedef RelOptInfo * (*join_order_search_hook_type) (PlannerInfo * root, + List * joinlist); + extern PGDLLIMPORT join_order_search_hook_type join_order_search_hook; + + /* + * Functions related to searching the space + * of all possible join orders. + */ + extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist); + extern RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, + List *joinlist); + extern RelOptInfo *make_one_rel_by_joins(PlannerInfo *root, + int levels_needed, + List *initial_rels); #ifdef OPTIMIZER_DEBUG extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel); #endif
Julius Stroffek <Julius.Stroffek@Sun.COM> writes: >> This hook seems very strangely defined to me. Why did you not put the >> hook at the point where the current geqo-vs-regular decision is made? > > The reason is that I thought about gaining a control over the algorithm > used to solve individual subproblems in make_rel_from_joinlist. That would be an entirely different problem, I think. If you want to replace the *entire* planner, there's already the planner_hook to do that. If you're just looking to change the join order search method, the place where GEQO hooks in is the most convenient place for that. It's not clear to me what would be usefully accomplished by putting the hook somewhere else; you can't just willy-nilly replace the code for single-relation path selection, at least not without a whole lot of changes elsewhere in the planner infrastructure. My thought was that the reason for this hook to exist was simply to provide a convenient way to replace only the join search order algorithm. I'm willing to entertain the thought of other hooks in other places for other specific purposes, but where you want to put it seems not well-matched to any problem that would be likely to be solved without replacing all of the planner. regards, tom lane
Well, It seems that I probably do not understand something quite well or my explanation is not clear. There is some example code of my idea in the attachment. There is a function jos_search which has nearly the same code as make_rel_from_joinlist. The example tries geqo first and then the regular algorithm. At the first sight it seems that it is possible to do the same with the hook placed to the current decision point. However, the execution flow is slightly different. 1.) example in dummy.c --- It first uses geqo only to find the cheapest path also for all the recursive calls in jos_search. Afterwards, the same is executed using the regular algorithm also for all the recursive calls in jos_search. The whole result path for the query will be produced only by either geqo or regular algorithm. 2.) placing the hook to the current decision point and trying both geqo®ular at that place. --- Parts of the result path might be found by geqo and parts of it by regular algorithm. The problem here is that regular algorithm will find the best plan every time it finishes. However, the above is supposed to be used for the algorithms that none of them is supposed to find the best solution. > That would be an entirely different problem, I think. If you want to > replace the *entire* planner, there's already the planner_hook to do that. > Replacing the whole planner would need a much more code to be reused without any change than in this case. > you can't just willy-nilly replace the code for > single-relation path selection, at least not without a whole lot of > changes elsewhere in the planner infrastructure. > Would the code in dummy.c work in a way that I expect and explained above? If there is no way of how to make the code work then it makes no sense to put the hook to the place I am proposing. It works for me, but I have not tested that very well yet. If I would swap calls to geqo and make_one_rel_by_joins it will not work. Therefore there might be an issue I do not know about yet. > My thought was that the reason for this hook to exist was simply to > provide a convenient way to replace only the join search order > algorithm. Yes, thats true. I do not have a plan to do something more for now. Thank you Regards Julius Stroffek #include "postgres.h" #include "fmgr.h" #include "optimizer/geqo.h" #include "optimizer/paths.h" #include "optimizer/pathnode.h" PG_MODULE_MAGIC; void _PG_init(void); void _PG_fini(void); typedef RelOptInfo * (*jos_function_type) (PlannerInfo *root, int levels_needed, List* initial_rels); static RelOptInfo * jos_search(PlannerInfo *root, List *joinlist, jos_function_type jos_function) { int levels_needed; List *initial_rels; ListCell *jl; /* * Count the number of child joinlist nodes. This is the depth of the * dynamic-programming algorithm we must employ to consider all ways of * joining the child nodes. */ levels_needed = list_length(joinlist); if (levels_needed <= 0) return NULL; /* nothing to do? */ /* * Construct a list of rels corresponding to the child joinlist nodes. * This may contain both base rels and rels constructed according to * sub-joinlists. */ initial_rels = NIL; foreach(jl, joinlist) { Node *jlnode = (Node *) lfirst(jl); RelOptInfo *thisrel; if (IsA(jlnode, RangeTblRef)) { int varno = ((RangeTblRef *) jlnode)->rtindex; thisrel = find_base_rel(root, varno); } else if (IsA(jlnode, List)) { /* Recurse to handle subproblem */ thisrel = jos_search(root, (List *) jlnode, jos_function); } else { elog(ERROR, "unrecognized joinlist node type: %d", (int) nodeTag(jlnode)); thisrel = NULL; /* keep compiler quiet */ } initial_rels = lappend(initial_rels, thisrel); } if (levels_needed == 1) { /* * Single joinlist node, so we're done. */ return (RelOptInfo *) linitial(initial_rels); } else { jos_function(root, levels_needed, initial_rels); } } static RelOptInfo * jos_dummy(PlannerInfo *root, List *joinlist) { RelOptInfo *dynamic_result, *geqo_result; List *copy; Cost dynamic_cost, geqo_cost; copy = list_copy(joinlist); elog(LOG, "Starting a join order search \"geqo\"..."); geqo_result = jos_search(root, copy, geqo); // geqo_result = jos_search(root, copy, make_one_rel_by_joins); geqo_cost = geqo_result->cheapest_total_path->total_cost; copy = list_copy(joinlist); elog(LOG, "Starting a join order search \"dynamic\"..."); dynamic_result = jos_search(root, copy, make_one_rel_by_joins); // dynamic_result = jos_search(root, copy, geqo); dynamic_cost = dynamic_result->cheapest_total_path->total_cost; printf("GEQO cost: %f\n", geqo_cost); printf("Dynamic programming cost: %f\n", dynamic_cost); if (geqo_cost < dynamic_cost) return geqo_result; else return dynamic_result; } void _PG_init(void) { join_order_search_hook = jos_dummy; } void _PG_fini(void) { join_order_search_hook = 0; }
Julius Stroffek <Julius.Stroffek@Sun.COM> writes: > Parts of the result path might be found by geqo and parts of it by > regular algorithm. Why would you care? Seems like forcing that to not happen is actively making it stupider. > If there is no way of how to make the code work then it makes no sense > to put the hook to the place I am proposing. It works for me, but I have > not tested that very well yet. If I would swap calls to geqo > and make_one_rel_by_joins it will not work. Therefore there might be > an issue I do not know about yet. Well, I can see one likely problem: list_copy is a shallow copy and thus doesn't ensure that the second set of functions sees the same input data structures as the first. I know that geqo has to go through some special pushups to perform multiple invocations of the base planner, and I suspect you need that here too. Look at geqo_eval(). regards, tom lane
> Why would you care? Seems like forcing that to not happen is actively > making it stupider. > To better compare the algorithms and possibly not for final solution at the beginning. If we would implement 10 algorithms and want to pickup just 3 best ones to be used and throw 7 away. Later on, we can try to run just the one "very fast" algorithm and depending on the cost decide whether we would run remaining 9 or less or even none. Yes, the example in dummy.c is really stupider, but it could be done in more clever way. > Well, I can see one likely problem: list_copy is a shallow copy and > thus doesn't ensure that the second set of functions sees the same input > data structures as the first. I know that geqo has to go through some > special pushups to perform multiple invocations of the base planner, > and I suspect you need that here too. Look at geqo_eval(). I'll explore that. Thanks Regards Julius Stroffek
Julius Stroffek <Julius.Stroffek@Sun.COM> writes: >> Why would you care? Seems like forcing that to not happen is actively >> making it stupider. >> > To better compare the algorithms and possibly not for final solution at > the beginning. If we would implement 10 algorithms and want to pickup > just 3 best ones to be used and throw 7 away. Well, you could in any case force the same decision to be made in every invocation, for example by driving it off a GUC variable. The idea you have here seems to be "it'll be the same choice in every sub-problem, only we won't know which one afterwards", which doesn't seem at all helpful for planner testing purposes. > Yes, the example in dummy.c is really stupider, but it could be done > in more clever way. I think dummy.c kinda proves my point: more than half the code is accomplishing nothing except to duplicate the behavior of make_rel_from_joinlist. regards, tom lane
I've applied this patch with revision to put the hook where I thought it made sense. Attached is a modification of your dummy.c to show use of the hook. I didn't test it heavily, but I did check that it seemed to work with either order of calling geqo() and standard_join_search(). regards, tom lane #include "postgres.h" #include "fmgr.h" #include "optimizer/geqo.h" #include "optimizer/paths.h" #include "optimizer/pathnode.h" PG_MODULE_MAGIC; void _PG_init(void); void _PG_fini(void); static RelOptInfo * my_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) { RelOptInfo *dynamic_result, *geqo_result; Cost dynamic_cost, geqo_cost; List *dynamic_list, *geqo_list; struct HTAB *dynamic_hash, *geqo_hash; int savelength; savelength = list_length(root->join_rel_list); root->join_rel_hash = NULL; elog(LOG, "Starting a join order search \"dynamic\"..."); dynamic_result = standard_join_search(root, levels_needed, initial_rels); dynamic_cost = dynamic_result->cheapest_total_path->total_cost; dynamic_list = list_copy(root->join_rel_list); dynamic_hash = root->join_rel_hash; root->join_rel_list = list_truncate(root->join_rel_list, savelength); root->join_rel_hash = NULL; elog(LOG, "Starting a join order search \"geqo\"..."); geqo_result = geqo(root, levels_needed, initial_rels); geqo_cost = geqo_result->cheapest_total_path->total_cost; geqo_list = list_copy(root->join_rel_list); geqo_hash = root->join_rel_hash; fprintf(stderr, "GEQO cost: %f Dynamic programming cost: %f\n", geqo_cost, dynamic_cost); if (geqo_cost < dynamic_cost) { root->join_rel_list = geqo_list; root->join_rel_hash = geqo_hash; return geqo_result; } else { root->join_rel_list = dynamic_list; root->join_rel_hash = dynamic_hash; return dynamic_result; } } void _PG_init(void) { join_search_hook = my_join_search; } void _PG_fini(void) { join_search_hook = NULL; }