Thread: Optimizer hook

Optimizer hook

From
Julius Stroffek
Date:
Attached is the patch for the optimizer hook as discussed in
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00362.php

The hook declaration and definition to the same places as the planner_hook
is located since the optimizer_hook relates to the part of the planner and
I find it good to have the planner related hooks on one place.

The optimizer_hook is called in make_one_rel and the plugin replaces
the function 'make_rel_from_joinlist'. I have created a macro
standard_optimizer just for better readability of the code in plugins
and I did not want to change function names.

I also tried to dig out the dynamic programming optimizer and to compile
it and use it as a plugin just to be sure that all the functions necessary
to write the optimizer as a plugin are non-static. I placed all the code
in joinrels.c, make_one_rel_by_joins and make_rel_from_joinlist functions.
Of cause, I renamed all non-static symbols.

Sure, I ran 'make check' without failures.

Is there anything else I should take care? Thanks.

Regards

Julo Stroffek

Re: Optimizer hook

From
Julius Stroffek
Date:
I rewrote a patch a bit.

I made function make_one_rel_by_joins also non-static,
so that the optimizer in plugin can reuse the dynamic
optimizer.

I also reorganized the code and put all the changes
to paths.h and allpaths.c files only. It seems to me more
appropriate that the hook should belong to this place only.

Thanks

Julo

Julius Stroffek wrote:
Attached is the patch for the optimizer hook as discussed in
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00362.php

The hook declaration and definition to the same places as the planner_hook
is located since the optimizer_hook relates to the part of the planner and
I find it good to have the planner related hooks on one place.

The optimizer_hook is called in make_one_rel and the plugin replaces
the function 'make_rel_from_joinlist'. I have created a macro
standard_optimizer just for better readability of the code in plugins
and I did not want to change function names.

I also tried to dig out the dynamic programming optimizer and to compile
it and use it as a plugin just to be sure that all the functions necessary
to write the optimizer as a plugin are non-static. I placed all the code
in joinrels.c, make_one_rel_by_joins and make_rel_from_joinlist functions.
Of cause, I renamed all non-static symbols.

Sure, I ran 'make check' without failures.

Is there anything else I should take care? Thanks.

Regards

Julo Stroffek


*** ./src/backend/optimizer/path/allpaths.c.orig Sat May 26 20:23:01 2007 --- ./src/backend/optimizer/path/allpaths.c Tue Aug 14 17:26:39 2007 *************** *** 52,58 **** RangeTblEntry *rte); static void set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); ! static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist); static RelOptInfo *make_one_rel_by_joins(PlannerInfo *root, int levels_needed, List *initial_rels); static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery, --- 52,58 ---- RangeTblEntry *rte); static void set_values_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte); ! RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist); static RelOptInfo *make_one_rel_by_joins(PlannerInfo *root, int levels_needed, List *initial_rels); static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery, *************** *** 87,93 **** /* * Generate access paths for the entire join tree. */ ! rel = make_rel_from_joinlist(root, joinlist); /* * The result should join all and only the query's base rels. --- 87,96 ---- /* * Generate access paths for the entire join tree. */ ! if (optimizer_hook) ! rel = optimizer_hook(root, joinlist); ! else ! rel = make_rel_from_joinlist(root, joinlist); /* * The result should join all and only the query's base rels. *************** *** 612,618 **** * See comments for deconstruct_jointree() for definition of the joinlist * data structure. */ ! static RelOptInfo * make_rel_from_joinlist(PlannerInfo *root, List *joinlist) { int levels_needed; --- 615,621 ---- * See comments for deconstruct_jointree() for definition of the joinlist * data structure. */ ! RelOptInfo * make_rel_from_joinlist(PlannerInfo *root, List *joinlist) { int levels_needed; *** ./src/backend/optimizer/plan/planner.c.orig Sat May 26 20:23:01 2007 --- ./src/backend/optimizer/plan/planner.c Wed Aug 15 15:31:21 2007 *************** *** 45,50 **** --- 45,52 ---- /* Hook for plugins to get control in planner() */ planner_hook_type planner_hook = NULL; + /* Hook for plugins to get control in make_one_rel() */ + optimizer_hook_type optimizer_hook = NULL; /* Expression kind codes for preprocess_expression */ #define EXPRKIND_QUAL 0 *** ./src/include/optimizer/planner.h.orig Wed Jul 25 14:22:53 2007 --- ./src/include/optimizer/planner.h Wed Aug 15 11:17:00 2007 *************** *** 17,22 **** --- 17,24 ---- #include "nodes/plannodes.h" #include "nodes/relation.h" + /* A macro pointing to the standard optimizer function. */ + #define standard_optimizer make_rel_from_joinlist /* Hook for plugins to get control in planner() */ typedef PlannedStmt * (*planner_hook_type) (Query *parse, *************** *** 24,29 **** --- 26,35 ---- ParamListInfo boundParams); extern PGDLLIMPORT planner_hook_type planner_hook; + /* Hook for plugins to get control in make_one_rel() */ + typedef RelOptInfo * (*optimizer_hook_type) (PlannerInfo * root, + List * joinlist); + extern PGDLLIMPORT optimizer_hook_type optimizer_hook; extern PlannedStmt *planner(Query *parse, int cursorOptions, ParamListInfo boundParams);

---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org

Re: Optimizer hook

From
Tom Lane
Date:
Julius Stroffek <Julius.Stroffek@Sun.COM> writes:
> I rewrote a patch a bit.

This hook seems very strangely defined to me.  Why did you not put the
hook at the point where the current geqo-vs-regular decision is made?
I do not see the value of duplicating the joinlist-expansion logic,
which is what you're going to have to do if you use a hook that replaces
make_rel_from_joinlist.  The only things you could do differently than
it does would be (1) be smarter about full outer joins, which was not
part of the agenda that I heard of, or (2) willfully disregard the
user's from_collapse_limit and join_collapse_limit settings, which
doesn't seem like an amazingly good idea either.

Also, "optimizer_hook" seems nearly content-free as a name for use
in this area; I see no reason why the particular sub-section of the
planner we're discussing here has more title to that name than other
parts.  Something like "join_order_search_hook" might be more
appropriate.

            regards, tom lane

Re: Optimizer hook

From
Julius Stroffek
Date:
> This hook seems very strangely defined to me.  Why did you not put the
> hook at the point where the current geqo-vs-regular decision is made?
>
The reason is that I thought about gaining a control over the algorithm
used to solve individual subproblems in make_rel_from_joinlist. If we would
have a couple of algorithms to be tested we can implement a plugin
using join_order_search_hook which will compare the results of
those algorithms. Doing the same at the place where geqo/regular code is
called might make thinks a bit more difficult.

Later on, we could also implement a code trying to run some "very fast"
algorithm at first (for the master problem and all subproblems) to get
some estimates and decide whether it makes sense to try to find a better
plan in longer time.

> Also, "optimizer_hook" seems nearly content-free as a name for use
> in this area; I see no reason why the particular sub-section of the
> planner we're discussing here has more title to that name than other
> parts.  Something like "join_order_search_hook" might be more
> appropriate.
>
I completely agree and I have renamed the hook.

The new version V3 of the patch is attached.

Thanks for the comments.

Regards

Julius Stroffek
*** ./src/backend/optimizer/path/allpaths.c.orig    Sat May 26 20:23:01 2007
--- ./src/backend/optimizer/path/allpaths.c    Tue Sep 25 11:37:19 2007
***************
*** 37,43 ****
--- 37,45 ----
  bool        enable_geqo = false;    /* just in case GUC doesn't set it */
  int            geqo_threshold;

+ join_order_search_hook_type join_order_search_hook = NULL;

+
  static void set_base_rel_pathlists(PlannerInfo *root);
  static void set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                               Index rti, RangeTblEntry *rte);
***************
*** 52,60 ****
                        RangeTblEntry *rte);
  static void set_values_pathlist(PlannerInfo *root, RelOptInfo *rel,
                      RangeTblEntry *rte);
- static RelOptInfo *make_rel_from_joinlist(PlannerInfo *root, List *joinlist);
- static RelOptInfo *make_one_rel_by_joins(PlannerInfo *root, int levels_needed,
-                       List *initial_rels);
  static bool subquery_is_pushdown_safe(Query *subquery, Query *topquery,
                            bool *differentTypes);
  static bool recurse_pushdown_safe(Node *setOp, Query *topquery,
--- 54,59 ----
***************
*** 87,93 ****
      /*
       * Generate access paths for the entire join tree.
       */
!     rel = make_rel_from_joinlist(root, joinlist);

      /*
       * The result should join all and only the query's base rels.
--- 86,95 ----
      /*
       * Generate access paths for the entire join tree.
       */
!     if (join_order_search_hook)
!         rel = join_order_search_hook(root, joinlist);
!     else
!         rel = make_rel_from_joinlist(root, joinlist);

      /*
       * The result should join all and only the query's base rels.
***************
*** 612,618 ****
   * See comments for deconstruct_jointree() for definition of the joinlist
   * data structure.
   */
! static RelOptInfo *
  make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
  {
      int            levels_needed;
--- 614,620 ----
   * See comments for deconstruct_jointree() for definition of the joinlist
   * data structure.
   */
! RelOptInfo *
  make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
  {
      int            levels_needed;
***************
*** 695,701 ****
   * Returns the final level of join relations, i.e., the relation that is
   * the result of joining all the original relations together.
   */
! static RelOptInfo *
  make_one_rel_by_joins(PlannerInfo *root, int levels_needed, List *initial_rels)
  {
      List      **joinitems;
--- 697,703 ----
   * Returns the final level of join relations, i.e., the relation that is
   * the result of joining all the original relations together.
   */
! RelOptInfo *
  make_one_rel_by_joins(PlannerInfo *root, int levels_needed, List *initial_rels)
  {
      List      **joinitems;
*** ./src/include/optimizer/paths.h.orig    Tue May 22 03:40:33 2007
--- ./src/include/optimizer/paths.h    Tue Sep 25 11:32:34 2007
***************
*** 23,30 ****
  extern bool enable_geqo;
  extern int    geqo_threshold;

! extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);

  #ifdef OPTIMIZER_DEBUG
  extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
  #endif
--- 23,46 ----
  extern bool enable_geqo;
  extern int    geqo_threshold;

! /* A macro pointing to the standard joinorder search function. */
! #define standard_join_order_search make_rel_from_joinlist

+ /* Hook for plugins to get control in make_one_rel() */
+ typedef RelOptInfo * (*join_order_search_hook_type) (PlannerInfo * root,
+                                              List * joinlist);
+ extern PGDLLIMPORT join_order_search_hook_type join_order_search_hook;
+
+ /*
+  * Functions related to searching the space
+  * of all possible join orders.
+  */
+ extern RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist);
+ extern RelOptInfo *make_rel_from_joinlist(PlannerInfo *root,
+                                           List *joinlist);
+ extern RelOptInfo *make_one_rel_by_joins(PlannerInfo *root,
+                                          int levels_needed,
+                                          List *initial_rels);
  #ifdef OPTIMIZER_DEBUG
  extern void debug_print_rel(PlannerInfo *root, RelOptInfo *rel);
  #endif

Re: Optimizer hook

From
Tom Lane
Date:
Julius Stroffek <Julius.Stroffek@Sun.COM> writes:
>> This hook seems very strangely defined to me.  Why did you not put the
>> hook at the point where the current geqo-vs-regular decision is made?
>
> The reason is that I thought about gaining a control over the algorithm
> used to solve individual subproblems in make_rel_from_joinlist.

That would be an entirely different problem, I think.  If you want to
replace the *entire* planner, there's already the planner_hook to do that.
If you're just looking to change the join order search method, the place
where GEQO hooks in is the most convenient place for that.  It's not
clear to me what would be usefully accomplished by putting the hook
somewhere else; you can't just willy-nilly replace the code for
single-relation path selection, at least not without a whole lot of
changes elsewhere in the planner infrastructure.

My thought was that the reason for this hook to exist was simply to
provide a convenient way to replace only the join search order
algorithm.  I'm willing to entertain the thought of other hooks in other
places for other specific purposes, but where you want to put it seems
not well-matched to any problem that would be likely to be solved
without replacing all of the planner.

            regards, tom lane

Re: Optimizer hook

From
Julius Stroffek
Date:
Well, It seems that I probably do not understand something quite well or
my explanation
is not clear. There is some example code of my idea in the attachment.

There is a function jos_search which has nearly the same code as
make_rel_from_joinlist.
The example tries geqo first and then the regular algorithm. At the
first sight it seems that
it is possible to do the same with the hook placed to the current
decision point. However,
the execution flow is slightly different.

1.) example in dummy.c
---
It first uses geqo only to find the cheapest path also for all the recursive
calls in jos_search. Afterwards, the same is executed using the regular
algorithm
also for all the recursive calls in jos_search. The whole result path
for the query
will be produced only by either geqo or regular algorithm.

2.) placing the hook to the current decision point and trying both
geqo®ular
at that place.
---
Parts of the result path might be found by geqo and parts of it by
regular algorithm.

The problem here is that regular algorithm will find the best plan every
time it finishes. However, the above is supposed to be used for the
algorithms
that none of them is supposed to find the best solution.

> That would be an entirely different problem, I think.  If you want to
> replace the *entire* planner, there's already the planner_hook to do that.
>
Replacing the whole planner would need a much more code to be reused without
any change than in this case.

> you can't just willy-nilly replace the code for
> single-relation path selection, at least not without a whole lot of
> changes elsewhere in the planner infrastructure.
>
Would the code in dummy.c work in a way that I expect and explained above?
If there is no way of how to make the code work then it makes no sense
to put the hook to the place I am proposing. It works for me, but I have
not tested that very well yet. If I would swap calls to geqo
and make_one_rel_by_joins it will not work. Therefore there might be
an issue I do not know about yet.
> My thought was that the reason for this hook to exist was simply to
> provide a convenient way to replace only the join search order
> algorithm.
Yes, thats true. I do not have a plan to do something more for now.

Thank you

Regards

Julius Stroffek

#include "postgres.h"

#include "fmgr.h"
#include "optimizer/geqo.h"
#include "optimizer/paths.h"
#include "optimizer/pathnode.h"


PG_MODULE_MAGIC;

void    _PG_init(void);
void    _PG_fini(void);

typedef RelOptInfo * (*jos_function_type) (PlannerInfo *root, int levels_needed, List* initial_rels);

static
RelOptInfo *
jos_search(PlannerInfo *root, List *joinlist, jos_function_type jos_function)
{
     int         levels_needed;
     List       *initial_rels;
     ListCell   *jl;

     /*
      * Count the number of child joinlist nodes.  This is the depth of the
      * dynamic-programming algorithm we must employ to consider all ways of
      * joining the child nodes.
      */
     levels_needed = list_length(joinlist);

     if (levels_needed <= 0)
         return NULL;            /* nothing to do? */

     /*
      * Construct a list of rels corresponding to the child joinlist nodes.
      * This may contain both base rels and rels constructed according to
      * sub-joinlists.
      */
     initial_rels = NIL;
     foreach(jl, joinlist)
     {
         Node       *jlnode = (Node *) lfirst(jl);
         RelOptInfo *thisrel;

         if (IsA(jlnode, RangeTblRef))
         {
             int         varno = ((RangeTblRef *) jlnode)->rtindex;

             thisrel = find_base_rel(root, varno);
         }
         else if (IsA(jlnode, List))
         {
             /* Recurse to handle subproblem */
             thisrel = jos_search(root, (List *) jlnode, jos_function);
         }
         else
         {
             elog(ERROR, "unrecognized joinlist node type: %d",
                  (int) nodeTag(jlnode));
             thisrel = NULL;     /* keep compiler quiet */
         }

         initial_rels = lappend(initial_rels, thisrel);
     }

     if (levels_needed == 1)
     {
         /*
          * Single joinlist node, so we're done.
          */
         return (RelOptInfo *) linitial(initial_rels);
     }
     else
     {
         jos_function(root, levels_needed, initial_rels);
     }
}


static
RelOptInfo *
jos_dummy(PlannerInfo *root, List *joinlist)
{
    RelOptInfo *dynamic_result, *geqo_result;
    List *copy;
    Cost dynamic_cost, geqo_cost;

    copy = list_copy(joinlist);
    elog(LOG, "Starting a join order search \"geqo\"...");
    geqo_result = jos_search(root, copy, geqo);
//    geqo_result = jos_search(root, copy, make_one_rel_by_joins);
    geqo_cost = geqo_result->cheapest_total_path->total_cost;

    copy = list_copy(joinlist);
    elog(LOG, "Starting a join order search \"dynamic\"...");
    dynamic_result = jos_search(root, copy, make_one_rel_by_joins);
//    dynamic_result = jos_search(root, copy, geqo);
    dynamic_cost = dynamic_result->cheapest_total_path->total_cost;


    printf("GEQO cost: %f\n", geqo_cost);
    printf("Dynamic programming cost: %f\n", dynamic_cost);

    if (geqo_cost < dynamic_cost)
        return geqo_result;
    else
        return dynamic_result;
}


void
_PG_init(void)
{
    join_order_search_hook = jos_dummy;
}

void
_PG_fini(void)
{
    join_order_search_hook = 0;
}


Re: Optimizer hook

From
Tom Lane
Date:
Julius Stroffek <Julius.Stroffek@Sun.COM> writes:
> Parts of the result path might be found by geqo and parts of it by
> regular algorithm.

Why would you care?  Seems like forcing that to not happen is actively
making it stupider.

> If there is no way of how to make the code work then it makes no sense
> to put the hook to the place I am proposing. It works for me, but I have
> not tested that very well yet. If I would swap calls to geqo
> and make_one_rel_by_joins it will not work. Therefore there might be
> an issue I do not know about yet.

Well, I can see one likely problem: list_copy is a shallow copy and
thus doesn't ensure that the second set of functions sees the same input
data structures as the first.  I know that geqo has to go through some
special pushups to perform multiple invocations of the base planner,
and I suspect you need that here too.  Look at geqo_eval().

            regards, tom lane

Re: Optimizer hook

From
Julius Stroffek
Date:
> Why would you care?  Seems like forcing that to not happen is actively
> making it stupider.
>
To better compare the algorithms and possibly not for final solution at
the beginning. If we would implement 10 algorithms and want to pickup
just 3 best ones to be used and throw 7 away.

Later on, we can try to run just the one "very fast" algorithm and
depending on the cost decide whether we would run remaining 9 or
less or even none.

Yes, the example in dummy.c is really stupider, but it could be done
in more clever way.

> Well, I can see one likely problem: list_copy is a shallow copy and
> thus doesn't ensure that the second set of functions sees the same input
> data structures as the first.  I know that geqo has to go through some
> special pushups to perform multiple invocations of the base planner,
> and I suspect you need that here too.  Look at geqo_eval().

I'll explore that.

Thanks

Regards

Julius Stroffek


Re: Optimizer hook

From
Tom Lane
Date:
Julius Stroffek <Julius.Stroffek@Sun.COM> writes:
>> Why would you care?  Seems like forcing that to not happen is actively
>> making it stupider.
>>
> To better compare the algorithms and possibly not for final solution at
> the beginning. If we would implement 10 algorithms and want to pickup
> just 3 best ones to be used and throw 7 away.

Well, you could in any case force the same decision to be made in every
invocation, for example by driving it off a GUC variable.  The idea you
have here seems to be "it'll be the same choice in every sub-problem,
only we won't know which one afterwards", which doesn't seem at all
helpful for planner testing purposes.

> Yes, the example in dummy.c is really stupider, but it could be done
> in more clever way.

I think dummy.c kinda proves my point: more than half the code is
accomplishing nothing except to duplicate the behavior of
make_rel_from_joinlist.

            regards, tom lane

Re: Optimizer hook

From
Tom Lane
Date:
I've applied this patch with revision to put the hook where I thought it
made sense.  Attached is a modification of your dummy.c to show use of
the hook.  I didn't test it heavily, but I did check that it seemed to
work with either order of calling geqo() and standard_join_search().

            regards, tom lane


#include "postgres.h"

#include "fmgr.h"
#include "optimizer/geqo.h"
#include "optimizer/paths.h"
#include "optimizer/pathnode.h"


PG_MODULE_MAGIC;

void    _PG_init(void);
void    _PG_fini(void);

static RelOptInfo *
my_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
{
    RelOptInfo *dynamic_result, *geqo_result;
    Cost dynamic_cost, geqo_cost;
    List       *dynamic_list, *geqo_list;
    struct HTAB *dynamic_hash, *geqo_hash;
    int            savelength;

    savelength = list_length(root->join_rel_list);

    root->join_rel_hash = NULL;

    elog(LOG, "Starting a join order search \"dynamic\"...");
    dynamic_result = standard_join_search(root, levels_needed, initial_rels);
    dynamic_cost = dynamic_result->cheapest_total_path->total_cost;

    dynamic_list = list_copy(root->join_rel_list);
    dynamic_hash = root->join_rel_hash;

    root->join_rel_list = list_truncate(root->join_rel_list,
                                        savelength);
    root->join_rel_hash = NULL;

    elog(LOG, "Starting a join order search \"geqo\"...");
    geqo_result = geqo(root, levels_needed, initial_rels);
    geqo_cost = geqo_result->cheapest_total_path->total_cost;

    geqo_list = list_copy(root->join_rel_list);
    geqo_hash = root->join_rel_hash;

    fprintf(stderr, "GEQO cost: %f  Dynamic programming cost: %f\n",
            geqo_cost, dynamic_cost);

    if (geqo_cost < dynamic_cost)
    {
        root->join_rel_list = geqo_list;
        root->join_rel_hash = geqo_hash;
        return geqo_result;
    }
    else
    {
        root->join_rel_list = dynamic_list;
        root->join_rel_hash = dynamic_hash;
        return dynamic_result;
    }
}


void
_PG_init(void)
{
    join_search_hook = my_join_search;
}

void
_PG_fini(void)
{
    join_search_hook = NULL;
}