Thread: [v9.5] Custom Plan API

From: Kouhei Kaigai

Prior to the development cycle towards v9.5, I'd like to reopen
the discussion of the custom-plan interface. Even though we had lots
of discussion during the last three commit-fests, several issues
are still under discussion. So, I'd like to clarify the direction of
the implementation prior to the first commit-fest.

(1) DDL support and system catalog

Simon suggested that a DDL command should be supported, to track the
custom-plan providers being installed and to avoid pointless hook calls
when it is obvious that no custom-plan provider can help. It also
makes sense to give extensions a chance to be loaded once installed.
(In the previous design, I assumed modules were loaded by the LOAD command
or the *_preload_libraries parameters.)

I tried to implement the following syntax:

  CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;

It records a particular function as the entrypoint of a custom-plan provider,
which is then called when the planner tries to find the best path to scan
or join relations. This function takes one argument (of INTERNAL type) that
packs the information needed to construct and register an alternative
scan/join path, such as PlannerInfo, RelOptInfo and so on.

(*) In the case of a scan path, the data structure below is supplied.
  typedef struct {
     uint32          custom_class;
     PlannerInfo    *root;
     RelOptInfo     *baserel;
     RangeTblEntry  *rte;
  } customScanArg;

This function, usually implemented in C, can construct a custom
object derived from the CustomPath type that contains a set of function
pointers, including functions that populate further objects derived from
CustomPlan or CustomPlanState, as I did in the patch towards v9.4 development.
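
To make the control flow concrete, here is a minimal, self-contained C
sketch of such a handler. The struct layouts, the callback names and the
toy cost formula are mock stand-ins invented for illustration, not the real
PostgreSQL definitions:

```c
/* Mock sketch: a custom-plan handler populating a CustomPath-like object.
 * All types and names here are simplified assumptions, not core headers. */
#include <stddef.h>

typedef struct CustomPath CustomPath;

struct CustomPath
{
    double      startup_cost;   /* cost before the first tuple (mock) */
    double      total_cost;     /* total cost estimate (mock) */
    /* callbacks the core would invoke at later planner stages */
    void      (*CreateCustomPlan)(CustomPath *path);
    void      (*TextOutCustomPath)(const CustomPath *path);
};

/* mock of the customScanArg handed to the handler */
typedef struct
{
    unsigned int custom_class;
    double       rel_tuples;    /* stand-in for RelOptInfo statistics */
} customScanArg;

static void
mock_create_plan(CustomPath *path)
{
    /* a real provider would build its CustomPlan node here */
    (void) path;
}

/* Handler entrypoint: build an alternative path from the supplied argument. */
static CustomPath
my_scan_handler(const customScanArg *arg)
{
    CustomPath  path;

    path.startup_cost = 0.0;
    path.total_cost = arg->rel_tuples * 0.01;   /* toy cost model */
    path.CreateCustomPlan = mock_create_plan;
    path.TextOutCustomPath = NULL;
    return path;
}
```

The point of the sketch is only the shape of the interface: the handler
receives packed planner state, estimates a cost, and hands back an object
carrying function pointers for the later stages.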

Properties of individual custom-plan providers are recorded in the
pg_custom_plan system catalog. Right now, its definition is quite simple:
only a superuser can create or drop custom-plan providers, and their
definitions do not belong to a particular namespace.
Because of this assumption (only superusers can touch them), I did not put
a database ACL mechanism here.
What other characteristics should be there?


(2) Static functions to be exported

Tom was concerned that the custom-plan API needs several key functions to be
callable by extensions, even though these are declared as static functions;
thus, they look like part of the interface.
Once people regard them as stable interfaces that survive version upgrades,
they may become a barrier to future improvement of the core code.
Is that the right understanding?

One solution is to write a clear notice, like: "these external functions
are not stable interfaces, so extensions should not assume these functions
will remain available in future versions".

Nevertheless, more stable functions are kinder to the authors of extensions.
So, I tried a few approaches.

First of all, we categorized functions into three categories.
(A) It walks on plan/expression tree recursively.
(B) It modifies internal state of the core backend.
(C) It is commonly used but in a particular source file.

Although there are not so many such functions, (A) and (B) must have
entrypoints callable from extensions. If they are unavailable, an extension
needs to maintain its own slightly-enhanced copy of the code, and that
burden is similar to just forking the tree.
Examples of (A) are: create_plan_recurse, set_plan_refs, ...
Examples of (B) are: fix_expr_common, ...

On the other hand, the (C) functions are helpful if available, but
exporting them is not a mandatory requirement.

Our first trial, following Tom's suggestion, was to investigate a common
walker function on the plan tree, like the one we already have for
expression trees. We expected we could hand extensions function pointers
to the key routines, instead of exporting the static functions.
However, it didn't work well, because the existing recursive code performs
various kinds of jobs for each plan-node type, so it didn't fit the
structure of a walker function, which applies a uniform operation to each node.

Note that I assumed a walker function like the following, which applies
plan_walker or expr_walker on the underlying plan/expression trees.
    bool
    plan_tree_walker(Plan *plan,
                     bool (*plan_walker) (),
                     bool (*expr_walker) (),
                     void *context)
Please tell me if this differs from your idea; I'll reconsider it.
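
As a sanity check of the signature above, here is a self-contained toy
implementation. The Plan struct is a mock stand-in (not the real node
definitions): plan_walker is applied to every plan node, expr_walker to the
expressions hanging off each node, and returning true aborts the walk,
mirroring the expression_tree_walker convention.

```c
/* Toy model of the proposed plan_tree_walker over a mock Plan tree. */
#include <stddef.h>
#include <stdbool.h>

typedef struct Plan Plan;
struct Plan
{
    int    node_tag;        /* stand-in for NodeTag */
    int    n_exprs;         /* stand-in for attached expressions */
    Plan  *lefttree;
    Plan  *righttree;
};

typedef bool (*plan_walker_fn)(Plan *plan, void *context);
typedef bool (*expr_walker_fn)(int n_exprs, void *context);

/* Returns true as soon as any walker aborts; false means the whole
 * tree was visited. */
static bool
plan_tree_walker(Plan *plan,
                 plan_walker_fn plan_walker,
                 expr_walker_fn expr_walker,
                 void *context)
{
    if (plan == NULL)
        return false;
    if (plan_walker && plan_walker(plan, context))
        return true;
    if (expr_walker && expr_walker(plan->n_exprs, context))
        return true;
    if (plan_tree_walker(plan->lefttree, plan_walker, expr_walker, context))
        return true;
    return plan_tree_walker(plan->righttree, plan_walker, expr_walker, context);
}

/* example plan_walker: count visited plan nodes */
static bool
count_nodes(Plan *plan, void *context)
{
    (void) plan;
    (*(int *) context)++;
    return false;           /* keep walking */
}
```

The difficulty described above is visible even in this toy: a uniform
walker treats every node alike, whereas set_plan_refs() must branch on the
node type to do its real work.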

Next, I tried another approach that passes function pointers to the (A) and
(B) functions as part of the custom-plan interface.
It is workable at least; however, it seems to me that its interface
definition has little advantage over the original approach.

For example, below is the definition of the callback in setrefs.c.

+   void    (*SetCustomPlanRef)(PlannerInfo *root,
+                               CustomPlan *custom_plan,
+                               int rtoffset,
+                               Plan *(*fn_set_plan_refs)(PlannerInfo *root,
+                                                         Plan *plan,
+                                                         int rtoffset),
+                               void (*fn_fix_expr_common)(PlannerInfo *root,
+                                                          Node *node));

An extension needs at least set_plan_refs() and fix_expr_common(), so I
added function pointers for them. But this definition has to be updated
whenever these functions change in the future. It does not seem to me a
proper way to cushion the impact of future internal changes.
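
For illustration, a self-contained mock of how an extension's callback
would use the function pointers handed over by the core. The callback and
helper names follow the patch hunk above, but the types are simplified
stand-ins I invented for this sketch:

```c
/* Mock: an extension delegates recursion into its subplans through the
 * fn_set_plan_refs pointer supplied by the core, instead of calling the
 * static set_plan_refs() directly. */
#include <stddef.h>

typedef struct Plan
{
    int          scanrelid;     /* stand-in for a range-table index */
    struct Plan *subplan;
} Plan;

typedef struct CustomPlan
{
    Plan plan;                  /* "inherits" from Plan, as the real node does */
} CustomPlan;

/* mock of the core's (static) set_plan_refs: offset every scanrelid */
static Plan *
mock_set_plan_refs(Plan *plan, int rtoffset)
{
    if (plan == NULL)
        return NULL;
    plan->scanrelid += rtoffset;
    plan->subplan = mock_set_plan_refs(plan->subplan, rtoffset);
    return plan;
}

/* the extension's SetCustomPlanRef callback: fix its own node, then
 * delegate the subplan to the supplied function pointer */
static void
my_SetCustomPlanRef(CustomPlan *cplan,
                    int rtoffset,
                    Plan *(*fn_set_plan_refs)(Plan *plan, int rtoffset))
{
    cplan->plan.scanrelid += rtoffset;
    cplan->plan.subplan = fn_set_plan_refs(cplan->plan.subplan, rtoffset);
}
```

This shows why the approach works but is fragile: the callback's signature
must mirror the core helpers exactly, so any change to them ripples into
the interface.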

So, I'd like to find a good common ground to solve the matter.

One idea is the first, simple solution: the core PostgreSQL code is
developed independently of out-of-tree modules, so we don't guarantee the
stability of internal function declarations, even when they are exported
to multiple source files. (I believe this is our usual manner.)

Another idea is a refactoring of the core backend to consolidate routines
per plan node, rather than per processing stage. For example, createplan.c
contains most of the code commonly needed to create a plan, in addition to
the individual plan nodes. If functions like create_seqscan_plan() were
located in a separate source file, the routines to be exported would become clear.
One expected disadvantage is that this refactoring makes back-patching
more complicated.

Do you have any other ideas to implement it well?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp]
> Sent: Tuesday, April 29, 2014 10:07 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Tom Lane; Andres Freund; Robert Haas; Simon Riggs; PgHacker; Stephen
> Frost; Shigeru Hanada; Jim Mlodgenski; Peter Eisentraut
> Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
> >> Yeah.  I'm still not exactly convinced that custom-scan will ever
> >> allow independent development of new plan types (which, with all due
> >> respect to Robert, is what it was being sold as last year in Ottawa).
> >> But I'm not opposed in principle to committing it, if we can find a
> >> way to have a cleaner API for things like setrefs.c.  It seems like
> >> late-stage planner processing in general is an issue for this patch
> >> (createplan.c and subselect.c are also looking messy).  EXPLAIN isn't
> too great either.
> >>
> >> I'm not sure exactly what to do about those cases, but I wonder
> >> whether things would get better if we had the equivalent of
> >> expression_tree_walker/mutator capability for plan nodes.  The state
> >> of affairs in setrefs and subselect, at least, is a bit reminiscent
> >> of the bad old days when we had lots of different bespoke code for
> >> traversing expression trees.
> >>
> > Hmm. If we have something like expression_tree_walker/mutator for plan
> > nodes, we can pass a walker/mutator function's pointer instead of
> > exposing static functions that takes recursive jobs.
> > If custom-plan provider (that has sub-plans) got a callback with
> > walker/ mutator pointer, all it has to do for sub-plans are calling
> > this new plan-tree walking support routine with supplied walker/mutator.
> > It seems to me more simple design than what I did.
> >
> I tried to code the similar walker/mutator functions on plan-node tree,
> however, it was not available to implement these routines enough simple,
> because the job of walker/mutator functions are not uniform thus caller
> side also must have a large switch-case branches.
>
> I picked up setrefs.c for my investigation.
> The set_plan_refs() applies fix_scan_list() on the expression tree being
> appeared in the plan node if it is delivered from Scan, however, it also
> applies set_join_references() for subclass of Join, or
> set_dummy_tlist_references() for some other plan nodes.
> It implies that the walker/mutator functions of Plan node has to apply
> different operation according to the type of Plan node. I'm not certain
> how much different forms are needed.
> (In addition, set_plan_refs() performs usually like a walker, but often
> performs as a mutator if trivial subquery....)
>
> I'm expecting the function like below. It allows to call plan_walker
> function for each plan-node and also allows to call expr_walker function
> for each expression-node on the plan node.
>
>     bool
>     plan_tree_walker(Plan *plan,
>                      bool (*plan_walker) (),
>                      bool (*expr_walker) (),
>                      void *context)
>
> I'd like to see if something other form to implement this routine.
>
>
> One alternative idea to give custom-plan provider a chance to handle its
> subplans is, to give function pointers (1) to handle recursion of plan-tree
> and (2) to set up backend's internal state.
> In case of setrefs.c, set_plan_refs() and fix_expr_common() are minimum
> necessity for extensions. It also kills necessity to export static
> functions.
>
> How about your thought?
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>


Re: [v9.5] Custom Plan API

From: Simon Riggs

On 7 May 2014 02:05, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Prior to the development cycle towards v9.5, I'd like to reopen
> the discussion of custom-plan interface. Even though we had lots
> of discussion during the last three commit-fests, several issues
> are still under discussion. So, I'd like to clarify direction of
> the implementation, prior to the first commit-fest.
>
> (1) DDL support and system catalog
>
> Simon suggested that DDL command should be supported to track custom-
> plan providers being installed, and to avoid nonsense hook calls
> if it is an obvious case that custom-plan provider can help. It also
> makes sense to give a chance to load extensions once installed.
> (In the previous design, I assumed modules are loaded by LOAD command
> or *_preload_libraries parameters).
>
> I tried to implement the following syntax:
>
>   CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;

Thank you for exploring that thought and leading the way on this
research. I've been thinking about this also.

What I think we need is a declarative form that expresses the linkage
between base table(s) and related data structures that can be used
to optimize a query, while still providing accurate results.

In other DBMS, we have concepts such as a JoinIndex or a MatView which
allow some kind of lookaside behaviour. Just for clarity, a concrete
example is Oracle's Materialized Views which can be set using ENABLE
QUERY REWRITE so that the MatView can be used as an alternative path
for a query. We do already have this concept in PostgreSQL, where an
index can be used to perform an IndexOnlyScan rather than accessing
the heap itself.

We have considerable evidence that the idea of alternate data
structures results in performance gains.
* KaiGai's work - https://wiki.postgresql.org/wiki/PGStrom
* http://www.postgresql.org/message-id/52C59858.9090500@garret.ru
* http://citusdata.github.io/cstore_fdw/
* University of Manchester - exploring GPUs as part of the AXLE project
* Barcelona SuperComputer Centre - exploring FPGAs, as part of the AXLE project
* Some other authors have also cited gains using GPU technology in databases

So I would like to have a mechanism that provides a *generic*
Lookaside for a table or foreign table.

Tom and Kevin have previously expressed that MatViews would represent
a planning problem, in the general case. One way to solve that
planning issue is to link structures directly together, in the same
way that an index and a table are linked. We can then process the
lookaside in the same way we handle a partial index - check
prerequisites and if usable, calculate a cost for the alternate path.
We need not add planning time other than to the tables that might
benefit from that.
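
The two-step decision described here (a viability check, like proving a
partial-index predicate, followed by an ordinary cost comparison) can be
sketched as follows. Everything below is a toy model with invented names,
not planner code:

```c
/* Toy model of the lookaside decision: check prerequisites first, and only
 * if the alternate path is usable, let the cheaper cost win. */
#include <stdbool.h>

typedef struct
{
    bool   covers_columns;   /* SELECT-list coverage satisfied? */
    bool   covers_rows;      /* WHERE clause implied, as with a partial index? */
    double cost;             /* estimated cost of the alternate path */
} LookasidePath;

/* usable only if every prerequisite check passes */
static bool
lookaside_is_viable(const LookasidePath *ls)
{
    return ls->covers_columns && ls->covers_rows;
}

/* pick the cheaper of the base path and a viable lookaside */
static double
choose_cheapest(double base_cost, const LookasidePath *ls)
{
    if (lookaside_is_viable(ls) && ls->cost < base_cost)
        return ls->cost;
    return base_cost;
}
```

Planning time is only spent on tables that actually declare a lookaside,
which is the property claimed above.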

Roughly, I'm thinking of this...

CREATE LOOKASIDE ON foo  TO foo_mat_view;

and also this...

CREATE LOOKASIDE ON foo  TO foo_as_a_foreign_table   /* e.g. PGStrom */

This would allow the planner to consider alternate plans for foo via
foo_mat_view during set_plain_rel_pathlist(), similarly to the way it
considers index paths, in the common case that the mat view covers just
one table.

This concept is similar to ENABLE QUERY REWRITE in Oracle, but this
thought goes much further, to include any generic user-defined data
structure or foreign table.

Do we need this? For MVs, we *might* be able to deduce that the MV is
rewritable for "foo", but that is not deducible for Foreign Tables, by
current definition, so I prefer the explicit definition of objects
that are linked - since doing this for indexes is already familiar to
people.

Having an explicit linkage between data structures allows us to
enhance an existing application by transparently adding new
structures, just as we already do with indexes. Specifically, we
allow more than one lookaside structure on any one table.

Forget the exact name, that's not important. But I think the
requirements here are...

* Explicit definition that we are attaching an alternate path onto a
table (conceptually similar to adding an index)

* Ability to check that the alternate path is viable (similar to the
way we validate use of partial indexes prior to usage)
    Checks on columns (SELECT), rows (WHERE), aggregations (GROUP)

* Ability to consider access cost for both normal table and alternate
path (like an index) - this allows the alternate path to *not* be
chosen when we are performing some operation that is sub-optimal (for
whatever reason).

* There may be some need to define operator classes that are
implemented via the alternate path

which works for single tables, but a later requirement would then be

* allows the join of one or more tables to be replaced with a single lookaside


Hopefully, we won't need a "Custom Plan" at all, just the ability to
lookaside when useful.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From: Kouhei Kaigai

> On 7 May 2014 02:05, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Prior to the development cycle towards v9.5, I'd like to reopen the
> > discussion of custom-plan interface. Even though we had lots of
> > discussion during the last three commit-fests, several issues are
> > still under discussion. So, I'd like to clarify direction of the
> > implementation, prior to the first commit-fest.
> >
> > (1) DDL support and system catalog
> >
> > Simon suggested that DDL command should be supported to track custom-
> > plan providers being installed, and to avoid nonsense hook calls if it
> > is an obvious case that custom-plan provider can help. It also makes
> > sense to give a chance to load extensions once installed.
> > (In the previous design, I assumed modules are loaded by LOAD command
> > or *_preload_libraries parameters).
> >
> > I tried to implement the following syntax:
> >
> >   CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;
> 
> Thank you for exploring that thought and leading the way on this research.
> I've been thinking about this also.
> 
> What I think we need is a declarative form that expresses the linkage between
> base table(s) and a related data structures that can be used to optimize
> a query, while still providing accurate results.
> 
> In other DBMS, we have concepts such as a JoinIndex or a MatView which allow
> some kind of lookaside behaviour. Just for clarity, a concrete example is
> Oracle's Materialized Views which can be set using ENABLE QUERY REWRITE
> so that the MatView can be used as an alternative path for a query. We do
> already have this concept in PostgreSQL, where an index can be used to
> perform an IndexOnlyScan rather than accessing the heap itself.
> 
> We have considerable evidence that the idea of alternate data structures
> results in performance gains.
> * KaiGai's work - https://wiki.postgresql.org/wiki/PGStrom
> * http://www.postgresql.org/message-id/52C59858.9090500@garret.ru
> * http://citusdata.github.io/cstore_fdw/
> * University of Manchester - exploring GPUs as part of the AXLE project
> * Barcelona SuperComputer Centre - exploring FPGAs, as part of the AXLE
> project
> * Some other authors have also cited gains using GPU technology in databases
> 
> So I would like to have a mechanism that provides a *generic* Lookaside
> for a table or foreign table.
> 
> Tom and Kevin have previously expressed that MatViews would represent a
> planning problem, in the general case. One way to solve that planning issue
> is to link structures directly together, in the same way that an index and
> a table are linked. We can then process the lookaside in the same way we
> handle a partial index - check prerequisites and if usable, calculate a
> cost for the alternate path.
> We need not add planning time other than to the tables that might benefit
> from that.
> 
> Roughly, I'm thinking of this...
> 
> CREATE LOOKASIDE ON foo
>    TO foo_mat_view;
> 
> and also this...
> 
> CREATE LOOKASIDE ON foo
>    TO foo_as_a_foreign_table   /* e.g. PGStrom */
> 
> This would allow the planner to consider alternate plans for foo_mv during
> set_plain_rel_pathlist() similarly to the way it considers index paths,
> in one of the common cases that the mat view covers just one table.
> 
> This concept is similar to ENABLE QUERY REWRITE in Oracle, but this thought
> goes much further, to include any generic user-defined data structure or
> foreign table.
> 
Let me clarify: this mechanism allows adding alternative scan/join paths,
including built-in ones, not only custom enhanced plan/exec nodes, doesn't it?
It is probably a variation of the above proposition if we install a handler
function that proposes built-in path nodes in response to a scan/join request.

> Do we need this? For MVs, we *might* be able to deduce that the MV is
> rewritable for "foo", but that is not deducible for Foreign Tables, by
> current definition, so I prefer the explicit definition of objects that
> are linked - since doing this for indexes is already familiar to people.
> 
> Having an explicit linkage between data structures allows us to enhance
> an existing application by transaparently adding new structures, just as
> we already do with indexes. Specifically, that we allow more than one
> lookaside structure on any one table.
> 
Not only alternative data structures: an alternative method to scan/join
the same data structure is also important, isn't it?

> Forget the exact name, thats not important. But I think the requirements
> here are...
> 
> * Explicit definition that we are attaching an alternate path onto a table
> (conceptually similar to adding an index)
> 
I think the syntax should allow "tables", not only a particular table.
It would inform the core planner that this lookaside/custom-plan (the name
is not important, anyway this feature...) can provide an alternative path
for the set of relations being considered. So, it reduces the number of
function calls at the planning stage.

> * Ability to check that the alternate path is viable (similar to the way
> we validate use of partial indexes prior to usage)
>     Checks on columns(SELECT), rows(WHERE), aggregations(GROUP)
> 
I don't deny it... but do you expect this feature in the initial version?

> * Ability to consider access cost for both normal table and alternate path
> (like an index) - this allows the alternate path to *not* be chosen when
> we are performing some operation that is sub-optimal (for whatever reason).
> 
That is the usual job of the existing planner, isn't it?

> * There may be some need to define operator classes that are implemented
> via the alternate path
> 
> which works for single tables, but a later requirement would then be
> 
> * allows the join of one or more tables to be replaced with a single lookaside
> 
It's a higher priority for me, and I guess it is the same for MatView usage.

> Hopefully, we won't need a "Custom Plan" at all, just the ability to
> lookaside when useful.
> 
Probably, lookaside is a special case of the scenarios that custom-plan can
provide. I also think it is an attractive use case if we can redirect
a particularly complicated join into a MatView reference. So, it makes sense
to bundle a handler function that replaces a join with a matview reference.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [v9.5] Custom Plan API

From: Simon Riggs

On 7 May 2014 08:17, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> Let me clarify. This mechanism allows to add alternative scan/join paths
> including built-in ones, not only custom enhanced plan/exec node, isn't it?
> Probably, it is a variation of above proposition if we install a handler
> function that proposes built-in path nodes towards the request for scan/join.

Yes, I am looking for a way to give you the full extent of your
requirements, within the Postgres framework. I have time and funding
to assist you in achieving this in a general way that all may make use
of.

> Not only alternative data structure, alternative method to scan/join towards
> same data structure is also important, isn't it?

Agreed. My proposal is that if the planner allows the lookaside to an
FDW then we pass the query for full execution on the FDW. That means
that the scan, aggregate and join could take place via the FDW. i.e.
"Custom Plan" == lookaside + FDW

Or put another way, if we add Lookaside then we can just plug in the
pgstrom FDW directly and we're done. And everybody else's FDW will
work as well, so Citus etc. will not need to recode.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From: Kouhei Kaigai

> -----Original Message-----
> From: Simon Riggs [mailto:simon@2ndQuadrant.com]
> Sent: Wednesday, May 07, 2014 5:02 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Tom Lane; Robert Haas; Andres Freund; PgHacker; Stephen Frost; Shigeru
> Hanada; Jim Mlodgenski; Peter Eisentraut; Kohei KaiGai
> Subject: Re: [v9.5] Custom Plan API
> 
> On 7 May 2014 08:17, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> 
> > Let me clarify. This mechanism allows to add alternative scan/join
> > paths including built-in ones, not only custom enhanced plan/exec node,
> isn't it?
> > Probably, it is a variation of above proposition if we install a
> > handler function that proposes built-in path nodes towards the request
> for scan/join.
> 
> Yes, I am looking for a way to give you the full extent of your requirements,
> within the Postgres framework. I have time and funding to assist you in
> achieving this in a general way that all may make use of.
> 
> > Not only alternative data structure, alternative method to scan/join
> > towards same data structure is also important, isn't it?
> 
> Agreed. My proposal is that if the planner allows the lookaside to an FDW
> then we pass the query for full execution on the FDW. That means that the
> scan, aggregate and join could take place via the FDW. i.e.
> "Custom Plan" == lookaside + FDW
> 
> Or put another way, if we add Lookaside then we can just plug in the pgstrom
> FDW directly and we're done. And everybody else's FDW will work as well,
> so Citus etcc will not need to recode.
> 
Hmm. That sounds like you intend to make FDW perform as a central facility
to host pluggable plan/exec stuff. Even though we have several things to
clarify, I also think it's a direction worth investigating.

Let me list, in no particular order, the things to be clarified / developed.

* Join replacement by FDW; We still don't have consensus about join
  replacement by FDW. Probably, it will be designed for remote-join
  implementation primarily; however, the things to do are similar. We may
  need to revisit Hanada-san's proposition in the past.

* Lookaside for ANY relations; I want the planner to try GPU-scan for any
  relation once installed, to reduce the user's administration cost. It
  needs lookaside to allow specifying a particular foreign-server, not a
  foreign-table, then create a ForeignScan node that is not associated
  with a particular foreign-table.

* ForeignScan node that is not associated with a particular foreign-table.
  Once we try to apply a ForeignScan node instead of Sort or Aggregate,
  the existing FDW implementation needs to be improved. These nodes scan a
  materialized relation (generated on the fly), but the existing FDW code
  assumes a ForeignScan node is always associated with a particular
  foreign-table. We need to eliminate this restriction.

* FDW method for MultiExec. In case we can stack multiple ForeignScan
  nodes, it's helpful to support exchanging scanned tuples in their own
  data format. Let's assume two ForeignScan nodes are stacked: one
  performs like Sort, another performs like Scan. If they internally
  handle a column-oriented data format, TupleTableSlot is not the best
  way for data exchange.

* Lookaside on INSERT/UPDATE/DELETE. Probably, it can be implemented using
  the writable FDW feature. Not a big issue, but don't forget it...

What's your opinion?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [v9.5] Custom Plan API

From: Simon Riggs

On 7 May 2014 10:06, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> Let me list up the things to be clarified / developed randomly.
>
> * Join replacement by FDW; We still don't have consensus about join replacement
>   by FDW. Probably, it will be designed to remote-join implementation primarily,
>   however, things to do is similar. We may need to revisit the Hanada-san's
>   proposition in the past.

Agreed. We need to push down joins into FDWs and we need to push down
aggregates also, so they can be passed to FDWs. I'm planning to look
at aggregate push down.

> * Lookaside for ANY relations; I want planner to try GPU-scan for any relations
>   once installed, to reduce user's administration cost.
>   It needs lookaside allow to specify a particular foreign-server, not foreign-
>   table, then create ForeignScan node that is not associated with a particular
>   foreign-table.

IMHO we would not want to add indexes to every column, on every table,
nor would we wish to use lookaside for all tables. It is a good thing
to be able to add optimizations for individual tables. GPUs are not
good for everything; it is good to be able to leverage their
strengths, yet avoid their weaknesses.

If you do want that, you can write an Event Trigger that automatically
adds a lookaside for any table.

> * ForeignScan node that is not associated with a particular foreign-table.
>   Once we try to apply ForeignScan node instead of Sort or Aggregate, existing
>   FDW implementation needs to be improved. These nodes scan on a materialized
>   relation (generated on the fly), however, existing FDW code assumes
>   ForeignScan node is always associated with a particular foreign-table.
>   We need to eliminate this restriction.

I don't think we need to do that, given the above.

> * FDW method for MultiExec. In case when we can stack multiple ForeignScan
>   nodes, it's helpful to support to exchange scanned tuples in their own
>   data format. Let's assume two ForeignScan nodes are stacked. One performs
>   like Sort, another performs like Scan. If they internally handle column-
>   oriented data format, TupleTableSlot is not a best way for data exchange.

I agree TupleTableSlot may not be best way for bulk data movement. We
probably need to look at buffering/bulk movement between executor
nodes in general, which would be of benefit for the FDW case also.
This would be a problem even for Custom Scans as originally presented
also, so I don't see much change there.

> * Lookaside on the INSERT/UPDATE/DELETE. Probably, it can be implemented
>   using writable FDW feature. Not a big issue, but don't forget it...

Yes, possible.


I hope these ideas make sense. This is early days and there may be
other ideas and much detail yet to come.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From: Stephen Frost

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> Agreed. My proposal is that if the planner allows the lookaside to an
> FDW then we pass the query for full execution on the FDW. That means
> that the scan, aggregate and join could take place via the FDW. i.e.
> "Custom Plan" == lookaside + FDW

How about we get that working for FDWs to begin with and then we can
come back to this idea..?  We're pretty far from join-pushdown or
aggregate-pushdown to FDWs, last I checked, and having those would be a
massive win for everyone using FDWs.
Thanks,

Stephen

Re: [v9.5] Custom Plan API

From: Stephen Frost

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> IMHO we would not want to add indexes to every column, on every table,
> nor would we wish to use lookaside for all tables. It is a good thing
> to be able to add optimizations for individual tables. GPUs are not
> good for everything; it is good to be able to leverage their
> strengths, yet avoid their weaknesses.

It's the optimizer's job to figure out which path to pick though, based
on which will have the lowest cost.

> If do you want that, you can write an Event Trigger that automatically
> adds a lookaside for any table.

This sounds terribly ugly and like we're pushing optimization decisions
on to the user instead of just figuring out what the best answer is.

> I agree TupleTableSlot may not be best way for bulk data movement. We
> probably need to look at buffering/bulk movement between executor
> nodes in general, which would be of benefit for the FDW case also.
> This would be a problem even for Custom Scans as originally presented
> also, so I don't see much change there.

Being able to do bulk movement would be useful, but (as I proposed
months ago) being able to do asynchronous returns would be extremely
useful also, when you consider FDWs and Append() - the main point there
being that you want to keep the FDWs busy and working in parallel.

Thanks,

Stephen

Re: [v9.5] Custom Plan API

From: Simon Riggs

On 7 May 2014 17:43, Stephen Frost <sfrost@snowman.net> wrote:
> * Simon Riggs (simon@2ndQuadrant.com) wrote:
>> IMHO we would not want to add indexes to every column, on every table,
>> nor would we wish to use lookaside for all tables. It is a good thing
>> to be able to add optimizations for individual tables. GPUs are not
>> good for everything; it is good to be able to leverage their
>> strengths, yet avoid their weaknesses.
>
> It's the optimizer's job to figure out which path to pick though, based
> on which will have the lowest cost.

Of course. I'm not suggesting otherwise.

>> If do you want that, you can write an Event Trigger that automatically
>> adds a lookaside for any table.
>
> This sounds terribly ugly and like we're pushing optimization decisions
> on to the user instead of just figuring out what the best answer is.

I'm proposing that we use a declarative approach, just like we do when
we say CREATE INDEX.

The idea is that we only consider a lookaside when a lookaside has
been declared. Same as when we add an index, the optimizer considers
whether to use that index. What we don't want to happen is that the
optimizer considers a GIN plan, even when a GIN index is not
available.

I'll explain it more at the developer meeting. It probably sounds a
bit weird at first.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 7 May 2014 17:43, Stephen Frost <sfrost@snowman.net> wrote:
> > It's the optimizer's job to figure out which path to pick though, based
> > on which will have the lowest cost.
>
> Of course. I'm not suggesting otherwise.
>
> >> If you do want that, you can write an Event Trigger that automatically
> >> adds a lookaside for any table.
> >
> > This sounds terribly ugly and like we're pushing optimization decisions
> > on to the user instead of just figuring out what the best answer is.
>
> I'm proposing that we use a declarative approach, just like we do when
> we say CREATE INDEX.

There's quite a few trade-offs when it comes to indexes though.  I'm
trying to figure out when you wouldn't want to use a GPU, if it's
available to you and the cost model says it's faster?  To me, that's
kind of like saying you want a declarative approach for when to use a
HashJoin.

> The idea is that we only consider a lookaside when a lookaside has
> been declared. Same as when we add an index, the optimizer considers
> whether to use that index. What we don't want to happen is that the
> optimizer considers a GIN plan, even when a GIN index is not
> available.

Yes, I understood your proposal- I just don't agree with it. ;)

For MatViews and/or Indexes, there are trade-offs to be had as it
relates to disk space, insert speed, etc.
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> > Let me list up the things to be clarified / developed randomly.
> >
> > * Join replacement by FDW; We still don't have consensus about join
> >   replacement by FDW. Probably, it will be designed for remote-join
> >   implementation primarily; however, the things to do are similar. We may
> >   need to revisit Hanada-san's past proposition.
> 
> Agreed. We need to push down joins into FDWs and we need to push down
> aggregates also, so they can be passed to FDWs. I'm planning to look at
> aggregate push down.
> 
Probably, it's a helpful feature.

> > * Lookaside for ANY relations; I want the planner to try GPU-scan for any
> >   relations once installed, to reduce the user's administration cost.
> >   It needs lookaside to allow specifying a particular foreign-server, not
> >   a foreign-table, then create a ForeignScan node that is not associated
> >   with a particular foreign-table.
> 
> IMHO we would not want to add indexes to every column, on every table, nor
> would we wish to use lookaside for all tables. It is a good thing to be
> able to add optimizations for individual tables. GPUs are not good for
> everything; it is good to be able to leverage their strengths, yet avoid
> their weaknesses.
> 
> If you do want that, you can write an Event Trigger that automatically adds
> a lookaside for any table.
> 
It may be a solution when we try to replace a scan on a relation by a
ForeignScan; in other words, a case where we can describe a 1:1 relationship
between a table and the foreign-table that is scanned as its alternative.

Does it also fit the case where a ForeignScan replaces built-in Join plans?
I don't think it is realistic to set up lookaside configurations for all
possible combinations of joins in advance.

I have an idea: if lookaside accepts a function, a foreign-server or some
other subjective entity as an alternative path, it will be able to create
paths on the fly, not only from preconfigured foreign-tables.
This idea would take two forms of DDL commands:

  CREATE LOOKASIDE <name> ON <target relation>
    TO <alternative table/matview/foreign table/...>;
  CREATE LOOKASIDE <name> ON <target relation>
    EXECUTE <path generator function>;

What happens internally is the same. The TO-form kicks a built-in routine,
instead of a user-defined function, to add alternative scan/join paths
according to the supplied table/matview/foreign table and so on.


> > * ForeignScan node that is not associated with a particular foreign-table.
> >   Once we try to apply ForeignScan node instead of Sort or Aggregate,
> >   existing FDW implementation needs to be improved. These nodes scan on a
> >   materialized relation (generated on the fly), however, existing FDW code
> >   assumes ForeignScan node is always associated with a particular
> >   foreign-table. We need to eliminate this restriction.
> 
> I don't think we need to do that, given the above.
> 
It becomes a problem if ForeignScan is chosen as an alternative path to a Join.

The target-list of a Join node is determined on the fly according to the query
form, so we cannot know the TupleDesc to be returned in advance. Once we try
to apply ForeignScan instead of a Join node, it has to have a TupleDesc that
depends on the set of joined relations.

I think it is a more straightforward approach to allow a ForeignScan that is
not associated with a particular (cataloged) relation.

> > * FDW method for MultiExec. In case when we can stack multiple ForeignScan
> >   nodes, it's helpful to support exchanging scanned tuples in their own
> >   data format. Let's assume two ForeignScan nodes are stacked. One performs
> >   like Sort, another performs like Scan. If they internally handle a
> >   column-oriented data format, TupleTableSlot is not the best way for
> >   data exchange.
> 
> I agree TupleTableSlot may not be best way for bulk data movement. We
> probably need to look at buffering/bulk movement between executor nodes
> in general, which would be of benefit for the FDW case also.
> This would be a problem even for Custom Scans as originally presented also,
> so I don't see much change there.
> 
Yes. That is the reason why my Custom Scan proposal supports the MultiExec method.

> > * Lookaside on the INSERT/UPDATE/DELETE. Probably, it can be implemented
> >   using writable FDW feature. Not a big issue, but don't forget it...
> 
> Yes, possible.
> 
> 
> I hope these ideas make sense. This is early days and there may be other
> ideas and much detail yet to come.
> 
I agree with the general direction. My biggest concern about FDW is
transparency for the application. If lookaside allows a scan/join on regular
relations to be redirected to a ForeignScan (as an alternative execution
method), there is no strong reason to stick with custom-plan.

However, the existing ForeignScan node does not support working without
a particular foreign table. That may become a restriction if we try to
replace a Join node by ForeignScan, and it is my worry.
(It may be solved during the work on join replacement by FDW.)

One other point I noticed.

* SubPlan support; if an extension provides its own special logic to join
  relations, but does not want to support the various methods to scan
  relations, it is natural to leverage the built-in scan logic (like SeqScan,
  ...). I want ForeignScan to support having SubPlans if the FDW driver has
  that capability. I believe it can be implemented according to the existing
  manner, but we need to expose several static functions that handle
  plan-trees recursively.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 01:49, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

>> > * ForeignScan node that is not associated with a particular foreign-table.
>> >   Once we try to apply ForeignScan node instead of Sort or Aggregate,
>> >   existing FDW implementation needs to be improved. These nodes scan on a
>> >   materialized relation (generated on the fly), however, existing FDW code
>> >   assumes ForeignScan node is always associated with a particular
>> >   foreign-table. We need to eliminate this restriction.
>>
>> I don't think we need to do that, given the above.
>>
> It becomes a problem if ForeignScan is chosen as an alternative path to a Join.
>
> The target-list of a Join node is determined on the fly according to the
> query form, so we cannot know the TupleDesc to be returned in advance.
> Once we try to apply ForeignScan instead of a Join node, it has to have a
> TupleDesc that depends on the set of joined relations.
>
> I think it is a more straightforward approach to allow a ForeignScan that
> is not associated with a particular (cataloged) relation.

From your description, my understanding is that you would like to
stream data from 2 standard tables to the GPU, then perform a join on
the GPU itself.

I have been told that is not likely to be useful because of the data
transfer overheads.

Or did I misunderstand, and that this is intended to get around the
current lack of join pushdown into FDWs?

Can you be specific about the actual architecture you wish for, so we
can understand how to generalise that into an API?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 7 May 2014 18:39, Stephen Frost <sfrost@snowman.net> wrote:
> * Simon Riggs (simon@2ndQuadrant.com) wrote:
>> On 7 May 2014 17:43, Stephen Frost <sfrost@snowman.net> wrote:
>> > It's the optimizer's job to figure out which path to pick though, based
>> > on which will have the lowest cost.
>>
>> Of course. I'm not suggesting otherwise.
>>
>> >> If you do want that, you can write an Event Trigger that automatically
>> >> adds a lookaside for any table.
>> >
>> > This sounds terribly ugly and like we're pushing optimization decisions
>> > on to the user instead of just figuring out what the best answer is.
>>
>> I'm proposing that we use a declarative approach, just like we do when
>> we say CREATE INDEX.
>
> There's quite a few trade-offs when it comes to indexes though.  I'm
> trying to figure out when you wouldn't want to use a GPU, if it's
> available to you and the cost model says it's faster?  To me, that's
> kind of like saying you want a declarative approach for when to use a
> HashJoin.

I'm proposing something that is like an index, not like a plan node.

The reason that proposal is being made is that we need to consider
data structure, data location and processing details.

* In the case of Mat Views, if there is no Mat View, then we can't use
it - we can't replace that with just any mat view instead
* GPUs and other special processing units have finite data transfer
rates, so other people have proposed that they retain data on the
GPU/SPU - so we want to do a lookaside only for situations where the
data is already prepared to handle a lookaside.
* The other cases I cited of in-memory data structures are all
pre-arranged items with structures suited to processing particular
types of query

Given that I count 4-5 beneficial use cases for this index-like
lookaside, it seems worth investing time in.

It appears that Kaigai wishes something else in addition to this
concept, so there may be some confusion from that. I'm sure it will
take a while to really understand all the ideas and possibilities.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
Simon,

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 8 May 2014 01:49, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> From your description, my understanding is that you would like to
> stream data from 2 standard tables to the GPU, then perform a join on
> the GPU itself.
>
> I have been told that is not likely to be useful because of the data
> transfer overheads.

That was my original understanding and, I believe, the case at one
point, however...

> Or did I misunderstand, and that this is intended to get around the
> current lack of join pushdown into FDWs?

I believe the issue with the transfer speeds to the GPU have been either
eliminated or at least reduced to the point where it's practical now.
This is all based on prior discussions with KaiGai- I've not done any
testing myself.  In any case, this is exactly what they're looking to
do, as I understand it, and to do the same with aggregates that work
well on GPUs.

> Can you be specific about the actual architecture you wish for, so we
> can understand how to generalise that into an API?

It's something that *could* be done with FDWs, once they have the
ability to have join push-down and aggregate push-down, but I (and, as I
understand it, Tom) feel isn't really the right answer for this because
the actual *data* is completely under PG in this scenario.  It's just
in-memory processing that's being done on the GPU and in the GPU's
memory.

KaiGai has speculated about other possibilities (eg: having the GPU's
memory also used as some kind of multi-query cache, which would reduce
the transfer costs, but at a level of complexity regarding that cache
that I'm not sure it'd be sensible to try and do and, in any case, could
be done later and might make sense independently, if we could make it
work for, say, a memcached environment too; I'm thinking it would be
transaction-specific, but even that would be pretty tricky unless we
held locks across every row...).
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
Simon,

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> I'm proposing something that is like an index, not like a plan node.
>
> The reason that proposal is being made is that we need to consider
> data structure, data location and processing details.
>
> * In the case of Mat Views, if there is no Mat View, then we can't use
> it - we can't replace that with just any mat view instead

I agree with you about MatView's.  There are clear trade-offs there,
similar to those with indexes.

> * GPUs and other special processing units have finite data transfer
> rates, so other people have proposed that they retain data on the
> GPU/SPU - so we want to do a lookaside only for situations where the
> data is already prepared to handle a lookaside.

I've heard this and I'm utterly unconvinced that it could be made to
work at all- and it's certainly moving the bar of usefulness quite far
away, making the whole thing much less practical.  If we can't cost for
this transfer rate and make use of GPUs for medium-to-large size queries
which are only transient, then perhaps shoving all GPU work out across
an FDW is actually the right solution, and make that like some kind of
MatView as you're proposing- but I don't see how you're going to manage
updates and invalidation of that data in a sane way for a multi-user PG
system.

> * The other cases I cited of in-memory data structures are all
> pre-arranged items with structures suited to processing particular
> types of query

If it's transient in-memory work, I'd like to see our generalized
optimizer consider them all instead of pushing that job on the user to
decide when the optimizer should consider certain methods.

> Given that I count 4-5 beneficial use cases for this index-like
> lookaside, it seems worth investing time in.

I'm all for making use of MatViews and GPUs, but there's more than one
way to get there and look-asides feels like pushing the decision,
unnecessarily, on to the user.
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Shigeru Hanada
Date:
2014-05-07 18:06 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> Let me list up the things to be clarified / developed randomly.
>
> * Join replacement by FDW; We still don't have consensus about join
>   replacement by FDW. Probably, it will be designed for remote-join
>   implementation primarily; however, the things to do are similar. We may
>   need to revisit Hanada-san's past proposition.

I can't recall the details right now, but the reason I gave up was about
introducing a ForeignJoinPath node, IIRC.  I'll revisit the discussion
and my proposal.
-- 
Shigeru HANADA



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> >> > * ForeignScan node that is not associated with a particular
> >> >   foreign-table. Once we try to apply ForeignScan node instead of
> >> >   Sort or Aggregate, existing FDW implementation needs to be improved.
> >> >   These nodes scan on a materialized relation (generated on the fly),
> >> >   however, existing FDW code assumes ForeignScan node is always
> >> >   associated with a particular foreign-table.
> >> >   We need to eliminate this restriction.
> >>
> >> I don't think we need to do that, given the above.
> >>
> > It becomes a problem if ForeignScan is chosen as an alternative path to a Join.
> >
> > The target-list of a Join node is determined on the fly according to the
> > query form, so we cannot know the TupleDesc to be returned in advance.
> > Once we try to apply ForeignScan instead of a Join node, it has to have
> > a TupleDesc that depends on the set of joined relations.
> >
> > I think it is a more straightforward approach to allow a ForeignScan
> > that is not associated with a particular (cataloged) relation.
> 
> From your description, my understanding is that you would like to stream
> data from 2 standard tables to the GPU, then perform a join on the GPU itself.
> 
> I have been told that is not likely to be useful because of the data transfer
> overheads.
> 
There are two solutions. One is what I'm currently working on: in case the
numbers of rows in the left and right tables are not well balanced, we can
keep a hash table in the GPU DRAM, then transfer the data stream
chunk-by-chunk from the other side. Kernel execution and data transfer can
run asynchronously, so the data transfer cost can be hidden as long as we
have enough chunks, like processor pipelining.
The other solution is an "integrated" GPU that eliminates the necessity of
data transfer, like Intel's Haswell, AMD's Kaveri or NVIDIA's Tegra K1;
all major vendors are moving in the same direction.

> Or did I misunderstand, and that this is intended to get around the current
> lack of join pushdown into FDWs?
> 
The logic above is obviously executed on the extension side, so it needs the
ForeignScan node to perform like a Join node: it reads two input relation
streams and outputs one joined relation stream.

It is quite similar to the expected FDW join-pushdown design. It will consume
two (remote) relations and generate one output stream; this looks like a scan
on a particular relation (but with no catalog definition here).

Probably, it shall be visible to the local backend as follows
(this is a result from the previous prototype based on the custom-plan API):

postgres=# EXPLAIN VERBOSE
             SELECT count(*)
               FROM pgbench1_branches b
               JOIN pgbench1_accounts a ON a.bid = b.bid
              WHERE aid < 100;
                                QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=101.60..101.61 rows=1 width=0)
   Output: count(*)
   ->  Custom Scan (postgres-fdw)  (cost=100.00..101.43 rows=71 width=0)
         Remote SQL: SELECT NULL FROM (public.pgbench_branches r1 JOIN
                     public.pgbench_accounts r2 ON ((r1.bid = r2.bid)))
                     WHERE ((r2.aid < 100))
(4 rows)

The place of the "Custom Scan" node will be taken by ForeignScan once join
pushdown is supported. At that time, what relation should be scanned by this
ForeignScan? That is the reason why I proposed a ForeignScan node without a
particular relation.

> Can you be specific about the actual architecture you wish for, so we can
> understand how to generalise that into an API?
> 
If we fold the role of the CustomPlan node into ForeignScan, I want to use
this node to acquire control during query planning/execution.

As I did in the custom-plan patch, first of all, I want extensions to have
a chance to add an alternative path for a particular scan/join.
If an extension can take over the execution, it generates a ForeignPath
(or CustomPath) node and then calls add_path(). In the usual manner, the
planner decides whether the alternative path is cheaper than the other
candidates.

In the case where it replaces a scan of a relation by a ForeignScan, it is
almost the same as what the existing API does, except that the underlying
relation is a regular one, not a foreign table.

In the case where it replaces joined relations by a ForeignScan, it is almost
the same as the expected ForeignScan with join push-down. Unlike a usual
table scan, it does not have an actual relation definition in the catalog,
and its result tuple-slot is determined on the fly.
One thing different from a remote join is that this ForeignScan node may have
local sub-plans, if the FDW driver (e.g. GPU execution) has the capability
for the Join only, but not for the relation-scan portion.
So, despite its name, I want ForeignScan to support having sub-plans if the
FDW driver supports that capability.

Does that make it clearer? Or does it make you more confused?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 04:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

>> From your description, my understanding is that you would like to stream
>> data from 2 standard tables to the GPU, then perform a join on the GPU itself.
>>
>> I have been told that is not likely to be useful because of the data transfer
>> overheads.
>>
> There are two solutions. One is what I'm currently working on: in case the
> numbers of rows in the left and right tables are not well balanced, we can
> keep a hash table in the GPU DRAM, then transfer the data stream
> chunk-by-chunk from the other side. Kernel execution and data transfer can
> run asynchronously, so the data transfer cost can be hidden as long as we
> have enough chunks, like processor pipelining.

Makes sense to me, thanks for explaining.

The hardware-enhanced hash join sounds like a great idea.

My understanding is we would need

* a custom cost-model
* a custom execution node

The main question seems to be whether doing that would be allowable,
because it's certainly doable.

I'm still looking for a way to avoid adding planning time for all
queries though.

> The other solution is an "integrated" GPU that eliminates the necessity of
> data transfer, like Intel's Haswell, AMD's Kaveri or NVIDIA's Tegra K1;
> all major vendors are moving in the same direction.

Sounds useful, but very non-specific, as yet.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 03:36, Stephen Frost <sfrost@snowman.net> wrote:

>> Given that I count 4-5 beneficial use cases for this index-like
>> lookaside, it seems worth investing time in.
>
> I'm all for making use of MatViews and GPUs, but there's more than one
> way to get there and look-asides feels like pushing the decision,
> unnecessarily, on to the user.

I'm not sure I understand where most of your comments come from, so
it's clear we're not talking about the same things yet.


We have multiple use cases where an alternate data structure could be
used to speed up queries.

My goal is to use the alternate data structure(s)

1) if the data structure contains matching data for the current query
2) only when the user has explicitly stated it would be correct to do
so, and they wish it
3) transparently to the application, rather than forcing them to recode
4) after fully considering cost-based optimization, which we can only
do if it is transparent

all of which is how mat views work in other DBMS. My additional requirement is

5) allow this to work with data structures outside the normal
heap/index/block structures, since we have multiple already working
examples of such things and many users wish to leverage those in their
applications

which I now understand is different from the main thrust of Kaigai's
proposal, so I will restate this later on another thread.


The requirement is similar to the idea of running

CREATE MATERIALIZED VIEW foo
  BUILD DEFERRED
  REFRESH COMPLETE
  ON DEMAND
  ENABLE QUERY REWRITE
  ON PREBUILT TABLE

but expands on that to encompass any external data structure.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Wed, May 7, 2014 at 4:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Agreed. My proposal is that if the planner allows the lookaside to an
> FDW then we pass the query for full execution on the FDW. That means
> that the scan, aggregate and join could take place via the FDW. i.e.
> "Custom Plan" == lookaside + FDW
>
> Or put another way, if we add Lookaside then we can just plug in the
> pgstrom FDW directly and we're done. And everybody else's FDW will
> work as well, so Citus etc. will not need to recode.

As Stephen notes downthread, Tom has already expressed opposition to
this idea on other threads, and I tend to agree with him, at least to
some degree.  I think the drive to use foreign data wrappers for
PGStrom, CitusDB, and other things that aren't really foreign data
wrappers as originally conceived is a result of the fact that we've
got only one interface in this area that looks remotely like something
pluggable; and so everyone's trying to fit things into the constraints
of that interface whether it's actually a good fit or not.
Unfortunately, I think what CitusDB really wants is pluggable storage,
and what PGStrom really wants is custom paths, and I don't think
either of those things is the same as what FDWs provide.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 8 May 2014 03:36, Stephen Frost <sfrost@snowman.net> wrote:
> > I'm all for making use of MatViews and GPUs, but there's more than one
> > way to get there and look-asides feels like pushing the decision,
> > unnecessarily, on to the user.
>
> I'm not sure I understand where most of your comments come from, so
> its clear we're not talking about the same things yet.
>
> We have multiple use cases where an alternate data structure could be
> used to speed up queries.

I don't view on-GPU memory as being an alternate *permanent* data store.
Perhaps that's the disconnect that we have here, as it was my
understanding that we're talking about using GPUs to make queries run
faster where the data comes from regular tables.

> My goal is to use the alternate data structure(s)

Pluggable storage is certainly interesting, but I view that as
independent of the CustomPlan-related work.

> which I now understand is different from the main thrust of Kaigai's
> proposal, so I will restate this later on another thread.

Sounds good.
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> As Stephen notes downthread, Tom has already expressed opposition to
> this idea on other threads, and I tend to agree with him, at least to
> some degree.  I think the drive to use foreign data wrappers for
> PGStrom, CitusDB, and other things that aren't really foreign data
> wrappers as originally conceived is a result of the fact that we've
> got only one interface in this area that looks remotely like something
> pluggable; and so everyone's trying to fit things into the constraints
> of that interface whether it's actually a good fit or not.

Agreed.

> Unfortunately, I think what CitusDB really wants is pluggable storage,
> and what PGStrom really wants is custom paths, and I don't think
> either of those things is the same as what FDWs provide.

I'm not entirely sure that PGStrom even really "wants" custom paths..  I
believe the goal there is to be able to use GPUs to do work for us and
custom paths/pluggable plan/execution are seen as the way to do that and
not depend on libraries which are under GPL, LGPL or other licenses which
we'd object to depending on from core.

Personally, I'd love to just see CUDA or whatever support in core as a
configure option and be able to detect at start-up when the right
libraries and hardware are available and enable the join types which
could make use of that gear.

I don't like that we're doing all of this because of licenses or
whatever and would still hope to figure out a way to address those
issues but I haven't had time to go research it myself and evidently
KaiGai and others see the issues there as insurmountable, so they're
trying to work around it by creating a pluggable interface where an
extension could provide those join types.
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 13:48, Stephen Frost <sfrost@snowman.net> wrote:

>> We have multiple use cases where an alternate data structure could be
>> used to speed up queries.
>
> I don't view on-GPU memory as being an alternate *permanent* data store.

As I've said, others have expressed an interest in placing specific
data on specific external resources that we would like to use to speed
up queries. That might be termed a "cache" of various kinds, or it
might simply be an allocation of that resource to a specific
purpose.

If we forget GPUs, that leaves multiple use cases that do fit the description.

> Perhaps that's the disconnect that we have here, as it was my
> understanding that we're talking about using GPUs to make queries run
> faster where the data comes from regular tables.

I'm trying to consider a group of use cases, so we get a generic API
that is useful to many people, not just to one use case. I had
understood the argument to be there must be multiple potential users
of an API before we allow it.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 04:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> In the case where it replaces joined relations by a ForeignScan, it will be
> almost the same as the expected ForeignScan with join push-down. Unlike a
> usual table scan, it does not have an actual relation definition in the
> catalog, and its result tuple-slot is determined on the fly.
> One thing different from a remote join is that this ForeignScan node may
> have local sub-plans, if the FDW driver (e.g. GPU execution) has the
> capability for the Join only, but not for the relation-scan portion.
> So, despite its name, I want ForeignScan to support having sub-plans if
> the FDW driver supports that capability.

From here, it looks exactly like pushing a join into an FDW. If we had
that, we wouldn't need Custom Scan at all.

I may be mistaken and there may be a critical difference, but local
sub-plans don't sound like a big one.


Have we considered having an Optimizer and Executor plugin that does
this without touching core at all?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 8 May 2014 13:48, Stephen Frost <sfrost@snowman.net> wrote:
> > I don't view on-GPU memory as being an alternate *permanent* data store.
>
> As I've said, others have expressed an interest in placing specific
> data on specific external resources that we would like to use to speed
> up queries. That might be termed a "cache" of various kinds or it
> might be simply be an allocation of that resource to a specific
> purpose.

I don't think some generalized structure that addresses the goals of
FDWs, CustomPaths, MatViews and query caching is going to be workable
and I'm definitely against having to specify at a per-relation level
when I want certain join types to be considered.

> > Perhaps that's the disconnect that we have here, as it was my
> > understanding that we're talking about using GPUs to make queries run
> > faster where the data comes from regular tables.
>
> I'm trying to consider a group of use cases, so we get a generic API
> that is useful to many people, not just to one use case. I had
> understood the argument to be there must be multiple potential users
> of an API before we allow it.

The API you've outlined requires users to specify on a per-relation
basis what join types are valid.  As for CustomPlans, there's certainly
potential for many use-cases there beyond just GPUs.  What I'm unsure
about is whether any of the others would actually need to be implemented
externally, as the GPU-related work seems to need, or whether we would
just implement those other join types in core.
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Wed, May 7, 2014 at 4:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Agreed. My proposal is that if the planner allows the lookaside to an
> > FDW then we pass the query for full execution on the FDW. That means
> > that the scan, aggregate and join could take place via the FDW. i.e.
> > "Custom Plan" == lookaside + FDW
> >
> > Or put another way, if we add Lookaside then we can just plug in the
> > pgstrom FDW directly and we're done. And everybody else's FDW will
> > work as well, so Citus etcc will not need to recode.
> 
> As Stephen notes downthread, Tom has already expressed opposition to this
> idea on other threads, and I tend to agree with him, at least to some degree.
> I think the drive to use foreign data wrappers for PGStrom, CitusDB, and
> other things that aren't really foreign data wrappers as originally
> conceived is a result of the fact that we've got only one interface in this
> area that looks remotely like something pluggable; and so everyone's trying
> to fit things into the constraints of that interface whether it's actually
> a good fit or not.
> Unfortunately, I think what CitusDB really wants is pluggable storage, and
> what PGStrom really wants is custom paths, and I don't think either of those
> things is the same as what FDWs provide.
> 
Yes, what PGStrom really needs is custom paths; that would allow an
extension to replace a part of the built-in nodes according to the
extension's characteristics.
The discussion upthread clarified that FDWs would need to be enhanced to
support the functionality PGStrom wants to provide; however, some of
those enhancements would really amount to a redefinition of FDWs.

Umm... I'm now missing the direction towards my goal.
What approach is the best way to glue PostgreSQL and PGStrom?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> From here, it looks exactly like pushing a join into an FDW. If we had
> that, we wouldn't need Custom Scan at all.
>
> I may be mistaken and there is a critical difference. Local sub-plans
> don't sound like a big difference.

Erm.  I'm not sure that you're really thinking through what you're
suggesting.

Allow me to re-state your suggestion here:

An FDW is loaded which provides hooks for join push-down (whatever those
end up being).

A query is run which joins *local* table A to *local* table B.  Standard
heaps, standard indexes, all local to this PG instance.

The FDW which supports join push-down is then passed this join for
planning, with local sub-plans for the local tables.

> Have we considered having an Optimizer and Executor plugin that does
> this without touching core at all?

Uh, isn't that what we're talking about?  The issue is that there's a
bunch of internal functions that such a plugin would need to either have
access to or re-implement, but we'd rather not expose those internal
functions to the whole world because they're, uh, internal helper
routines, essentially, which could disappear in another release.

The point is that there isn't a good API for this today and what's being
proposed isn't a good API, it's just bolted-on to the existing system by
exposing what are rightfully internal routines.
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 14:32, Stephen Frost <sfrost@snowman.net> wrote:

> The API you've outlined requires users to specify on a per-relation
> basis what join types are valid.

No, it doesn't. I've not said or implied that at any point.

If you keep telling me what I mean, rather than asking, we won't get anywhere.

I think that's as far as we'll get on email.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
Simon,

Perhaps you've changed your proposal wrt LOOKASIDEs and I've missed it
somewhere in the thread, but this is what I was referring to with my
concerns regarding per-relation definition of 'LOOKASIDES':

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> Roughly, I'm thinking of this...
>
> CREATE LOOKASIDE ON foo
>    TO foo_mat_view;
>
> and also this...
>
> CREATE LOOKASIDE ON foo
>    TO foo_as_a_foreign_table   /* e.g. PGStrom */

where I took 'foo' to mean 'a relation'.

Your downthread comments on 'CREATE MATERIALIZED VIEW' are in the same
vein, though there I agree that we need it per-relation as there are
other trade-offs to consider (storage costs of the matview, cost to
maintain the matview, etc, similar to indexes).

The PGStrom proposal, aiui, is to add a new join type which supports
using a GPU to answer a query where all the data is in regular PG
tables.  I'd like that to "just work" when a GPU is available (perhaps
modulo having to install some extension), for any join which is costed
to be cheaper/faster when done that way.
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 14:40, Stephen Frost <sfrost@snowman.net> wrote:

> Allow me to re-state your suggestion here:
>
> An FDW is loaded which provides hook for join push-down (whatever those
> end up being).
>
> A query is run which joins *local* table A to *local* table B.  Standard
> heaps, standard indexes, all local to this PG instance.
>
> The FDW which supports join push-down is then passed this join for
> planning, with local sub-plans for the local tables.

Yes that is correct; thank you for confirming your understanding with me.

That also supports a custom join of a local to a non-local table, or a
custom join of two non-local tables.

If we can use interfaces that already exist with efficiency, why
invent a new one?


>> Have we considered having an Optimizer and Executor plugin that does
>> this without touching core at all?
>
> Uh, isn't that what we're talking about?

No. I meant writing this as an extension rather than a patch on core.

> The issue is that there's a
> bunch of internal functions that such a plugin would need to either have
> access to or re-implement, but we'd rather not expose those internal
> functions to the whole world because they're, uh, internal helper
> routines, essentially, which could disappear in another release.
>
> The point is that there isn't a good API for this today and what's being
> proposed isn't a good API, it's just bolted-on to the existing system by
> exposing what are rightfully internal routines.

I think the main point is that people don't want to ask for our
permission before they do what they want to do.

We either help people use Postgres, or they go elsewhere.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 14:49, Stephen Frost <sfrost@snowman.net> wrote:

> Your downthread comments on 'CREATE MATERIALIZED VIEW' are in the same
> vein, though there I agree that we need it per-relation as there are
> other trade-offs to consider (storage costs of the matview, cost to
> maintain the matview, etc, similar to indexes).
>
> The PGStrom proposal, aiui, is to add a new join type which supports
> using a GPU to answer a query where all the data is in regular PG
> tables.  I'd like that to "just work" when a GPU is available (perhaps
> modulo having to install some extension), for any join which is costed
> to be cheaper/faster when done that way.

All correct and agreed. As I explained earlier, lets cover the join
requirement here and we can discuss lookasides to data structures at
Pgcon.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 8 May 2014 14:40, Stephen Frost <sfrost@snowman.net> wrote:
> > Allow me to re-state your suggestion here:
> >
> > An FDW is loaded which provides hook for join push-down (whatever those
> > end up being).
> >
> > A query is run which joins *local* table A to *local* table B.  Standard
> > heaps, standard indexes, all local to this PG instance.
> >
> > The FDW which supports join push-down is then passed this join for
> > planning, with local sub-plans for the local tables.
>
> Yes that is correct; thank you for confirming your understanding with me.

I guess for my part, that doesn't look like an FDW any more.

> That also supports custom join of local to non-local table, or custom
> join of two non-local tables.

Well, we already support these, technically, but the FDW
doesn't actually implement the join, it's done in core.

> If we can use interfaces that already exist with efficiency, why
> invent a new one?

Perhaps once we have a proposal for FDW join push-down this will make
sense, but I'm not seeing it right now.

> >> Have we considered having an Optimizer and Executor plugin that does
> >> this without touching core at all?
> >
> > Uh, isn't that what we're talking about?
>
> No. I meant writing this as an extension rather than a patch on core.

KaiGai's patches have been some changes to core and then an extension
which uses those changes.  The changes to core include exposing internal
functions for extensions to use, which will undoubtedly end up being a
sore spot and fragile.
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 15:25, Stephen Frost <sfrost@snowman.net> wrote:
> * Simon Riggs (simon@2ndQuadrant.com) wrote:
>> On 8 May 2014 14:40, Stephen Frost <sfrost@snowman.net> wrote:
>> > Allow me to re-state your suggestion here:
>> >
>> > An FDW is loaded which provides hook for join push-down (whatever those
>> > end up being).
>> >
>> > A query is run which joins *local* table A to *local* table B.  Standard
>> > heaps, standard indexes, all local to this PG instance.
>> >
>> > The FDW which supports join push-down is then passed this join for
>> > planning, with local sub-plans for the local tables.
>>
>> Yes that is correct; thank you for confirming your understanding with me.
>
> I guess for my part, that doesn't look like an FDW any more.

If it works, it works. If it doesn't, we can act otherwise.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 7 May 2014 02:05, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> (1) DDL support and system catalog
>
> Simon suggested that DDL command should be supported to track custom-
> plan providers being installed, and to avoid nonsense hook calls
> if it is an obvious case that custom-plan provider can help. It also
> makes sense to give a chance to load extensions once installed.
> (In the previous design, I assumed modules are loaded by LOAD command
> or *_preload_libraries parameters).

I've tried hard to bend my mind to this and it's beginning to sink in.

We've already got pg_am for indexes, and will soon have pg_seqam for sequences.

It would seem normal and natural to have

* pg_joinam catalog table for "join methods" with a join method API,
which would include some way of defining which operators/datatypes we
consider this for, so if the PostGIS people come up with some fancy GIS
join thing, we don't invoke it every time even when it's inapplicable.
I would prefer it if PostgreSQL also had some way to control when the
joinam was called, possibly with some kind of table_size_threshold on
the AM tuple, which could be set to >= 0 to control when this was even
considered.

* pg_scanam catalog table for "scan methods" with a scan method API
Again, a list of operators that can be used with it, like indexes and
operator classes

By analogy to existing mechanisms, we would want

* A USERSET mechanism to allow users to turn it off for testing or
otherwise, at user, database level

We would also want

* A startup call that allows us to confirm it is available and working
correctly, possibly with some self-test for hardware, performance
confirmation/derivation of planning parameters

* Some kind of trace mode that would allow people to confirm the
outcome of calls

* Some interface to the stats system so we could track the frequency
of usage of each join/scan type. This would be done within Postgres,
tracking the calls by name, rather than trusting the plugin to do it
for us


> I tried to implement the following syntax:
>
>   CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;

Not sure if we need that yet

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> It would seem normal and natural to have
>
> * pg_joinam catalog table for "join methods" with a join method API
> Which would include some way of defining which operators/datatypes we
> consider this for, so if PostGIS people come up with some fancy GIS
> join thing, we don't invoke it every time even when its inapplicable.
> I would prefer it if PostgreSQL also had some way to control when the
> joinam was called, possibly with some kind of table_size_threshold on
> the AM tuple, which could be set to >=0 to control when this was even
> considered.

It seems useful to think about how we would redefine our existing join
methods using such a structure.  While thinking about that, it seems
like we would worry more about what the operators provide rather than
the specific operators themselves (ala hashing / HashJoin) and I'm not
sure we really care about the data types directly- just about the
operations which we can do on them..

I can see a case for sticking data types into this if we feel that we
have to constrain the path possibilities for some reason, but I'd rather
try and deal with any issues around "it doesn't make sense to do X
because we'll know it'll be really expensive" through the cost model
instead of with a table that defines what's allowed or not allowed.
There may be cases where we get the costing wrong and it's valuable
to be able to tweak cost values on a per-connection basis or for
individual queries.

I don't mean to imply that a 'pg_joinam' table is a bad idea, just that
I'd think of it being defined in terms of what capabilities it requires
of operators and a way for costing to be calculated for it, plus the
actual functions which it provides to implement the join itself (to
include some way to get output suitable for explain, etc..).

> * pg_scanam catalog table for "scan methods" with a scan method API
> Again, a list of operators that can be used with it, like indexes and
> operator classes

Ditto for this- but there's lots of other things this makes me wonder
about because it's essentially trying to define a pluggable storage
layer, which is great, but also requires some way to deal with all of
things we use our storage system for: caching / shared buffers,
locking, visibility, WAL, unique identifier / ctid (for use in indexes,
etc)...

> By analogy to existing mechanisms, we would want
>
> * A USERSET mechanism to allow users to turn it off for testing or
> otherwise, at user, database level

If we re-implement our existing components through this ("eat our own
dogfood" as it were), I'm not sure that we'd be able to have a way to
turn it on/off..  I realize we wouldn't have to, but then it seems like
we'd have two very different code paths and likely a different level of
support / capability afforded to "external" storage systems and then I
wonder if we're not back to just FDWs again..

> We would also want
>
> * A startup call that allows us to confirm it is available and working
> correctly, possibly with some self-test for hardware, performance
> confirmation/derivation of planning parameters

Yeah, we'd need this for anything that supports a GPU, regardless of how
we implement it, I'd think.

> * Some kind of trace mode that would allow people to confirm the
> outcome of calls

Seems like this would be useful independently of the rest..

> * Some interface to the stats system so we could track the frequency
> of usage of each join/scan type. This would be done within Postgres,
> tracking the calls by name, rather than trusting the plugin to do it
> for us

This is definitely something I want for core already...
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Thu, May 8, 2014 at 3:10 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Simon Riggs (simon@2ndQuadrant.com) wrote:
>> It would seem normal and natural to have
>>
>> * pg_joinam catalog table for "join methods" with a join method API
>> Which would include some way of defining which operators/datatypes we
>> consider this for, so if PostGIS people come up with some fancy GIS
>> join thing, we don't invoke it every time even when its inapplicable.
>> I would prefer it if PostgreSQL also had some way to control when the
>> joinam was called, possibly with some kind of table_size_threshold on
>> the AM tuple, which could be set to >=0 to control when this was even
>> considered.
>
> It seems useful to think about how we would redefine our existing join
> methods using such a structure.  While thinking about that, it seems
> like we would worry more about what the operators provide rather than
> the specific operators themselves (ala hashing / HashJoin) and I'm not
> sure we really care about the data types directly- just about the
> operations which we can do on them..

I'm pretty skeptical about this whole line of inquiry.  We've only got
three kinds of joins, and each one of them has quite a bit of bespoke
logic, and all of this code is pretty performance-sensitive on large
join nests.  If there's a way to make this work for KaiGai's use case
at all, I suspect something really lightweight like a hook, which
should have negligible impact on other workloads, is a better fit than
something involving system catalog access.  But I might be wrong.

I also think that there are really two separate problems here: getting
the executor to call a custom scan node when it shows up in the plan
tree; and figuring out how to get it into the plan tree in the first
place.  I'm not sure we've properly separated those problems, and I'm
not sure into which category the issues that sunk KaiGai's 9.4 patch
fell.  Most of this discussion seems like it's about the latter
problem, but we need to solve both.  For my money, we'd be better off
getting some kind of basic custom scan node functionality committed
first, even if the cases where you can actually inject them into real
plans are highly restricted.  Then, we could later work on adding more
ways to inject them in more places.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote:

> For my money, we'd be better off
> getting some kind of basic custom scan node functionality committed
> first, even if the cases where you can actually inject them into real
> plans are highly restricted.  Then, we could later work on adding more
> ways to inject them in more places.

We're past the prototyping stage and into productionising what we know
works, AFAIK. If that point is not clear, then we need to discuss that
first.

At the moment the Custom join hook is called every time we attempt to
cost a join, with no restriction.

I would like to highly restrict this, so that we only consider a
CustomJoin node when we have previously said one might be usable and
the user has requested this (e.g. enable_foojoin = on)

We only consider merge joins if the join uses operators with oprcanmerge=true.
We only consider hash joins if the join uses operators with oprcanhash=true.

So it seems reasonable to have a way to define/declare what is
possible and what is not. But my take is that adding a new column to
pg_operator for every CustomJoin node is probably out of the question,
hence my suggestion to list the operators we know it can work with.

Given that everything else in Postgres is agnostic and configurable,
I'm looking to do the same here.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 20:10, Stephen Frost <sfrost@snowman.net> wrote:

>> * A USERSET mechanism to allow users to turn it off for testing or
>> otherwise, at user, database level
>
> If we re-implement our existing components through this ("eat our own
> dogfood" as it were), I'm not sure that we'd be able to have a way to
> turn it on/off..  I realize we wouldn't have to, but then it seems like
> we'd have two very different code paths and likely a different level of
> support / capability afforded to "external" storage systems and then I
> wonder if we're not back to just FDWs again..

We have SET enable_hashjoin = on | off

I would like a way to do the equivalent of SET enable_mycustomjoin =
off so that when it starts behaving weirdly in production, I can turn
it off to prove that it is not the cause, or keep it turned off if
it's a problem. I don't want to have to call a hook and let the hook
decide whether it can be turned off or not.

Postgres should be in control of the plugin, not give control to the
plugin every time and hope it gives us control back.

(I'm trying to take the "FDW isn't the right way" line of thinking to
its logical conclusions, so we can decide).

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm pretty skeptical about this whole line of inquiry.  We've only got
> three kinds of joins, and each one of them has quite a bit of bespoke
> logic, and all of this code is pretty performance-sensitive on large
> join nests.  If there's a way to make this work for KaiGai's use case
> at all, I suspect something really lightweight like a hook, which
> should have negligible impact on other workloads, is a better fit than
> something involving system catalog access.  But I might be wrong.

We do a great deal of catalog consultation already during planning,
so I think a few more wouldn't be a problem, especially if the planner
is smart enough to touch the catalogs just once (per query?) and cache
the results.  However, your point about lots of bespoke logic is dead
on, and I'm afraid it's damn near a fatal objection.  As just one example,
if we did not have merge joins then an awful lot of what the planner does
with path keys simply wouldn't exist, or at least would look a lot
different than it does.  Without that infrastructure, I can't imagine
that a plugin approach would be able to plan mergejoins anywhere near as
effectively.  Maybe there's a way around this issue, but it sure won't
just be a pg_am-like API.

> I also think that there are really two separate problems here: getting
> the executor to call a custom scan node when it shows up in the plan
> tree; and figuring out how to get it into the plan tree in the first
> place.  I'm not sure we've properly separated those problems, and I'm
> not sure into which category the issues that sunk KaiGai's 9.4 patch
> fell.

I thought that the executor side of his patch wasn't in bad shape.  The
real problems were in the planner, and indeed largely in the "backend"
part of the planner where there's a lot of hard-wired logic for fixing up
low-level details of the constructed plan tree.  It seems like in
principle it might be possible to make that logic cleanly extensible,
but it'll likely take a major rewrite.  The patch tried to skate by with
just exposing a bunch of internal functions, which I don't think is a
maintainable approach, either for the core or for the extensions using it.
        regards, tom lane



Re: [v9.5] Custom Plan API

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote:
>> For my money, we'd be better off
>> getting some kind of basic custom scan node functionality committed
>> first, even if the cases where you can actually inject them into real
>> plans are highly restricted.  Then, we could later work on adding more
>> ways to inject them in more places.

> We're past the prototyping stage and into productionising what we know
> works, AFAIK. If that point is not clear, then we need to discuss that
> first.

OK, I'll bite: what here do we know works?  Not a damn thing AFAICS;
it's all speculation that certain hooks might be useful, and speculation
that's not supported by a lot of evidence.  If you think this isn't
prototyping, I wonder what you think *is* prototyping.

It seems likely to me that our existing development process is not
terribly well suited to developing a good solution in this area.
We need to be able to try some things and throw away what doesn't
work; but the project's mindset is not conducive to throwing features
away once they've appeared in a shipped release.  And the other side
of the coin is that trying these things is not inexpensive: you have
to write some pretty serious code before you have much of a feel for
whether a planner hook API is actually any good.  So by the time
you've built something of the complexity of, say, contrib/postgres_fdw,
you don't really want to throw that away in the next major release.
And that's at the bottom end of the scale of the amount of work that'd
be needed to do anything with the sorts of interfaces we're discussing.

So I'm not real sure how we move forward.  Maybe something to brainstorm
about in Ottawa.
        regards, tom lane



Re: [v9.5] Custom Plan API

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> We only consider merge joins if the join uses operators with oprcanmerge=true.
> We only consider hash joins if the join uses operators with oprcanhash=true

> So it seems reasonable to have a way to define/declare what is
> possible and what is not. But my take is that adding a new column to
> pg_operator for every CustomJoin node is probably out of the question,
> hence my suggestion to list the operators we know it can work with.

For what that's worth, I'm not sure that either the oprcanmerge or
oprcanhash columns really pull their weight.  We could dispense with both
at the cost of doing some wasted lookups in pg_amop.  (Perhaps we should
replace them with a single "oprisequality" column, which would amount to
a hint that it's worth looking for hash or merge properties, or for other
equality-ish properties in future.)

So I think something comparable to an operator class is indeed a better
approach than adding more columns to pg_operator.  Other than the
connection to pg_am, you could pretty nearly just use the operator class
infrastructure as-is for a lot of operator-property things like this.
        regards, tom lane



Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 21:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> So I'm not real sure how we move forward.  Maybe something to brainstorm
> about in Ottawa.

I'm just about to go away for a week, so that's probably the best
place to leave (me out of) the discussion until Ottawa.

I've asked my contacts for some evidence that this hardware route is
worthwhile, so we'll see what we get. Presumably KaiGai has something
to share already as well.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Peter Geoghegan
Date:
On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Umm... I'm now missing the direction towards my goal.
> What approach is the best way to glue PostgreSQL and PGStrom?

I haven't really paid any attention to PGStrom. Perhaps it's just that
I missed it, but I would find it useful if you could direct me towards
a benchmark or something like that, that demonstrates a representative
scenario in which the facilities that PGStrom offers are compelling
compared to traditional strategies already implemented in Postgres and
other systems.

If I wanted to make joins faster, personally, I would look at
opportunities to optimize our existing hash joins to take better
advantage of modern CPU characteristics. A lot of the research
suggests that it may be useful to implement techniques that take
better advantage of available memory bandwidth through techniques like
prefetching and partitioning, perhaps even (counter-intuitively) at
the expense of compute bandwidth. It's possible that it just needs to
be explained to me, but, with respect, intuitively I have a hard time
imagining that offloading joins to the GPU will help much in the
general case. Every paper on joins from the last decade talks a lot
about memory bandwidth and memory latency. Are you concerned with some
specific case that I may have missed? In what scenario might a
cost-based optimizer reasonably prefer a custom join node implemented
by PgStrom, over any of the existing join node types? It's entirely
possible that I simply missed relevant discussions here.

-- 
Peter Geoghegan



Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Thu, May 8, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I thought that the executor side of his patch wasn't in bad shape.  The
> real problems were in the planner, and indeed largely in the "backend"
> part of the planner where there's a lot of hard-wired logic for fixing up
> low-level details of the constructed plan tree.  It seems like in
> principle it might be possible to make that logic cleanly extensible,
> but it'll likely take a major rewrite.  The patch tried to skate by with
> just exposing a bunch of internal functions, which I don't think is a
> maintainable approach, either for the core or for the extensions using it.

Well, I consider that somewhat good news, because I think it would be
rather nice if we could get by with solving one problem at a time, and
if the executor part is close to being well-solved, excellent.

My ignorance is probably showing here, but I guess I don't understand
why it's so hard to deal with the planner side of things.  My
perhaps-naive impression is that a Seq Scan node, or even an Index
Scan node, is not all that complicated.  If we just want to inject
some more things that behave a lot like those into various baserels, I
guess I don't understand why that's especially hard.

Now I do understand that part of what KaiGai wants to do here is
inject custom scan paths as additional paths for *joinrels*.  And I
can see why that would be somewhat more complicated.  But I also don't
see why that's got to be part of the initial commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> > I also think that there are really two separate problems here: getting
> > the executor to call a custom scan node when it shows up in the plan
> > tree; and figuring out how to get it into the plan tree in the first
> > place.  I'm not sure we've properly separated those problems, and I'm
> > not sure into which category the issues that sunk KaiGai's 9.4 patch
> > fell.
>
> I thought that the executor side of his patch wasn't in bad shape.  The
> real problems were in the planner, and indeed largely in the "backend"
> part of the planner where there's a lot of hard-wired logic for fixing up
> low-level details of the constructed plan tree.  It seems like in principle
> it might be possible to make that logic cleanly extensible, but it'll likely
> take a major rewrite.  The patch tried to skate by with just exposing a
> bunch of internal functions, which I don't think is a maintainable approach,
> either for the core or for the extensions using it.
>
(I'm now trying to catch up on last night's discussion...)

I initially intended to allow extensions to add their custom paths based
on their own arbitrary decisions, because the core backend cannot have
any expectations about the behavior of a custom plan.
However, a custom path that replaces built-in paths must of course behave
compatibly in spite of its different implementation.

So, I'm inclined towards a design where the custom-plan provider informs
the core backend of what it can do, and the planner gives extensions more
practical information with which to construct a custom path node.

Let me investigate how to handle join replacement by custom-path in the
planner stage.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Thu, May 8, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I thought that the executor side of his patch wasn't in bad shape.
> > The real problems were in the planner, and indeed largely in the "backend"
> > part of the planner where there's a lot of hard-wired logic for fixing
> > up low-level details of the constructed plan tree.  It seems like in
> > principle it might be possible to make that logic cleanly extensible,
> > but it'll likely take a major rewrite.  The patch tried to skate by
> > with just exposing a bunch of internal functions, which I don't think
> > is a maintainable approach, either for the core or for the extensions
> using it.
> 
> Well, I consider that somewhat good news, because I think it would be rather
> nice if we could get by with solving one problem at a time, and if the executor
> part is close to being well-solved, excellent.
> 
> My ignorance is probably showing here, but I guess I don't understand why
> it's so hard to deal with the planner side of things.  My perhaps-naive
> impression is that a Seq Scan node, or even an Index Scan node, is not all
> that complicated.  If we just want to inject some more things that behave
> a lot like those into various baserels, I guess I don't understand why that's
> especially hard.
> 
> Now I do understand that part of what KaiGai wants to do here is inject
> custom scan paths as additional paths for *joinrels*.  And I can see why
> that would be somewhat more complicated.  But I also don't see why that's
> got to be part of the initial commit.
> 
I'd also like to take this approach. Even though we will eventually need
a graceful approach for join replacement by custom paths, it makes sense
to get a minimum set of functionality in first.

Then we can focus on how to design the planner integration for joins.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Umm... I'm now missing the direction towards my goal.
> > What approach is the best way to glue PostgreSQL and PGStrom?
> 
> I haven't really paid any attention to PGStrom. Perhaps it's just that I
> missed it, but I would find it useful if you could direct me towards a
> benchmark or something like that, that demonstrates a representative
> scenario in which the facilities that PGStrom offers are compelling compared
> to traditional strategies already implemented in Postgres and other
> systems.
> 
The GPU-side implementation of hash join is still under development.

The only available use case right now is an alternative scan path that
replaces a full table scan, for the case when a table contains a massive
number of records and the qualifiers are sufficiently complicated.

The EXPLAIN output below shows a sequential scan on a table that contains
80M records (all of them in memory; no disk access during execution).
An NVIDIA GT640 has an advantage over a single-threaded Core i5 4570S, at
least.


postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10003175757.67..10003175757.68 rows=1 width=0) (actual time=46648.635..46648.635 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=10000000000.00..10003109091.00 rows=26666667 width=0) (actual time=0.047..46351.567 rows=2513814 loops=1)
         Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Rows Removed by Filter: 77486186
 Planning time: 0.066 ms
 Total runtime: 46648.668 ms
(6 rows)
postgres=# set pg_strom.enabled = on;
SET
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1274424.33..1274424.34 rows=1 width=0) (actual time=1784.729..1784.729 rows=1 loops=1)
   ->  Custom (GpuScan) on t1  (cost=10000.00..1207757.67 rows=26666667 width=0) (actual time=179.748..1567.018 rows=2513699 loops=1)
         Host References:
         Device References: x, y
         Device Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Total time to load: 0.231 ms
         Avg time in send-mq: 0.027 ms
         Max time to build kernel: 1.064 ms
         Avg time of DMA send: 3.050 ms
         Total time of DMA send: 933.318 ms
         Avg time of kernel exec: 5.117 ms
         Total time of kernel exec: 1565.799 ms
         Avg time of DMA recv: 0.086 ms
         Total time of DMA recv: 26.289 ms
         Avg time in recv-mq: 0.011 ms
 Planning time: 0.094 ms
 Total runtime: 1784.793 ms
(17 rows)


> If I wanted to make joins faster, personally, I would look at opportunities
> to optimize our existing hash joins to take better advantage of modern CPU
> characteristics. A lot of the research suggests that it may be useful to
> implement techniques that take better advantage of available memory
> bandwidth through techniques like prefetching and partitioning, perhaps
> even (counter-intuitively) at the expense of compute bandwidth. It's
> possible that it just needs to be explained to me, but, with respect,
> intuitively I have a hard time imagining that offloading joins to the GPU
> will help much in the general case. Every paper on joins from the last decade
> talks a lot about memory bandwidth and memory latency. Are you concerned
> with some specific case that I may have missed? In what scenario might a
> cost-based optimizer reasonably prefer a custom join node implemented by
> PgStrom, over any of the existing join node types? It's entirely possible
> that I simply missed relevant discussions here.
> 
If our purpose were to consume 100% of a GPU device's capacity, memory
bandwidth would be troublesome. But I'm not interested in GPU benchmarking.
What I want to do is accelerate complicated query processing beyond what
existing RDBMSs offer, with an approach that is cheap in cost and
transparent to existing applications.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote:
> > For my money, we'd be better off
> > getting some kind of basic custom scan node functionality committed
> > first, even if the cases where you can actually inject them into real
> > plans are highly restricted.  Then, we could later work on adding more
> > ways to inject them in more places.
>
> We're past the prototyping stage and into productionising what we know
> works, AFAIK. If that point is not clear, then we need to discuss that
> first.
>
> At the moment the Custom join hook is called every time we attempt to
> cost a join, with no restriction.
>
> I would like to highly restrict this, so that we only consider a
> CustomJoin node when we have previously said one might be usable and
> the user has requested this (e.g. enable_foojoin = on)

This is part of what I disagree with- I'd rather not require users to
know and understand when they want to do a HashJoin vs. a MergeJoin vs.
a CustomJoinTypeX.

> We only consider merge joins if the join uses operators with oprcanmerge=true.
> We only consider hash joins if the join uses operators with oprcanhash=true

I wouldn't consider those generally "user-facing" options, and the
enable_X counterparts are intended for debugging and not to be used in
production environments.  To the point you make in the other thread- I'm
fine w/ having similar cost-based enable_X options, but I think we're
doing our users a disservice if we require that they populate or update
a table.  Perhaps an extension needs to do that on installation, but
that would need to enable everything to avoid the user having to mess
around with the table.

> So it seems reasonable to have a way to define/declare what is
> possible and what is not. But my take is that adding a new column to
> pg_operator for every CustomJoin node is probably out of the question,
> hence my suggestion to list the operators we know it can work with.

It does seem like there should be some work done in this area, as Tom
mentioned, to avoid needing to have more columns to track how equality
can be done.  I do wonder just how we'd deal with this when it comes to
GPUs as, presumably, the code to implement the equality for various
types would have to be written in CUDA-or-whatever.

> Given that everything else in Postgres is agnostic and configurable,
> I'm looking to do the same here.

It's certainly a neat idea, but I do have concerns (which appear to be
shared by others) about just how practical it'll be, how much rework it'd
take, and whether it'd really be used in the end..
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> > So it seems reasonable to have a way to define/declare what is
> > possible and what is not. But my take is that adding a new column to
> > pg_operator for every CustomJoin node is probably out of the question,
> > hence my suggestion to list the operators we know it can work with.
>
> It does seem like there should be some work done in this area, as Tom mentioned,
> to avoid needing to have more columns to track how equality can be done.
> I do wonder just how we'd deal with this when it comes to GPUs as, presumably,
> the code to implement the equality for various types would have to be written
> in CUDA-or-whatever.
>
GPUs have their workload likes and dislikes. It is a reasonable idea to
list the operators (or something else) that run to advantage on a custom
path.
For example, numeric calculation on fixed-length variables has a great
advantage on GPUs, but locale-aware text matching is not a workload suited
to GPUs.
That may be a good hint for the planner when picking candidate paths to be
considered.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Peter Geoghegan (pg@heroku.com) wrote:
> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Umm... I'm now missing the direction towards my goal.
> > What approach is the best way to glue PostgreSQL and PGStrom?
>
> I haven't really paid any attention to PGStrom. Perhaps it's just that
> I missed it, but I would find it useful if you could direct me towards
> a benchmark or something like that, that demonstrates a representative
> scenario in which the facilities that PGStrom offers are compelling
> compared to traditional strategies already implemented in Postgres and
> other systems.

I agree that some concrete evidence would be really nice.  I
more-or-less took KaiGai's word on it, but having actual benchmarks
would certainly be better.

> If I wanted to make joins faster, personally, I would look at
> opportunities to optimize our existing hash joins to take better
> advantage of modern CPU characteristics.

Yeah, I'm pretty confident we're leaving a fair bit on the table right
there based on my previous investigation into this area.  There were
easily cases which showed a 3x improvement, as I recall (the trade-off
being increased memory usage for a larger, sparser hash table).  Sadly,
there were also cases which ended up being worse and it seemed to be
very sensitive to the size of the hash table which ends up being built
and the size of the on-CPU cache.

> A lot of the research
> suggests that it may be useful to implement techniques that take
> better advantage of available memory bandwidth through techniques like
> prefetching and partitioning, perhaps even (counter-intuitively) at
> the expense of compute bandwidth.

While I agree with this, one of the big things about GPUs is that they
operate in a highly parallel fashion and across a different CPU/Memory
architecture than what we're used to (for starters, everything is much
"closer").  In a traditional memory system, there's a lot of back and
forth to memory, but a single memory dump over to the GPU's memory where
everything is processed in a highly parallel way and then shipped back
wholesale to main memory is at least conceivably faster.

Of course, things will change when we are able to parallelize joins
across multiple CPUs ourselves..  In a way, the PGStrom approach gets to
"cheat" today, since it can parallelize the work where core can't, and
that ends up not being an entirely fair comparison.
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> Well, I consider that somewhat good news, because I think it would be
> rather nice if we could get by with solving one problem at a time, and
> if the executor part is close to being well-solved, excellent.

Sadly, I'm afraid the news really isn't all that good in the end..

> My ignorance is probably showing here, but I guess I don't understand
> why it's so hard to deal with the planner side of things.  My
> perhaps-naive impression is that a Seq Scan node, or even an Index
> Scan node, is not all that complicated.  If we just want to inject
> some more things that behave a lot like those into various baserels, I
> guess I don't understand why that's especially hard.

That's not what is being asked for here though...

> Now I do understand that part of what KaiGai wants to do here is
> inject custom scan paths as additional paths for *joinrels*.  And I
> can see why that would be somewhat more complicated.  But I also don't
> see why that's got to be part of the initial commit.

I'd say it's more than "part" of what the goal is here- it's more or
less what everything boils down to.  Oh, plus being able to replace
aggregates with a GPU-based operation instead, but that's no trivially
done thing either really (if it is, let's get it done for FDWs
already...).
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> I initially intended to allow extensions to add their custom-path based
> on their arbitrary decision, because the core backend cannot have
> expectation towards the behavior of custom-plan.
> However, of course, the custom-path that replaces built-in paths shall
> have compatible behavior in spite of different implementation.

I didn't ask this before but it's been on my mind for a while- how will
this work for custom data types, ala the 'geometry' type from PostGIS?
There's user-provided code that we have to execute to check equality for
those, but they're not giving us CUDA code to run to perform that
equality...
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > I initially intended to allow extensions to add their custom-path
> > based on their arbitrary decision, because the core backend cannot
> > have expectation towards the behavior of custom-plan.
> > However, of course, the custom-path that replaces built-in paths shall
> > have compatible behavior in spite of different implementation.
>
> I didn't ask this before but it's been on my mind for a while- how will
> this work for custom data types, ala the 'geometry' type from PostGIS?
> There's user-provided code that we have to execute to check equality for
> those, but they're not giving us CUDA code to run to perform that equality...
>
If a custom-plan provider supports user-defined data types such as
PostGIS's, it will be able to pick up those data types as well, in addition
to the built-in ones. It fully depends on the coverage of the extension.
If a data type is not supported, it is simply not show time for the GPU.

In my case, if PG-Strom also has compatible code for them, runnable on
OpenCL, it will say "yes, I can handle this data type".

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> GPU has workloads likes and dislikes. It is a reasonable idea to list up
> operators (or something else) that have advantage to run on custom-path.
> For example, numeric calculation on fixed-length variables has greate
> advantage on GPU, but locale aware text matching is not a workload suitable
> to GPUs.

Right- but this points out exactly what I was trying to bring up.

Locale-aware text matching requires running libc-provided code, which
isn't going to happen on the GPU (unless we re-implement it...).
Aren't we going to have the same problem with the 'numeric' type?  Our
existing functions won't be usable on the GPU and we'd have to
re-implement them and then make darn sure that they produce the same
results...

We'll also have to worry about any cases where we have a libc function
and a CUDA function and convince ourselves that there's no difference
between the two..  Not sure exactly how we'd built this kind of
knowledge into the system through a catalog (I tend to doubt that'd
work, in fact) and trying to make it work from an extension in a way
that it would work with *other* extensions strikes me as highly
unlikely.  Perhaps the extension could provide the core types and the
other extensions could provide their own bits to hook into the right
places, but that sure seems fragile.
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > I didn't ask this before but it's been on my mind for a while- how will
> > this work for custom data types, ala the 'geometry' type from PostGIS?
> > There's user-provided code that we have to execute to check equality for
> > those, but they're not giving us CUDA code to run to perform that equality...
> >
> If custom-plan provider support the user-defined data types such as PostGIS,
> it will be able to pick up these data types also, in addition to built-in
> ones. It fully depends on coverage of the extension.
> If not a supported data type, it is not a show-time of GPUs.

So the extension will need to be aware of all custom data types and then
installed *after* all other extensions are installed?  That doesn't
strike me as workable...
Thanks,    Stephen

Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > > I didn't ask this before but it's been on my mind for a while- how
> > > will this work for custom data types, ala the 'geometry' type from
> PostGIS?
> > > There's user-provided code that we have to execute to check equality
> > > for those, but they're not giving us CUDA code to run to perform that
> equality...
> > >
> > If custom-plan provider support the user-defined data types such as
> > PostGIS, it will be able to pick up these data types also, in addition
> > to built-in ones. It fully depends on coverage of the extension.
> > If not a supported data type, it is not a show-time of GPUs.
>
> So the extension will need to be aware of all custom data types and then
> installed *after* all other extensions are installed?  That doesn't strike
> me as workable...
>
I'm not certain why you think an extension will need to support all
data types.
Even if it works only for a particular set of data types, it makes sense
as long as it covers the data types users are actually using.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




Re: [v9.5] Custom Plan API

From
Peter Geoghegan
Date:
On Thu, May 8, 2014 at 7:13 PM, Stephen Frost <sfrost@snowman.net> wrote:
> Of course, things will change when we are able to parallelize joins
> across multiple CPUs ourselves..  In a way, the PGStrom approach gets to
> "cheat" us today, since it can parallelize the work where core can't and
> that ends up not being an entirely fair comparison.

I was thinking of SIMD, along similar lines. We might be able to cheat
our way out of having to solve some of the difficult problems of
parallelism that way. For example, if you can build a SIMD-friendly
bitonic mergesort, and combine that with poor man's normalized keys,
that could make merge joins on text faster. That's pure speculation,
but it seems like an interesting possibility.

-- 
Peter Geoghegan



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > So the extension will need to be aware of all custom data types and then
> > installed *after* all other extensions are installed?  That doesn't strike
> > me as workable...
> >
> I'm not certain why do you think an extension will need to support all
> the data types.

Mostly because we have a very nice extension system which quite a few
different extensions make use of, and it'd be pretty darn unfortunate if
none of them could take advantage of GPUs because we decided that the
right way to support GPUs was through an extension.

This is an argument which might be familiar to some, as it was part of the
reason that json and jsonb were added to core, imv...

> Even if it works only for a particular set of data types, it makes sense
> as long as it covers data types user actually using.

I know quite a few users of PostGIS, ip4r, and hstore...
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Thu, May 8, 2014 at 10:16 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> Well, I consider that somewhat good news, because I think it would be
>> rather nice if we could get by with solving one problem at a time, and
>> if the executor part is close to being well-solved, excellent.
>
> Sadly, I'm afraid the news really isn't all that good in the end..
>
>> My ignorance is probably showing here, but I guess I don't understand
>> why it's so hard to deal with the planner side of things.  My
>> perhaps-naive impression is that a Seq Scan node, or even an Index
>> Scan node, is not all that complicated.  If we just want to inject
>> some more things that behave a lot like those into various baserels, I
>> guess I don't understand why that's especially hard.
>
> That's not what is being asked for here though...

I am not sure what your point is here.  Here's mine: if we can strip
this down to the executor support plus the most minimal planner
support possible, we might be able to get *something* committed.  Then
we can extend it in subsequent commits.

You seem to be saying there's no value in getting anything committed
unless it handles the scan-substituting-for-join case.  I don't agree.Incremental commits are good, whether they get
youall the way to
 
where you want to be or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Robert Haas (robertmhaas@gmail.com) wrote:
> I am not sure what your point is here.  Here's mine: if we can strip
> this down to the executor support plus the most minimal planner
> support possible, we might be able to get *something* committed.  Then
> we can extend it in subsequent commits.

I guess my point is that I see this more-or-less being solved already by
FDWs, but that doesn't address the case when it's a local table, so
perhaps there is something useful out of a commit that allows
replacement of a SeqScan node (which presumably would also be costed
differently).

> You seem to be saying there's no value in getting anything committed
> unless it handles the scan-substituting-for-join case.  I don't agree.
>  Incremental commits are good, whether they get you all the way to
> where you want to be or not.

To be honest, I think this is really the first proposal to replace
specific Nodes, rather than provide a way for a generic Node to exist
(which could also replace joins).  While I do think it's an interesting
idea, and if we could push filters down to this new Node it might even
be worthwhile, I'm not sure that it actually moves us down the path to
supporting Nodes which replace joins.

Still, I'm not against it.
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 9 May 2014 02:40, Stephen Frost <sfrost@snowman.net> wrote:
> * Simon Riggs (simon@2ndQuadrant.com) wrote:
>> On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote:
>> > For my money, we'd be better off
>> > getting some kind of basic custom scan node functionality committed
>> > first, even if the cases where you can actually inject them into real
>> > plans are highly restricted.  Then, we could later work on adding more
>> > ways to inject them in more places.
>>
>> We're past the prototyping stage and into productionising what we know
>> works, AFAIK. If that point is not clear, then we need to discuss that
>> first.
>>
>> At the moment the Custom join hook is called every time we attempt to
>> cost a join, with no restriction.
>>
>> I would like to highly restrict this, so that we only consider a
>> CustomJoin node when we have previously said one might be usable and
>> the user has requested this (e.g. enable_foojoin = on)
>
> This is part of what I disagree with- I'd rather not require users to
> know and understand when they want to do a HashJoin vs. a MergeJoin vs.
> a CustomJoinTypeX.

Again, I have *not* said users should know that.

>> We only consider merge joins if the join uses operators with oprcanmerge=true.
>> We only consider hash joins if the join uses operators with oprcanhash=true
>
> I wouldn't consider those generally "user-facing" options, and the
> enable_X counterparts are intended for debugging and not to be used in
> production environments.  To the point you make in the other thread- I'm
> fine w/ having similar cost-based enable_X options, but I think we're
> doing our users a disservice if we require that they populate or update
> a table.  Perhaps an extension needs to do that on installation, but
> that would need to enable everything to avoid the user having to mess
> around with the table.

Again, I did *not* say those should be user facing options, nor that
they be set at table-level.


What I have said is that authors of CustomJoins or CustomScans should
declare in advance via system catalogs which operators their new code
works with so that Postgres knows when it is appropriate to call them.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Stephen Frost
Date:
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> What I have said is that authors of CustomJoins or CustomScans should
> declare in advance via system catalogs which operators their new code
> works with so that Postgres knows when it is appropriate to call them.

I guess I just took that as given, since the discussion has been about
GPUs and there will have to be new operators since there will be
different code (CUDA-or-whatever GPU-language code).
Thanks,
    Stephen

Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> We're past the prototyping stage and into productionising what we know
>> works, AFAIK. If that point is not clear, then we need to discuss that
>> first.
>
> OK, I'll bite: what here do we know works?  Not a damn thing AFAICS;
> it's all speculation that certain hooks might be useful, and speculation
> that's not supported by a lot of evidence.  If you think this isn't
> prototyping, I wonder what you think *is* prototyping.

My research contacts advise me of this recent work:
  http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
and also that they expect a prototype to be ready by October, which I
have been told will be open source.

So there are at least two groups looking at this as a serious option
for Postgres (not including the above paper's authors).

That isn't *now*, but it is at least a time scale that fits with
acting on this in the next release, if we can separate out the various
ideas and agree we wish to proceed.

I'll submerge again...

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> >> We're past the prototyping stage and into productionising what we
> >> know works, AFAIK. If that point is not clear, then we need to
> >> discuss that first.
> >
> > OK, I'll bite: what here do we know works?  Not a damn thing AFAICS;
> > it's all speculation that certain hooks might be useful, and
> > speculation that's not supported by a lot of evidence.  If you think
> > this isn't prototyping, I wonder what you think *is* prototyping.
> 
> My research contacts advise me of this recent work
>   http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
> and also that they expect a prototype to be ready by October, which I have
> been told will be open source.
> 
> So there are at least two groups looking at this as a serious option for
> Postgres (not including the above paper's authors).
> 
> That isn't *now*, but it is at least a time scale that fits with acting
> on this in the next release, if we can separate out the various ideas and
> agree we wish to proceed.
> 
> I'll submerge again...
> 
Through the discussion last week, our minimum consensus is:
1. Deregulated enhancement of FDWs is not the way to go.
2. A custom path that can replace a built-in scan makes sense as a first
   step towards the future enhancement. Its planner integration is simple
   enough to do.
3. A custom path that can replace a built-in join needs investigation of
   how to integrate with the existing planner structure, to avoid (3a)
   reinventing the whole of join handling on the extension side, and (3b)
   unnecessary extension calls in cases that are obviously unsupported.

So, I'd like to start on portion (2) for the upcoming first commit-fest
of the v9.5 development cycle. We can also discuss portion (3)
concurrently, probably targeting the second commit-fest.

Unfortunately, I cannot attend PGCon in Ottawa this year. Please share
any face-to-face discussion here.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [v9.5] Custom Plan API

From
Kohei KaiGai
Date:
According to the discussion upthread, I revised the custom-plan patch
to focus on regular relation scans (no join support for now) and to
add a DDL command to define custom-plan providers.

Planner integration with custom logic to scan a particular relation is
simple enough, unlike the various join cases. It is almost the same as
what the built-in logic does now: the custom-plan provider adds a path
node with its cost estimation if it can offer an alternative way to
scan the referenced relation. (If it has nothing to offer, it simply
adds no paths.)

A new DDL syntax I'd like to propose is below:

  CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;

<name> is a literal; supply a unique identifier.
<class> is the workload type to be handled by this custom-plan provider.
"scan", meaning a base relation scan, is the only option right now.
<function_name> is also a literal; it names the function that implements
the custom-plan provider.
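For example, registering a provider for the contrib/ctidscan module
could look like this (the handler function name below is illustrative,
not taken from the patch):

  CREATE CUSTOM PLAN ctidscan FOR scan PROVIDER ctidscan_path_provider;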

A custom-plan provider function is assumed to take one argument of
"internal" type that delivers the planner information needed to
construct a custom-plan path node.
For the "scan" class, a pointer to a customScanArg object is passed
when the custom-plan provider is invoked.

    typedef struct {
        uint32          custom_class;   /* workload class of this call */
        PlannerInfo    *root;           /* global planner state */
        RelOptInfo     *baserel;        /* relation to be scanned */
        RangeTblEntry  *rte;            /* range-table entry of baserel */
    } customScanArg;

When the invoked custom-plan provider function decides it can offer an
alternative scan path on the "baserel" relation, it should (1) construct
a CustomPath (or a type derived from it) with a table of callback
function pointers, (2) fill in its own cost estimation, and (3) call
add_path() to register this path as an alternative.

Once the custom path is chosen by the query planner, its CreateCustomPlan
callback is called to populate a CustomPlan node based on the path node.
That node also has a table of callback function pointers to handle the
planner's various jobs in setrefs.c and so on.

Similarly, its CreateCustomPlanState callback is called to populate a
CustomPlanState node based on the plan node. It also has a table of
callback function pointers to handle the executor's various jobs during
query execution.

Most of the callback design is unchanged from the earlier proposal in
the v9.4 development cycle; however, there are a few changes.

* CustomPlan now inherits from Scan, and CustomPlanState from ScanState.
  Some useful routines for implementing scan logic, such as ExecScan,
  expect the state node to have ScanState as its base type, so this is
  kinder to the extension side. (I'd like to avoid every extension
  reinventing ExecScan by copy & paste!)
  I'm not sure whether it should also overlap with Join in the future;
  however, a layout compatible with Scan/ScanState is a reasonable
  choice for implementing alternative "scan" logic.

* Exporting static functions - I still don't have a graceful answer here.
  However, it is quite natural for extensions to have to follow interface
  changes in future PostgreSQL versions.
  Which class of functions should be exported and which should be
  re-implemented on the extension side will probably become clear in
  later discussion.
  For now, I exported only the minimum needed to implement an alternative
  scan method - the contrib/ctidscan module.

Items to be discussed later:
* Planner integration for joins - we may define new custom-plan classes
  as alternatives to hash join, merge join and nested loop. If core
  knows a custom plan is an alternative to, say, hash join, we can
  reuse core code to check the legality of the join.
* Generic key-value style options in the custom-plan definition, like
  foreign data wrappers have - Hanada-san proposed this to me off-list.
  It could allow configuring multiple behaviors within one binary.
* Ownership and access control of custom plans. Right now, only a
  superuser can create/drop a custom-plan provider definition, so there
  is no explicit ownership or access control. That seems a reasonable
  assumption to me, but there may be use cases where unprivileged users
  need custom plans.

Thanks,

2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> [...]
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachment

Re: [v9.5] Custom Plan API

From
Shigeru Hanada
Date:
Kaigai-san,

I've just applied the v1 patch and tried to build and install it, but I
found two issues:

1) contrib/ctidscan is not automatically built/installed because it is
not listed in contrib/Makefile.  Is this expected behavior?
2) I got the error message below when building the documentation.

$ cd doc/src/sgml
$ make
openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
-d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
"SQL-CREATECUSTOMPLAN"
make: *** [HTML.index] Error 1
make: *** Deleting file `HTML.index'

I'll review the other parts of the patch, including the design.


2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> [...]



-- 
Shigeru HANADA



Re: [v9.5] Custom Plan API

From
Kohei KaiGai
Date:
Hanada-san,

Thanks for your checks. I overlooked those points when I submitted the
patch, sorry. The attached patch is a revised one that fixes the
documentation and contrib/Makefile.

Thanks,

2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
> [...]



--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachment

Re: [v9.5] Custom Plan API

From
Shigeru Hanada
Date:
Kaigai-san,

Sorry for the delayed response.

Here are my random thoughts about the patch.  I couldn't understand
the patch fully, because some of the APIs are not used by ctidscan.  If

Custom Scan patch v2 review

* Custom plan class comparison
In backend/optimizer/util/pathnode.c, custclass is compared by bit-and
with 's'.  Do you plan to use custclass as a bit field?  If so, the
values for custom plan classes should not be characters.  Otherwise,
custclass should be compared with the == operator.

* Purpose of GetSpecialCustomVar()
The purpose of the GetSpecialCustomVar() API is not clear to me.
Could you show a case where it would be useful?

* Purpose of FinalizeCustomPlan()
The reason why the FinalizeCustomPlan callback is necessary is not clear
to me, because ctidscan just calls finalize_primnode() and
bms_add_members() with the given information.  Could you show a case
where the API would be useful?

* Is it ok to call set_cheapest() for all relkind?
Now set_cheapest() is called not only for relations and foreign tables
but also for custom plans, and for other relations such as subqueries,
functions, and values.  Calling call_custom_scan_provider() and
set_cheapest() only in the RTE_RELATION case would seem closer to the
old construct; what do you think about this?

* Is it hard to get rid of CopyCustomPlan()?
Copying a ForeignScan node doesn't need a per-FDW copy function, because
fdw_private is limited to holding only copyable objects.  Can't we use
the same approach for CustomPlan?  Letting authors call NodeSetTag or
copyObject() sounds uncomfortable to me.

The same approach could apply to TextOutCustomPlan() and
TextOutCustomPath() too.

* MultiExec support is appropriate for the first version?
The cases that need MultiExec seem a little complex for the first
version of custom scan.  What kind of plan do you have in mind for
this feature?

* Does SupportBackwardScan() have enough information?
Other scans check the target list with TargetListSupportsBackwardScan().
Isn't it necessary to check that for CustomPlan too in
ExecSupportsBackwardScan()?

* Place to call custom plan provider
Is it necessary to call the provider when relkind != RELKIND_RELATION?
If yes, isn't it necessary to call it for append relations too?

I know we are concentrating on replacing scans in the initial version,
so this would not be a serious problem, but it would be good to
consider an extensible design.

* Custom Plan Provider is "addpath"?
Passing an addpath handler as the only attribute of CUSTOM PLAN PROVIDER
seems a little odd.
Would using a handler like the FDW one make the design too complex
and/or messy?

* superclass of CustomPlanState
CustomPlanState derives from ScanState instead of deriving from
PlanState directly.  I worry about the case of non-heap-scan custom
plans, but it might be ok to postpone consideration of that for the
first cut.

* Naming and granularity of objects related to custom plan
I'm not sure the current naming is appropriate, especially the
distinction between "custom plan", "provider" and "handler".  In the
context of the CREATE CUSTOM PLAN statement, what does the term "custom
plan" mean?  My impression is that a "custom plan" is an alternative
plan type, e.g. ctidscan or pg_strom_scan.  Then what does the term
"provider" mean?  My impression is that it is the extension, such as
ctidscan or pg_strom.  The grammar allows users to pass a function via
the PROVIDER clause of CREATE CUSTOM PLAN, so that function would be
the provider of the custom plan created by the statement.

* enable_customscan
A GUC parameter enable_customscan would be useful for users who want to
disable the custom plan feature temporarily.  In the case of pg_strom,
using the GPU in only some sessions, for analytic or batch
applications, seems handy.

* Adding pg_custom_plan catalog
Using "cust" as the column prefix for pg_custom_plan causes ambiguity,
which makes it difficult to choose a catalog prefix for a feature named
"Custom Foo" in the future.  How about using "cusp" (CUStom Plan)?

Or is it better to use pg_custom_plan_provider as the catalog relation
name, as the document says that "CREATE CUSTOM PLAN defines a custom
plan provider"?  Then the prefix could be "cpp" (Custom Plan Provider).
This would match the wording used for pg_foreign_data_wrapper.

* CREATE CUSTOM PLAN statement
This is just a question:  We need to emit CREATE CUSTOM PLAN if we want to use
I wonder how it will be extended when joins are supported as a custom class.

* New operators about TID comparison
IMO this portion should be a separate patch, because it adds OID
definitions for existing operators such as tidgt and tidle.  Is there
any (explicit or implicit) rule about defining a macro for the OID of
an operator?

* Prototype of get_custom_plan_oid()
custname (or cppname, if we use the rule I proposed above) seems more
appropriate as the parameter name of get_custom_plan_oid(), because
similar functions use catalog column names in such cases.

* Coding conventions
Some lines are indented with spaces (rather than tabs).  Will a future
pgindent run fix this issue?

* Unnecessary struct forward declaration
Forward declarations of CustomPathMethods, Plan, and CustomPlan in
include/nodes/relation.h seem unnecessary.  Other headers might have
the same issue.

* Unnecessary externing of replace_nestloop_params()
replace_nestloop_params() is extern-ed, but it's never called outside
createplan.c.

* Externing fix_scan_expr()
If it's necessary for all custom plan providers to call fix_scan_expr()
(via the fix_scan_list macro), couldn't it be done in set_plan_refs()
before calling SetCustomPlanRef()?

* What does T_CustomPlanMarkPos mean?
It's not clear to me when CustomPlanMarkPos is used.  Is it for a custom
plan provider that supports marking a position and rewinding to it, and
ctidscan just lacks that capability, so it is not used anywhere?

* Unnecessary changes in allpaths.c
Some comments about Subquery and CTE are changed, perhaps accidentally.

* Typos
  * planenr -> planner, implements -> implement in create_custom_plan.sgml
  * CustomScan in nodeCustom.h should be CustomPlan?
  * delivered -> derived, in src/backend/optimizer/util/pathnode.c

* Document "Writing Custom Plan Provider" is not provided
Custom plan provider authors would (and I DO!) hope for documentation
about writing a custom plan provider.

Regards,


2014-06-17 23:12 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> Hanada-san,
>
> Thanks for your checks. I oversight the points when I submit the patch, sorry.
> The attached one is revised one on documentation stuff and contrib/Makefile.
>
> Thanks,
>
> 2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
>> Kaigai-san,
>>
>> I've just applied v1 patch, and tried build and install, but I found two issues:
>>
>> 1) The contrib/ctidscan is not automatically built/installed because
>> it's not described in contrib/Makefile.  Is this expected behavior?
>> 2) I got an error message below when building document.
>>
>> $ cd doc/src/sgml
>> $ make
>> openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
>> -d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
>> openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
>> "SQL-CREATECUSTOMPLAN"
>> make: *** [HTML.index] Error 1
>> make: *** Deleting file `HTML.index'
>>
>> I'll review another part of the patch, including the design.
>>
>>
>> 2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
>>> According to the discussion upthread, I revised the custom-plan patch
>>> to focus on regular relation scan but no join support right now, and to
>>> support DDL command to define custom-plan providers.
>>>
>>> Planner integration with custom logic to scan a particular relation is
>>> enough simple, unlike various join cases. It's almost similar to what
>>> built-in logic are doing now - custom-plan provider adds a path node
>>> with its cost estimation if it can offer alternative way to scan referenced
>>> relation. (in case of no idea, it does not need to add any paths)
>>>
>>> A new DDL syntax I'd like to propose is below:
>>>
>>>   CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
>>>
>>> <name> is as literal, put a unique identifier.
>>> <class> is workload type to be offered by this custom-plan provider.
>>> "scan" is the only option right now, that means base relation scan.
>>> <function_name> is also as literal; it shall perform custom-plan provider.
>>>
>>> A custom-plan provider function is assumed to take an argument of
>>> "internal" type to deliver a set of planner information that is needed to
>>> construct custom-plan pathnode.
>>> In case of "scan" class, pointer towards an customScanArg object
>>> shall be delivered on invocation of custom-plan provider.
>>>
>>>     typedef struct {
>>>         uint32            custom_class;
>>>         PlannerInfo    *root;
>>>         RelOptInfo     *baserel;
>>>         RangeTblEntry  *rte;
>>>     } customScanArg;
>>>
>>> In case when the custom-plan provider function being invoked thought
>>> it can offer an alternative scan path on the relation of "baserel", things
>>> to do is (1) construct a CustomPath (or its inherited data type) object
>>> with a table of callback function pointers (2) put its own cost estimation,
>>> and (3) call add_path() to register this path as an alternative one.
>>>
>>> Once the custom-path was chosen by query planner, its CreateCustomPlan
>>> callback is called to populate CustomPlan node based on the pathnode.
>>> It also has a table of callback function pointers to handle various planner's
>>> job in setrefs.c and so on.
>>>
>>> Similarly, its CreateCustomPlanState callback is called to populate
>>> CustomPlanState node based on the plannode. It also has a table of
>>> callback function pointers to handle various executor's job during quey
>>> execution.
>>>
>>> Most of the callback design is unchanged from the prior proposal in the
>>> v9.4 development cycle; however, here are a few changes.
>>>
>>> * CustomPlan now inherits Scan, and CustomPlanState now inherits
>>>   ScanState. Some useful routines for implementing scan logic, like
>>>   ExecScan, expect the state node to have ScanState as its base type,
>>>   so this is kinder to the extension side. (I'd like to avoid each
>>>   extension reinventing ExecScan by copy & paste!)
>>>   I'm not sure whether it should be a union with Join in the future;
>>>   however, a layout compatible with Scan/ScanState is a reasonable
>>>   choice for implementing alternative "scan" logic.
>>>
>>> * Exporting static functions - I still don't have a graceful answer here.
>>>   However, it is quite natural for extensions to follow interface updates
>>>   across future PostgreSQL versions.
>>>   Which class of functions shall be exported, and which shall be
>>>   re-implemented on the extension side, will probably become clear in
>>>   later discussion.
>>>   Right now, I exported the minimum ones needed to implement an
>>>   alternative scan method - the contrib/ctidscan module.
>>>
>>> Items to be discussed later:
>>> * planner integration for relation joins - we may define new custom-plan
>>>   classes as alternatives to hash join, merge join and nested loop. If
>>>   the core knows this custom plan is an alternative to a hash join, we
>>>   can utilize core code to check the legality of the join.
>>> * generic key-value style options in the custom-plan definition - Hanada-
>>>   san proposed this to me off-list - like foreign data wrappers. It may
>>>   enable configuring multiple behaviors in one binary.
>>> * ownership and access control of custom plans. Right now, only a
>>>   superuser can create/drop a custom-plan provider definition, so it has
>>>   no explicit ownership or access control. That seems a reasonable
>>>   assumption to me; however, we may have a use case where unprivileged
>>>   users need custom plans.
>>>
>>> Thanks,
>>>
>>> 2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>>>>> On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>>
>>>>> >> We're past the prototyping stage and into productionising what we
>>>>> >> know works, AFAIK. If that point is not clear, then we need to
>>>>> >> discuss that first.
>>>>> >
>>>>> > OK, I'll bite: what here do we know works?  Not a damn thing AFAICS;
>>>>> > it's all speculation that certain hooks might be useful, and
>>>>> > speculation that's not supported by a lot of evidence.  If you think
>>>>> > this isn't prototyping, I wonder what you think *is* prototyping.
>>>>>
>>>>> My research contacts advise me of this recent work
>>>>>   http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
>>>>> and also that they expect a prototype to be ready by October, which I have
>>>>> been told will be open source.
>>>>>
>>>>> So there are at least two groups looking at this as a serious option for
>>>>> Postgres (not including the above paper's authors).
>>>>>
>>>>> That isn't *now*, but it is at least a time scale that fits with acting
>>>>> on this in the next release, if we can separate out the various ideas and
>>>>> agree we wish to proceed.
>>>>>
>>>>> I'll submerge again...
>>>>>
>>>> Through the discussion last week, our minimum consensus is:
>>>> 1. Deregulated enhancement of FDW is not the way to go.
>>>> 2. A custom path that can replace a built-in scan makes sense as a first
>>>>    step towards future enhancement. Its planner integration is simple
>>>>    enough to do.
>>>> 3. A custom path that can replace a built-in join needs investigation
>>>>    into how to integrate with the existing planner structure, to avoid
>>>>    (3a) reinventing the whole of join handling on the extension side,
>>>>    and (3b) unnecessary extension calls in cases that are obviously
>>>>    unsupported.
>>>>
>>>> So, I'd like to start the (2) portion towards the upcoming 1st commit-
>>>> fest of the v9.5 development cycle. We will also be able to discuss the
>>>> (3) portion concurrently, probably towards the 2nd commit-fest.
>>>>
>>>> Unfortunately, I cannot attend PGCon/Ottawa this year. Please share the
>>>> face-to-face discussion with us here.
>>>>
>>>> Thanks,
>>>> --
>>>> NEC OSS Promotion Center / PG-Strom Project
>>>> KaiGai Kohei <kaigai@ak.jp.nec.com>
>>>>
>>> --
>>> KaiGai Kohei <kaigai@kaigai.gr.jp>
>>
>>
>>
>> --
>> Shigeru HANADA
>
>
>
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>



-- 
Shigeru HANADA



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
Hanada-san,

Thanks for your dedicated reviewing.

It's a very long message, so let me summarize the things
I shall do in the next patch:

* fix bug: custom-plan class comparison
* fix up naming convention and syntax: CREATE CUSTOM PLAN PROVIDER,
  rather than CREATE CUSTOM PLAN. The prefix shall be "cpp_".
* fix up: definition of get_custom_plan_oid()
* fix up: unexpected white spaces, to be tabs
* fix up: remove unnecessary forward declarations
* fix up: revert replace_nestloop_params() to static
* make SetCustomPlanRef an optional callback
* fix up: typos in various points
* add documentation to explain the custom-plan interface

Also, I want a committer's opinion about the issues below:
* whether set_cheapest() is called for all relkinds
* how the argument of the add_path handler shall be delivered

Individual comments are put below:

> Kaigai-san,
> 
> Sorry for lagged response.
> 
> Here are my  random thoughts about the patch.  I couldn't understand the
> patch fully, because some of APIs are not used by ctidscan.  If
> 
> Custom Scan patch v2 review
> 
> * Custom plan class comparison
> In backend/optimizer/util/pathnode.c, custclass is compared by bit-and
> with 's'.  Do you plan to use custclass as bit field?  If so, values for
> custom plan class should not be a character.  Otherwise, custclass should
> be compared by == operator.
> 
Sorry, it is a bug that comes from the previous design.
I had an idea of allowing a custom-plan provider to support multiple
kinds of exec nodes; however, I concluded it does not make much sense.
(We can define a separate CPP for each.)

> * Purpose of GetSpecialCustomVar()
> The reason why FinalizeCustomPlan callback is necessary is not clear to
> me.
> Could you show a case that the API would be useful?
> 
It is a feature needed to replace a built-in join with a custom scan;
however, its purpose might be unclear for scan workloads.

Let me explain why join replacement needs it. A join node has two input
slots (inner and outer), and Var nodes in its expressions reference either
slot according to their varno (INNER_VAR or OUTER_VAR).
When a CPP replaces a join, it has to generate an equivalent result, but
using two input streams may not be the best choice.
(Recall that when we construct a remote join in postgres_fdw, all the
materialization is done on the remote side, so we have just one input
stream that yields the equivalent of the local join.)
On the other hand, the EXPLAIN command has to understand which column is
the source of each Var node in the custom node's targetlist, even if it
was rewritten to use just one slot. For example, which label shall be
shown when the 3rd item of the targetlist originally came from the 2nd
item of the inner slot, but all the materialized results are stored in the
outer slot?
Only the CPP can track the relationship between the original reference and
the rewritten one. This interface provides a way to resolve what a Var
node actually references.
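As a concrete illustration of the problem (not the actual patch API), a CPP
that materializes everything into one slot could keep a per-column origin map
and consult it when asked to resolve a rewritten Var. Every type name, value
and helper below is a stand-in invented for this sketch:

```c
#include <assert.h>

/* Stand-in values for the special varnos used in join expressions
 * (the real definitions live in the PostgreSQL headers). */
#define INNER_VAR 65000
#define OUTER_VAR 65001

/* One entry per column of the custom node's rewritten targetlist:
 * which original slot and attribute number the column came from. */
typedef struct
{
    int orig_varno;     /* INNER_VAR or OUTER_VAR */
    int orig_varattno;  /* 1-based column within that slot */
} VarOriginEntry;

/* Hypothetical helper a CPP might use behind GetSpecialCustomVar():
 * resolve the n-th output column back to its original reference. */
static VarOriginEntry
resolve_custom_var(const VarOriginEntry *map, int resno)
{
    return map[resno - 1];      /* resno is 1-based, like targetlist items */
}
```

With such a map, the example above - 3rd targetlist item originating from
the 2nd inner column - is answered by `resolve_custom_var(map, 3)` returning
`{INNER_VAR, 2}`, which EXPLAIN can then turn into the right column label.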

> * Purpose of FinalizeCustomPlan()
> The reason why FinalizeCustomPlan callback is necessary is not clear to
> me, because ctidscan just calls finalize_primnode() and
> bms_add_members() with given information.  Could you show a case that the
> API would be useful?
> 
The main purpose of this callback is to give an extension a chance to apply
finalize_primnode() when the custom node holds expression trees in its
private fields.
When a CPP picks up part of the clauses to run its own way, they are
attached on neither plan->targetlist nor plan->qual; only the CPP knows
where they are attached. So, these orphan expression nodes have to be
treated by the CPP.
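To illustrate the idea with toy stand-in types (not the real expression nodes,
finalize_primnode(), or Bitmapset), a FinalizeCustomPlan-style callback has to
fold the parameters referenced by privately held expressions into the plan's
required-parameter set, since the core finalizer never walks those expressions:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Toy stand-in: in the server, a private expression is a real Node tree
 * walked by finalize_primnode(); here it is just the flat list of
 * executor-parameter IDs it references. */
typedef struct
{
    int nparams;
    int paramids[8];
} PrivateExpr;

/* The job of the callback: merge the params referenced by expressions
 * hidden in private fields into the plan's set of required params
 * (a plain bitmask stands in for the Bitmapset used by the planner). */
static uint32_t
finalize_private_exprs(const PrivateExpr *exprs, size_t nexprs,
                       uint32_t valid_params)
{
    for (size_t i = 0; i < nexprs; i++)
        for (int j = 0; j < exprs[i].nparams; j++)
            valid_params |= (1u << exprs[i].paramids[j]);
    return valid_params;
}
```

If the CPP skipped this step, any executor parameter used only by its
private clauses would be missing from the plan's param bookkeeping.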

> * Is it ok to call set_cheapest() for all relkind?
> Now set_cheapest() is called not for only relation and foreign table but
> also custom plan, and other relations such as subquery, function, and value.
> Calling call_custom_scan_provider() and set_cheapest() in the case of
> RTE_RELATION seems similar to the old construct, how do you think about
> this?
> 
I don't think we can actually have useful custom scan logic on these
special relation forms; however, I also don't have a special reason why
custom plans should not support them.
I'd like to see a committer's opinion here.


> * Is it hard to get rid of CopyCustomPlan()?
> Copying ForeignScan node doesn't need per-FDW copy function by limiting
> fdw_private to have only copy-able objects.  Can't we use the same way for
> CustomPlan?  Letting authors call NodeSetTag or
> copyObject() sounds uncomfortable to me.
> 
> This would be able to apply to TextOutCustomPlan() and TextOutCustomPath()
> too.
> 
The FDW-like design was the original one, but the latest design was
suggested by Tom during the v9.4 development cycle, because some data types
are not compliant with copyObject; Bitmapset, for example.

> * MultiExec support is appropriate for the first version?
> The cases need MultiExec seems little complex for the first version of custom
> scan.  What kind of plan do you image for this feature?
> 
It is definitely necessary to exchange multiple rows in a custom format
with the upper level if both nodes are managed by the same CPP.
I plan to use this interface for bulk loading, which makes data loading
to GPUs much faster.

> * Does SupportBackwardScan() have enough information?
> Other scans check target list with TargetListSupportsBackwardScan().
> Isn't it necessary to check it for CustomPlan too in
> ExecSupportsBackwardScan()?
> 
The callback is delivered the CustomPlan node itself, which includes the
Plan node. If the CPP thinks such checks are necessary, it can run the
equivalent checks there.

> * Place to call custom plan provider
> Is it necessary to call provider when relkind != RELKIND_RELATION?  If yes,
> isn't it necessary to call for append relation?
> 
> I know that we concentrate to replacing scan in the initial version, so
> it would not be a serious problem, but it would be good to consider extensible
> design.
> 
Regarding child relation scans, set_append_rel_pathlist() calls
set_rel_pathlist(), which is the entry point for custom scan paths.
If you mean an alternative path for the Append node itself - no, that is
not a feature supported in the first commit.

> * Custom Plan Provider is "addpath"?
> Passing addpath handler as only one attribute of CUSTOM PLAN PROVIDER seems
> little odd.
> Using handler like FDW makes the design too complex and/or messy?
> 
This design allows passing a set of information appropriate to the
workload - joins, not only scans. If we need to extend customXXXXArg in
the future, all we need to extend is the data structure definition, not
the function prototype itself.
Anyway, I'd like to leave the decision on this to the committer review stage.

> * superclass of CustomPlanState
> CustomPlanState derives ScanState, instead of deriving PlanState directly.
> I worry the case of non-heap-scan custom plan, but it might be ok to postpone
> consideration about that at the first cut.
> 
We have some useful routines for implementing custom scan logic, but they
take a ScanState argument, like ExecScan().
Even though we could copy them and paste into extension code, that is not
good manners.
ScanState adds only three pointer variables on top of PlanState. If a CPP
does not care about regular heap scans, it can leave them unused; and it is
quite helpful when a CPP implements some original logic on top of the
existing heap scan.
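The layout argument can be sketched with stand-in struct definitions (the
real fields differ): because ScanState is the first member, a pointer to the
custom state node is usable anywhere a ScanState pointer is expected, which
is exactly what lets a CPP reuse ExecScan() instead of re-implementing it:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the executor state nodes; the point here is the
 * layout, not the actual field lists. */
typedef struct { int plan_node_id; } PlanState;

typedef struct
{
    PlanState ps;               /* base "class" */
    void     *ss_currentRelation;
    void     *ss_currentScanDesc;
    void     *ss_ScanTupleSlot; /* the three extra pointers mentioned above */
} ScanState;

/* Because ScanState is the FIRST member, casting CustomPlanState* to
 * ScanState* is well-defined, so ScanState-taking routines just work. */
typedef struct
{
    ScanState ss;               /* must be first */
    void     *cpp_private;      /* provider-specific state */
} CustomPlanState;
```

This first-member embedding is the same "inheritance" idiom the executor
already uses for SeqScanState, IndexScanState and friends.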

> * Naming and granularity of objects related to custom plan I'm not sure
> the current naming is appropriate, especially difference between "custom
> plan" and "provider" and "handler".  In the context of CREATE CUSTOM PLAN
> statement, what the term "custom plan" means?  My impression is that "custom
> plan" is an alternative plan type, e.g.
> ctidscan or pg_strom_scan.  Then what the term "provider" means?  My
> impression for that is extension, such as ctidscan and pg_strom.  The
> grammar allows users to pass function via PROVIDER clause of CREATE CUSTOM
> SCAN, so the function would be the provider of the custom plan created by
> the statement.
> 
Hmm... What you want to say is: a CREATE X statement is expected to create
an X. On the other hand, a "custom plan" is actually created by the
custom-plan provider, not by this DDL statement; the DDL statement defines
the custom-plan "provider". I think the suggestion is reasonable.

How about the statement below instead?

  CREATE CUSTOM PLAN PROVIDER cpp_name FOR cpp_kind HANDLER cpp_function;
  cpp_kind := SCAN (other types shall be supported later)

> * enable_customscan
> GUC parameter enable_customscan would be useful for users who want to
> disable custom plan feature temporarily.  In the case of pg_strom, using
> GPU for limited sessions for analytic or batch applications seems handy.
> 
It should be done by each extension individually.
Please imagine a user who installs custom-GPU-scan, custom-matview-redirect
and custom-cache-only-scan. The purposes of these CPPs are quite distinct,
so I don't think a single enable_customscan makes sense.

> * Adding pg_custom_plan catalog
> Using "cust" as prefix for pg_custom_plan causes ambiguousness which makes
> it difficult to choose catalog prefix for a feature named "Custom Foo" in
> future.  How about using "cusp" (CUStom Plan)?
> 
> Or is it better to use pg_custom_plan_provider as catalog relation name,
> as the document says that "CREATE CUSTOM PLAN defines custom plan provider".
> Then prefix could be "cpp" (Custom Plan Provider).
> This seems to match the wording used for pg_foreign_data_wrapper.
> 
My preference is "cpp", as a shortening of "custom plan provider".


> * CREATE CUSTOM PLAN statement
> This is just a question:  We need to emit CREATE CUSTOM PLAN if we want
> to use I wonder how it is extended when supporting join as custom class.
> 
In the case of joins, I'll extend the syntax as follows:

  CREATE CUSTOM PLAN cppname
    FOR [HASH JOIN|MERGE JOIN|NEST LOOP]
    PROVIDER provider_func;

Like customScanArg, we will define an argument type for each join method,
and provider_func shall be called with this argument.
I think it is a flexible and extendable approach.
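By analogy with customScanArg, a hash-join argument block might look like the
sketch below; the field list is purely hypothetical (and the opaque typedefs
stand in for the real planner structures) and would be settled during
committer review:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Opaque stand-ins for planner structures; only pointers are used here. */
typedef struct PlannerInfo PlannerInfo;
typedef struct RelOptInfo  RelOptInfo;
typedef struct List        List;

/* Hypothetical argument block for a "hash join" custom-plan class,
 * mirroring customScanArg's custom_class-first layout so the handler
 * can dispatch on the class code before touching the other fields. */
typedef struct
{
    uint32_t      custom_class;
    PlannerInfo  *root;
    RelOptInfo   *joinrel;        /* result relation of the join */
    RelOptInfo   *outerrel;       /* outer input relation */
    RelOptInfo   *innerrel;       /* inner input relation */
    List         *restrictlist;   /* join clauses */
} customHashJoinArg;
```

Keeping custom_class as the first field is what makes the "extend the data
structure, not the function prototype" approach work across workload types.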

> * New operators about TID comparison
> IMO this portion should be a separated patch, because it adds OID definition
> of existing operators such as tidgt and tidle.  Is there any (explicit or
> implicit) rule about defining macro for oid of an operator?
> 
I don't know of a general rule for defining static OID macros.
Probably, they are added on demand.

> * Prototype of get_custom_plan_oid()
> custname (or cppname if use the rule I proposed above) seems appropriate
> as the parameter name of get_custom_plan_oid() because similar funcitons
> use catalog column names in such case.
> 
I'll rename it as follows:
 extern Oid get_custom_plan_provider_oid(const char *cpp_name, bool missing_ok);


> * Coding conventions
> Some lines are indented with white space.  Future pgindent run will fix
> this issue?
> 
It's my oversight, to be fixed.

> * Unnecessary struct forward declaration Forward declarations of
> CustomPathMethods, Plan, and CustomPlan in includes/nodes/relation.h seem
> unncecessary.  Other headers might have same issue.
> 
I'll check it. I did some trial & error during development, which might
have left dead code here.

> * Unnecessary externing of replace_nestloop_params()
> replace_nestloop_params() is extern-ed but it's never called outside
> createplan.c.
> 
Indeed, it's not needed until we support custom join logic.

> * Externing fix_scan_expr()
> If it's necessary for all custom plan providers to call fix_scan_expr (via
> fix_scan_list macro), isn't it able to do it in set_plan_refs() before
> calling SetCustomPlanRef()?
> 
One alternative idea: if the scanrelid of the custom plan is valid
(scanrelid > 0) and the custom node has no private expression tree to be
fixed up, the CPP can omit the SetCustomPlanRef callback. In this case,
the core backend applies fix_scan_list on the targetlist and qual, then
adjusts scanrelid.

That is what I did in the previous revision, which concerned Tom because
it assumes too much about the custom node. (It is useful only for custom
"scan" nodes.)

> * What does T_CustomPlanMarkPos  mean?
> It's not clear to me when CustomPlanMarkPos works.  Is it for a custom plan
> provider which supports marking position and rewind to the position, and
> ctidscan just lacks capability to do that, so it is not used anywhere?
> 
Its previous design had a flag in the body of the CustomPlan structure
indicating whether it allows mark/restore. However, that causes a problem
in ExecSupportsMarkRestore(), which takes only a node tag to determine
whether the supplied node supports mark and restore.
I once tried to change ExecSupportsMarkRestore() to accept the node body;
then Tom suggested using a separate node tag instead.


> * Unnecessary changes in allpaths.c
> some comment about Subquery and CTE are changed (perhaps) accidentally.
> 
No, it is intentional, because the set_cheapest() calls were consolidated.

> * Typos
>   * planenr -> planner, implements -> implement in create_custom_plan.sgml
>   * CustomScan in nodeCustom.h should be CustomPlan?
>   * delivered -> derived, in src/backend/optimizer/util/pathnode.c
> 
OK, I'll fix them.

> * Document "Writing Custom Plan Provider" is not provided Custom Plan
> Provider author would (and I DO!) hope documents about writing a custom
> plan provider.
> 
Documentation like fdwhandler.sgml, you mean?
OK, I'll write it up.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


> 2014-06-17 23:12 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> > Hanada-san,
> >
> > Thanks for your checks. I overlooked those points when I submitted the
> > patch, sorry.
> > The attached one is revised regarding the documentation stuff and
> > contrib/Makefile.
> >
> > Thanks,
> >
> > 2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
> >> Kaigai-san,
> >>
> >> I've just applied v1 patch, and tried build and install, but I found
> two issues:
> >>
> >> 1) The contrib/ctidscan is not automatically built/installed because
> >> it's not described in contrib/Makefile.  Is this expected behavior?
> >> 2) I got an error message below when building document.
> >>
> >> $ cd doc/src/sgml
> >> $ make
> >> openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
> >> -d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
> >> openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
> >> "SQL-CREATECUSTOMPLAN"
> >> make: *** [HTML.index] Error 1
> >> make: *** Deleting file `HTML.index'
> >>
> >> I'll review another part of the patch, including the design.
> >>
> >>
> >> 2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> >>> According to the discussion upthread, I revised the custom-plan
> >>> patch to focus on regular relation scan but no join support right
> >>> now, and to support DDL command to define custom-plan providers.
> >>>
> >>> Planner integration with custom logic to scan a particular relation
> >>> is simple enough, unlike the various join cases. It's quite similar to
> >>> what the built-in logic does now - the custom-plan provider adds a path
> >>> node with its cost estimation if it can offer an alternative way to
> >>> scan the referenced relation. (If it has nothing to offer, it does not
> >>> need to add any paths.)
> 
> 
> 
> --
> Shigeru HANADA

Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
Hanada-san,

The attached patch is the revised one.
Updates from the previous version are below:

* The system catalog name was changed to pg_custom_plan_provider;
  this reflects the role of the object being defined.
* Also, the prefix of its variable names was changed to "cpp",
  meaning custom-plan provider.
* The syntax now also reflects better what the command does. The new
  syntax to define a custom-plan provider is:
    CREATE CUSTOM PLAN PROVIDER <cpp_name>
      FOR <cpp_class> HANDLER <cpp_function>;
* Added custom-plan.sgml to introduce the interface functions defined
  for the path/plan/exec methods.
* The FinalizeCustomPlan() callback was simplified to support scans
  (and, in the future, joins) as the starting point. For scan/join
  requirements alone, there is no need to manipulate the paramids
  bitmap in arbitrary ways.
* Unnecessary forward declarations in relation.h and plannode.h were
  removed, but a few structures still need forward declarations.
* Fixed the typos that were pointed out.

I'd like to see a committer's suggestion regarding the design
issues below:

* Whether set_cheapest() is called for all relkinds?
-> According to the discussion in the v9.4 cycle, I consolidated the
   set_cheapest() calls in allpaths.c into set_rel_pathlist().
   Hanada-san wonders whether it is necessary to have custom plans on
   non-base relations, like sub-queries or values scans.
   I don't have a reason not to run custom plans on these unusual
   relations.

* How shall the argument of the add_path handler be delivered?
-> The custom-plan handler function takes an argument of internal
   type; that is a pointer to a customScanArg if the custom-plan
   class is "scan". (It would be a customHashJoinArg for "hash join",
   for example.)
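In C terms, a handler declared with an "internal" argument simply receives a
pointer to the class-specific argument block and checks custom_class before
doing anything. The sketch below uses stand-in typedefs and a hypothetical
class code so it is self-contained; it is not the patch's actual fmgr-level
signature:

```c
#include <assert.h>
#include <stdint.h>

/* Opaque stand-ins for the planner structures named in the thread. */
typedef struct PlannerInfo   PlannerInfo;
typedef struct RelOptInfo    RelOptInfo;
typedef struct RangeTblEntry RangeTblEntry;

#define CUSTOM_CLASS_SCAN 1     /* hypothetical class code */

/* Layout of the argument block, as posted upthread. */
typedef struct
{
    uint32_t        custom_class;
    PlannerInfo    *root;
    RelOptInfo     *baserel;
    RangeTblEntry  *rte;
} customScanArg;

/* Shape of a provider entrypoint: dispatch on custom_class, then decide
 * whether to build a CustomPath and add_path() it. (In the server this
 * would be a V1 SQL-callable function; the fmgr boilerplate and the
 * actual path construction are omitted.) */
static int
my_scan_provider(void *internal_arg)
{
    customScanArg *arg = (customScanArg *) internal_arg;

    if (arg->custom_class != CUSTOM_CLASS_SCAN)
        return 0;               /* not a workload we handle */

    /* ...inspect arg->baserel, construct a CustomPath with cost
     * estimates, and call add_path() to register it... */
    return 1;
}
```

Returning without adding a path is always legal: a provider that has nothing
to offer for a given relation simply declines, as the proposal describes.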

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: Kaigai Kouhei(海外 浩平)
> Sent: Friday, July 04, 2014 1:23 PM
> To: 'Shigeru Hanada'; Kohei KaiGai
> Cc: Simon Riggs; Tom Lane; Stephen Frost; Robert Haas; Andres Freund;
> PgHacker; Jim Mlodgenski; Peter Eisentraut
> Subject: Re: [HACKERS] [v9.5] Custom Plan API
> 
> Hanada-san,
> 
> Thanks for your dedicated reviewing.
> 
> It's a very long message. So, let me summarize the things I shall do in
> the next patch.
> 
> * fix bug: custom-plan class comparison
> * fix up naming convention and syntax
>   CREATE CUSTOM PLAN PROVIDER, rather than
>   CREATE CUSTOM PLAN. Prefix shall be "cpp_".
> * fix up: definition of get_custom_plan_oid()
> * fix up: unexpected white spaces, to be tabs.
> * fix up: remove unnecessary forward declarations.
> * fix up: revert replace_nestloop_params() to static
> * make SetCustomPlanRef an optional callback
> * fix up: typos in various points
> * add documentation to explain custom-plan interface.
> 
> Also, I want committer's opinion about the issues below
> * whether set_cheapest() is called for all relkind?
> * how argument of add_path handler shall be derivered?
> 
> Individual comments are put below:
> 
> > Kaigai-san,
> >
> > Sorry for lagged response.
> >
> > Here are my  random thoughts about the patch.  I couldn't understand
> > the patch fully, because some of APIs are not used by ctidscan.  If
> >
> > Custom Scan patch v2 review
> >
> > * Custom plan class comparison
> > In backend/optimizer/util/pathnode.c, custclass is compared by bit-and
> > with 's'.  Do you plan to use custclass as bit field?  If so, values
> > for custom plan class should not be a character.  Otherwise, custclass
> > should be compared by == operator.
> >
> Sorry, it is a bug that come from the previous design.
> I had an idea that allows a custom plan provider to support multiple kind
> of exec nodes, however, I concluded it does not make sense so much. (we
> can define multiple CPP for each)
> 
> > * Purpose of GetSpecialCustomVar()
> > The reason why FinalizeCustomPlan callback is necessary is not clear
> > to me.
> > Could you show a case that the API would be useful?
> >
> It is needed feature to replace a built-in join by custom scan, however,
> it might be unclear on the scan workloads.
> 
> Let me explain why join replacement needed. A join node has two input slot
> (inner and outer), its expression node including Var node reference either
> of slot according to its varno (INNER_VAR or OUTER_VAR).
> In case when a CPP replaced a join, it has to generate an equivalent result
> but it may not be a best choice to use two input streams.
> (Please remind when we construct remote join on postgres_fdw, all the
> materialization was done on remote side, thus we had one input stream to
> generate local join equivalent view.) On the other hands, EXPLAIN command
> has to understand what column is the source of varnodes in targetlist of
> custom-node even if it is rewritten to use just one slot. For example, which
> label shall be shown in case when 3rd item of targetlist is originally come
> from 2nd item of inner slot but all the materialized result is stored into
> outer slot.
> Only CPP can track its relationship between the original and the newer one.
> This interface provides a way to solve a varnode that actually references.
> 
> > * Purpose of FinalizeCustomPlan()
> > The reason why FinalizeCustomPlan callback is necessary is not clear
> > to me, because ctidscan just calls finalize_primnode() and
> > bms_add_members() with given information.  Could you show a case that
> > the API would be useful?
> >
> The main purpose of this callback is to give an extension the chance to
> apply finalize_primnode() when the custom node holds expression trees in
> its private fields.
> When a CPP picks up part of the clauses to run in its own way, those
> clauses are attached to neither plan->targetlist nor plan->qual; only
> the CPP knows where they are attached. So these orphan expression nodes
> have to be handled by the CPP.
> 
> > * Is it ok to call set_cheapest() for all relkinds?
> > Now set_cheapest() is called not only for relations and foreign tables
> > but also for custom plans, and for other relations such as subqueries,
> > functions, and values.
> > Calling call_custom_scan_provider() and set_cheapest() in the case of
> > RTE_RELATION seems similar to the old construct, how do you think
> > about this?
> >
> I don't think we can actually have useful custom scan logic for these
> special relation forms, but I also don't have a particular reason why
> custom plans should not support them.
> I'd like to hear a committer's opinion here.
> 
> 
> > * Is it hard to get rid of CopyCustomPlan()?
> > Copying a ForeignScan node doesn't need a per-FDW copy function,
> > because fdw_private is limited to copy-able objects.  Can't we use the
> > same way for CustomPlan?  Letting authors call NodeSetTag or
> > copyObject() sounds uncomfortable to me.
> >
> > This would be able to apply to TextOutCustomPlan() and
> > TextOutCustomPath() too.
> >
> The FDW-like design was the original one, but the latest design was
> suggested by Tom during the v9.4 development cycle, because some data
> types, like Bitmapset, are not compliant with copyObject().
> 
> > * Is MultiExec support appropriate for the first version?
> > The cases that need MultiExec seem a little complex for the first
> > version of custom scan.  What kind of plan do you imagine for this
> > feature?
> >
> It is definitely necessary to exchange multiple rows in a custom format
> with the upper-level node when both nodes are managed by the same CPP.
> I plan to use this interface for bulk loading, which makes data loading
> to GPUs much faster.
> 
> > * Does SupportBackwardScan() have enough information?
> > Other scans check target list with TargetListSupportsBackwardScan().
> > Isn't it necessary to check it for CustomPlan too in
> > ExecSupportsBackwardScan()?
> >
> The callback receives the CustomPlan node itself, which includes the
> Plan node. If the CPP thinks it is necessary, it can run the equivalent
> checks there.
> 
> > * Place to call the custom plan provider
> > Is it necessary to call the provider when relkind != RELKIND_RELATION?
> > If yes, isn't it necessary to call it for append relations?
> >
> > I know that we are concentrating on replacing scans in the initial
> > version, so it would not be a serious problem, but it would be good to
> > consider an extensible design.
> >
> >
> Regarding child relation scans, set_append_rel_pathlist() calls
> set_rel_pathlist(), which is the entry point of custom scan paths.
> If you mean an alternative path for the Append node itself, then no,
> that is not a feature supported in the first commit.
> 
> > * Custom Plan Provider is "addpath"?
> > Passing an addpath handler as the only attribute of CUSTOM PLAN
> > PROVIDER seems a little odd.
> > Would using a handler like FDW make the design too complex and/or
> > messy?
> >
> This design allows passing a set of information appropriate to the
> workload, e.g. join, not only scan. If we need to extend customXXXXArg
> in the future, all we need to extend is the data structure definition,
> not the function prototype itself.
> Anyway, I'd like to leave this decision to the committer review stage.
> 
> > * superclass of CustomPlanState
> > CustomPlanState derives from ScanState, instead of deriving from
> > PlanState directly.
> > I worry about the case of a non-heap-scan custom plan, but it might be
> > ok to postpone consideration of that in the first cut.
> >
> We have some useful routines for implementing custom scan logic, but
> they take a ScanState argument, like ExecScan().
> Even though we could copy and paste them into extension code, that is
> not good manners.
> ScanState adds just three pointer variables on top of PlanState. If a
> CPP does not care about regular heap scans, it can leave them unused.
> They are quite helpful if a CPP implements some original logic on top of
> the existing heap scan.
> 
> > * Naming and granularity of objects related to custom plan
> > I'm not sure the current naming is appropriate, especially the
> > difference between "custom plan", "provider", and "handler".  In the
> > context of the CREATE CUSTOM PLAN statement, what does the term
> > "custom plan" mean?  My impression is that a "custom plan" is an
> > alternative plan type, e.g. ctidscan or pg_strom_scan.  Then what does
> > the term "provider" mean?  My impression is that it is an extension,
> > such as ctidscan or pg_strom.  The grammar allows users to pass a
> > function via the PROVIDER clause of CREATE CUSTOM PLAN, so the
> > function would be the provider of the custom plan created by the
> > statement.
> >
> Hmm... What you want to say is that a CREATE X statement is expected to
> create an X.
> On the other hand, a "custom plan" is actually created by the
> custom-plan provider, not by this DDL statement. The DDL statement
> defines the custom-plan "provider".
> I think the suggestion is reasonable.
> 
> How about the statement below instead?
> 
>   CREATE CUSTOM PLAN PROVIDER cpp_name FOR cpp_kind HANDLER cpp_function;
>   cpp_kind := SCAN (other types shall be supported later)
> 
> > * enable_customscan
> > A GUC parameter enable_customscan would be useful for users who want
> > to disable the custom plan feature temporarily.  In the case of
> > pg_strom, using the GPU only in limited sessions for analytic or batch
> > applications seems handy.
> >
> It should be done by each extension individually.
> Please imagine a user who installs custom-GPU-scan,
> custom-matview-redirect and custom-cache-only-scan. The purposes of
> these CPPs are quite different, so I don't think a single
> enable_customscan makes sense.
> 
> > * Adding pg_custom_plan catalog
> > Using "cust" as the prefix for pg_custom_plan causes ambiguity, which
> > makes it difficult to choose a catalog prefix for a feature named
> > "Custom Foo" in the future.  How about using "cusp" (CUStom Plan)?
> >
> > Or is it better to use pg_custom_plan_provider as the catalog relation
> > name, as the document says that "CREATE CUSTOM PLAN defines a custom
> > plan provider"?
> > Then the prefix could be "cpp" (Custom Plan Provider).
> > This seems to match the wording used for pg_foreign_data_wrapper.
> >
> My preference is "cpp" as an abbreviation of custom plan provider.
> 
> 
> > * CREATE CUSTOM PLAN statement
> > This is just a question:  we need to execute CREATE CUSTOM PLAN if we
> > want to use one.  I wonder how it will be extended when supporting
> > join as a custom class.
> >
> In the case of join, I'll extend the syntax as follows:
> 
>   CREATE CUSTOM PLAN cppname
>     FOR [HASH JOIN|MERGE JOIN|NEST LOOP]
>     PROVIDER provider_func;
> 
> Like customScanArg, we will define an argument type for each join
> method, and provider_func shall be called with that argument.
> I think it is a flexible and extensible approach.
> 
> > * New operators for TID comparison
> > IMO this portion should be a separate patch, because it adds OID
> > definitions of existing operators such as tidgt and tidle.  Is there
> > any (explicit or implicit) rule about defining macros for operator
> > OIDs?
> >
> I don't know of a general rule for defining static OID definitions.
> Probably these are added on demand.
> 
> > * Prototype of get_custom_plan_oid()
> > custname (or cppname, if we use the rule I proposed above) seems
> > appropriate as the parameter name of get_custom_plan_oid(), because
> > similar functions use catalog column names in such cases.
> >
> I'll rename it as follows:
> 
>   extern Oid get_custom_plan_provider_oid(const char *cpp_name, bool
> missing_ok);
> 
> 
> > * Coding conventions
> > Some lines are indented with white space.  Will a future pgindent run
> > fix this issue?
> >
> It's my oversight, to be fixed.
> 
> > * Unnecessary struct forward declarations
> > Forward declarations of CustomPathMethods, Plan, and CustomPlan in
> > include/nodes/relation.h seem unnecessary.  Other headers might have
> > the same issue.
> >
> I'll check. I did some trial and error during development, which might
> have left dead code here.
> 
> > * Unnecessary externing of replace_nestloop_params()
> > replace_nestloop_params() is extern-ed but it's never called outside
> > createplan.c.
> >
> Indeed, it's not needed until we support custom join logic.
> 
> > * Externing fix_scan_expr()
> > If it's necessary for all custom plan providers to call fix_scan_expr
> > (via the fix_scan_list macro), isn't it possible to do it in
> > set_plan_refs() before calling SetCustomPlanRef()?
> >
> >
> One alternative idea is:
>   if the scanrelid of the custom plan is valid (scanrelid > 0) and the
>   custom node has no private expression tree to be fixed up, the CPP
>   can omit the SetCustomPlanRef callback. In this case, the core
>   backend applies fix_scan_list on the targetlist and qual, then
>   adjusts scanrelid.
> 
> This is what I did in a previous revision, but Tom was concerned that it
> assumed too much about the custom node. (It is useful only for custom
> "scan" nodes.)
> 
> > * What does T_CustomPlanMarkPos mean?
> > It's not clear to me when CustomPlanMarkPos works.  Is it for a custom
> > plan provider which supports marking a position and rewinding to that
> > position, and ctidscan just lacks the capability to do that, so it is
> > not used anywhere?
> >
> The previous design had a flag indicating whether backward scan is
> allowed in the body of the CustomPlan structure. However, that caused a
> problem with ExecSupportsMarkRestore(), which takes only a node tag to
> determine whether the supplied node supports mark/restore.
> When I tried to change ExecSupportsMarkRestore() to accept the node
> body, Tom suggested using a separate node tag instead.
> 
> 
> > * Unnecessary changes in allpaths.c
> > Some comments about Subquery and CTE were changed (perhaps)
> > accidentally.
> >
> No, it is intentional, because the set_cheapest() calls were
> consolidated.
> 
> > * Typos
> >   * planenr -> planner, implements -> implement in
> >     create_custom_plan.sgml
> >   * CustomScan in nodeCustom.h should be CustomPlan?
> >   * delivered -> derived, in src/backend/optimizer/util/pathnode.c
> >
> OK, I'll fix them.
> 
> > * Document "Writing a Custom Plan Provider" is not provided
> > Custom Plan Provider authors would (and I DO!) hope for documentation
> > about writing a custom plan provider.
> >
> You mean documentation like fdwhandler.sgml?
> OK, I'll write it up.
> 
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> <kaigai@ak.jp.nec.com>
> 
> 
> > 2014-06-17 23:12 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> > > Hanada-san,
> > >
> > > Thanks for your checks. I overlooked those points when I submitted
> > > the patch, sorry.
> > > The attached one revises the documentation stuff and
> > > contrib/Makefile.
> > >
> > > Thanks,
> > >
> > > 2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
> > >> Kaigai-san,
> > >>
> > >> I've just applied the v1 patch and tried to build and install it,
> > >> but I found two issues:
> > >>
> > >> 1) The contrib/ctidscan is not automatically built/installed
> > >> because it's not described in contrib/Makefile.  Is this expected
> > >> behavior?
> > >> 2) I got an error message below when building document.
> > >>
> > >> $ cd doc/src/sgml
> > >> $ make
> > >> openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
> > >> -d stylesheet.dsl -t sgml -i output-html -V html-index
> > >> postgres.sgml
> > >> openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
> > >> "SQL-CREATECUSTOMPLAN"
> > >> make: *** [HTML.index] Error 1
> > >> make: *** Deleting file `HTML.index'
> > >>
> > >> I'll review another part of the patch, including the design.
> > >>
> > >>
> > >> 2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> > >>> According to the discussion upthread, I revised the custom-plan
> > >>> patch to focus on regular relation scans with no join support for
> > >>> now, and to support a DDL command to define custom-plan providers.
> > >>>
> > >>> Planner integration with custom logic to scan a particular
> > >>> relation is simple enough, unlike the various join cases. It's
> > >>> almost the same as what the built-in logic does now - the
> > >>> custom-plan provider adds a path node with its cost estimation if
> > >>> it can offer an alternative way to scan the referenced relation.
> > >>> (If it has no alternative to offer, it does not need to add any
> > >>> paths.)
> > >>>
> > >>> A new DDL syntax I'd like to propose is below:
> > >>>
> > >>>   CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
> > >>>
> > >>> <name> is a literal, a unique identifier.
> > >>> <class> is the workload type to be offered by this custom-plan
> > >>> provider. "scan" is the only option right now, meaning base
> > >>> relation scan.
> > >>> <function_name> is also a literal; it names the function that acts
> > >>> as the custom-plan provider.
> > >>>
> > >>> A custom-plan provider function is assumed to take an argument of
> > >>> "internal" type that delivers the set of planner information
> > >>> needed to construct a custom-plan path node.
> > >>> In the case of the "scan" class, a pointer to a customScanArg
> > >>> object shall be delivered on invocation of the custom-plan
> > >>> provider.
> > >>>
> > >>>     typedef struct {
> > >>>         uint32            custom_class;
> > >>>         PlannerInfo    *root;
> > >>>         RelOptInfo     *baserel;
> > >>>         RangeTblEntry  *rte;
> > >>>     } customScanArg;
> > >>>
> > >>> When the invoked custom-plan provider function thinks it can
> > >>> offer an alternative scan path on the relation "baserel", the
> > >>> things to do are: (1) construct a CustomPath (or a type derived
> > >>> from it) object with a table of callback function pointers,
> > >>> (2) fill in its own cost estimation, and (3) call add_path() to
> > >>> register this path as an alternative.
> > >>>
> > >>> Once the custom path is chosen by the query planner, its
> > >>> CreateCustomPlan callback is called to populate a CustomPlan node
> > >>> based on the path node.
> > >>> It also has a table of callback function pointers to handle the
> > >>> planner's various jobs in setrefs.c and so on.
> > >>>
> > >>> Similarly, its CreateCustomPlanState callback is called to
> > >>> populate a CustomPlanState node based on the plan node. It also
> > >>> has a table of callback function pointers to handle the executor's
> > >>> various jobs during query execution.
> > >>>
> > >>> Most of the callback designs are unchanged from the earlier
> > >>> proposal in the v9.4 development cycle; however, here are a few
> > >>> changes.
> > >>>
> > >>> * CustomPlan now inherits Scan, and CustomPlanState now inherits
> > >>>   ScanState. Because some useful routines for implementing scan
> > >>>   logic, like ExecScan, expect the state node to have ScanState as
> > >>>   its base type, this is kinder to the extension side. (I'd like
> > >>>   to avoid each extension reinventing ExecScan by copy & paste!)
> > >>>   I'm not sure whether it should become a union with Join in the
> > >>>   future; however, it is a reasonable choice to have a layout
> > >>>   compatible with Scan/ScanState to implement alternative "scan"
> > >>>   logic.
> > >>>
> > >>> * Exporting static functions - I still don't have a graceful
> > >>>   answer here. However, it is quite natural for extensions to
> > >>>   follow up on interface updates in future versions of PostgreSQL.
> > >>>   Probably it will become clear in later discussion which class of
> > >>>   functions shall be exported and which shall be re-implemented on
> > >>>   the extension side.
> > >>>   Right now, I exported the minimum ones needed to implement an
> > >>>   alternative scan method - the contrib/ctidscan module.
> > >>>
> > >>> Items to be discussed later:
> > >>> * planner integration for relation joins - probably we may define
> > >>>   new custom-plan classes as alternatives to hash join, merge join
> > >>>   and nested loop. If the core can know this custom plan is an
> > >>>   alternative to a hash join, we can utilize core code to check
> > >>>   the legality of the join.
> > >>> * generic key-value style options in the custom-plan definition -
> > >>>   Hanada-san proposed this to me off-list - like foreign data
> > >>>   wrappers. It may enable configuring multiple behaviors in one
> > >>>   binary.
> > >>> * ownership and access control of custom plans. Right now, only a
> > >>>   superuser can create/drop custom-plan provider definitions, so
> > >>>   there is no explicit ownership or access control. It seems to me
> > >>>   a reasonable assumption; however, we may have use cases that
> > >>>   need custom plans for unprivileged users.
> > >>>
> > >>> Thanks,
> > >>>
> > >>> 2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> > >>>>> On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >>>>>
> > >>>>> >> We're past the prototyping stage and into productionising
> > >>>>> >> what we know works, AFAIK. If that point is not clear, then
> > >>>>> >> we need to discuss that first.
> > >>>>> >
> > >>>>> > OK, I'll bite: what here do we know works?  Not a damn thing
> > >>>>> > AFAICS; it's all speculation that certain hooks might be
> > >>>>> > useful, and speculation that's not supported by a lot of
> > >>>>> > evidence.  If you think this isn't prototyping, I wonder what
> > >>>>> > you think *is*
> > prototyping.
> > >>>>>
> > >>>>> My research contacts advise me of this recent work
> > >>>>>   http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
> > >>>>> and also that they expect a prototype to be ready by October,
> > >>>>> which I have been told will be open source.
> > >>>>>
> > >>>>> So there are at least two groups looking at this as a serious
> > >>>>> option for Postgres (not including the above paper's authors).
> > >>>>>
> > >>>>> That isn't *now*, but it is at least a time scale that fits with
> > >>>>> acting on this in the next release, if we can separate out the
> > >>>>> various ideas and agree we wish to proceed.
> > >>>>>
> > >>>>> I'll submerge again...
> > >>>>>
> > >>>> Through the discussion last week, our minimum consensus is:
> > >>>> 1. Deregulated enhancement of FDW is not the way to go.
> > >>>> 2. A custom path that can replace a built-in scan makes sense as
> > >>>>    a first step towards future enhancement. Its planner
> > >>>>    integration is simple enough to do.
> > >>>> 3. A custom path that can replace a built-in join needs
> > >>>>    investigation of how to integrate with the existing planner
> > >>>>    structure, to avoid (3a) reinventing the whole of join
> > >>>>    handling on the extension side, and (3b) unnecessary extension
> > >>>>    calls in cases that are obviously unsupported.
> > >>>>
> > >>>> So, I'd like to start the (2) portion towards the upcoming 1st
> > >>>> commit-fest of the v9.5 development cycle. We will also be able
> > >>>> to discuss the (3) portion concurrently, probably towards the 2nd
> > >>>> commit-fest.
> > >>>>
> > >>>> Unfortunately, I cannot participate in PGCon/Ottawa this year.
> > >>>> Please share the face-to-face discussion here.
> > >>>>
> > >>>> Thanks,
> > >>>> --
> > >>>> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> > >>>> <kaigai@ak.jp.nec.com>
> > >>>>
> > >>> --
> > >>> KaiGai Kohei <kaigai@kaigai.gr.jp>
> > >>
> > >>
> > >>
> > >> --
> > >> Shigeru HANADA
> > >
> > >
> > >
> > > --
> > > KaiGai Kohei <kaigai@kaigai.gr.jp>
> >
> >
> >
> > --
> > Shigeru HANADA

Attachment

Re: [v9.5] Custom Plan API

From
Shigeru Hanada
Date:
Kaigai-san,

The v3 patch had a conflict in src/backend/tcop/utility.c with the newly
added IMPORT FOREIGN SCHEMA statement, but it was trivial.

2014-07-08 20:55 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> * System catalog name was changed to pg_custom_plan_provider;
>   that reflects role of the object being defined.

ISTM that doc/src/sgml/custom-plan.sgml should also be renamed to
custom-plan-provider.sgml.

> * Also, prefix of its variable names are changed to "cpp"; that
>   means custom-plan-provider.

A "custclass" remains in a comment in
src/include/catalog/pg_custom_plan_provider.h.

> * Syntax also reflects what the command does more. New syntax to
>   define custom plan provider is:
>     CREATE CUSTOM PLAN PROVIDER <cpp_name>
>       FOR <cpp_class> HANDLER <cpp_function>;
> * Add custom-plan.sgml to introduce interface functions defined
>   for path/plan/exec methods.
> * FinalizeCustomPlan() callback was simplified to support scan
>   (and join in the future) at the starting point. As far as the
>   scan/join requirements go, there is no need to control the
>   paramids bitmap in an arbitrary way.
> * Unnecessary forward declarations in relation.h and plannode.h
>   were removed, but a few structures still need to have
>   forward declarations.
> * Fixed the typos that were pointed out.

Check.  I found some typos and a wording "datatype" which is not used
anywhere else. Please refer to the attached patch.

--
Shigeru HANADA

Attachment

Re: [v9.5] Custom Plan API

From
Kohei KaiGai
Date:
Hanada-san,

Thanks for your checking. The attached v4 patch is rebased on the
latest master branch. Indeed, the merge conflict was trivial.

Updates from the v3 are below:
- custom-plan.sgml was renamed to custom-plan-provider.sgml
- fixed up the comments in pg_custom_plan_provider.h that mentioned
  the old field name.
- applied your patch to fix up typos. (thanks so much!)
- put "costs off" on the EXPLAIN commands in the regression test of
  the ctidscan extension.

Do you have any further comments on the design and implementation
from your viewpoint?

Thanks,

2014-07-14 19:07 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
> Kaigai-san,
>
> The v3 patch had conflict in src/backend/tcop/utility.c for newly
> added IMPORT FOREIGN SCHEMA statement, but it was trivial.
>
> 2014-07-08 20:55 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>> * System catalog name was changed to pg_custom_plan_provider;
>>   that reflects role of the object being defined.
>
> ISTM that doc/src/sgml/custom-plan.sgml should be also renamed to
> custom-plan-provider.sgml.
>
>> * Also, prefix of its variable names are changed to "cpp"; that
>>   means custom-plan-provider.
>
> A "custclass" remains in a comment in
> src/include/catalog/pg_custom_plan_provider.h.
>
>> * Syntax also reflects what the command does more. New syntax to
>>   define custom plan provider is:
>>     CREATE CUSTOM PLAN PROVIDER <cpp_name>
>>       FOR <cpp_class> HANDLER <cpp_function>;
>> * Add custom-plan.sgml to introduce interface functions defined
>>   for path/plan/exec methods.
>> * FinalizeCustomPlan() callback was simplified to support scan
>>   (and join in the future) at the starting point. As long as
>>   scan/join requirement, no need to control paramids bitmap in
>>   arbitrary way.
>> * Unnecessary forward declaration in relation.h and plannode.h
>>   were removed, but a few structures still needs to have
>>   forward declarations.
>> * Fix typos being pointed out.
>
> Check.  I found some typos and a wording "datatype" which is not used
> in any other place. Please refer the attached patch.
>
> --
> Shigeru HANADA

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachment

Re: [v9.5] Custom Plan API

From
Shigeru Hanada
Date:
Kaigai-san,

2014-07-14 22:18 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> Hanada-san,
>
> Thanks for your checking. The attached v4 patch is rebased one on the
> latest master branch. Indeed, merge conflict was trivial.
>
> Updates from the v3 are below:
> - custom-plan.sgml was renamed to custom-plan-provider.sgml
> - fix up the comments in pg_custom_plan_provider.h that mentioned
>   about old field name.
> - applied your patch to fix up typos. (thanks so much!)
> - put "costs off" on the EXPLAIN command in the regression test of
>   ctidscan extension.

Checked, but the patch fails the sanity-check test; you need to modify
the expected file of the test.



> Nothing to comment on the design and implementation from your
> viewpoint any more?

As far as I can tell, the design seems reasonable.  After a fix for the
small issue above, I'll move the patch status to "Ready for
committer".

-- 
Shigeru HANADA



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> 2014-07-14 22:18 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> > Hanada-san,
> >
> > Thanks for your checking. The attached v4 patch is rebased one on the
> > latest master branch. Indeed, merge conflict was trivial.
> >
> > Updates from the v3 are below:
> > - custom-plan.sgml was renamed to custom-plan-provider.sgml
> > - fix up the comments in pg_custom_plan_provider.h that mentioned
> >   about old field name.
> > - applied your patch to fix up typos. (thanks so much!)
> > - put "costs off" on the EXPLAIN command in the regression test of
> >   ctidscan extension.
> 
> Checked, but the patch fails sanity-check test, you need to modify expected
> file of the test.
> 
Sorry, the expected result of the sanity-check test was not updated for
the renaming to pg_custom_plan_provider.
The attached patch fixes this point.

> > Nothing to comment on the design and implementation from your
> > viewpoint any more?
> 
> As much as I can tell, the design seems reasonable.  After fix for the small
> issue above, I'll move the patch status to "Ready for committer".
> 

--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Attachment

Re: [v9.5] Custom Plan API

From
Shigeru Hanada
Date:
Kaigai-san,

2014-07-15 21:37 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> Sorry, expected result of sanity-check test was not updated on
> renaming to pg_custom_plan_provider.
> The attached patch fixed up this point.

I confirmed that all regression tests passed, so I marked the patch as
"Ready for committer".

-- 
Shigeru HANADA



Re: [v9.5] Custom Plan API

From
Andres Freund
Date:
On 2014-07-16 10:43:08 +0900, Shigeru Hanada wrote:
> Kaigai-san,
> 
> 2014-07-15 21:37 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> > Sorry, expected result of sanity-check test was not updated on
> > renaming to pg_custom_plan_provider.
> > The attached patch fixed up this point.
> 
> I confirmed that all regression tests passed, so I marked the patch as
> "Ready for committer".

I personally don't see how this patch is 'ready for committer'. I
realize that that state is sometimes used to denote that review needs to
be "escalated", but it still seems premature.

Unless I miss something there hasn't been any API level review of this?
Also, aren't there several open items?

Perhaps there needs to be a stage between 'needs review' and 'ready for
committer'?

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Alvaro Herrera
Date:
I haven't followed this at all, but I just skimmed over it and noticed
the CustomPlanMarkPos thingy; apologies if this has been discussed
before.  It seems a bit odd to me; why isn't it sufficient to have a
boolean flag in regular CustomPlan to indicate that it supports
mark/restore?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I haven't followed this at all, but I just skimmed over it and noticed
> the CustomPlanMarkPos thingy; apologies if this has been discussed
> before.  It seems a bit odd to me; why isn't it sufficient to have a
> boolean flag in regular CustomPlan to indicate that it supports
> mark/restore?

Yeah, I thought that was pretty bogus too, but it's well down the
list of issues that were there last time I looked at this ...
        regards, tom lane



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> I personally don't see how this patch is 'ready for committer'. I realize
> that that state is sometimes used to denote that review needs to be
> "escalated", but it still seems premature.
>
> Unless I miss something there hasn't been any API level review of this?
> Also, aren't there several open items?
>
Even though some interface specifications were revised according to
comments from Tom during the last development cycle, the current set of
interfaces has not been reviewed by committers. I really want this.

Here are two open items on which we want committers' comments.

* Should set_cheapest() be called for all relkinds?

This patch moved set_cheapest() to the end of set_rel_pathlist(),
to consolidate the entry point of the custom-plan-provider handler
function. It also implies that a CPP can provide alternative paths
for non-regular relations (like sub-queries, functions, ...).
Hanada-san wondered whether we really have a case for running
alternative sub-query code. Even though I don't have use cases for
alternative sub-query execution logic, we also don't have a reason
to restrict it.

* How shall the argument of the add_path handler be delivered?

The handler function (which adds a custom path for the requested
relation scan if it can provide one) is declared with an argument of
the INTERNAL data type. The extension needs to cast the supplied
pointer to the customScanArg data type (or potentially
customHashJoinArg and so on...) according to the custom plan class.
I think this design is more extensible than strict argument
definitions, but Hanada-san wondered whether it is the best design.
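To illustrate the casting convention described above, here is a minimal
sketch with simplified stand-in types: the real PlannerInfo, RelOptInfo
and RangeTblEntry definitions live in the PostgreSQL headers, a real
handler would use the fmgr calling convention, and the
CUSTOM_CLASS_SCAN constant and handler name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins for the planner structures (pointers only). */
typedef struct PlannerInfo PlannerInfo;
typedef struct RelOptInfo RelOptInfo;
typedef struct RangeTblEntry RangeTblEntry;

#define CUSTOM_CLASS_SCAN 1     /* hypothetical class identifier */

typedef struct {
    uint32_t        custom_class;
    PlannerInfo    *root;
    RelOptInfo     *baserel;
    RangeTblEntry  *rte;
} customScanArg;

/*
 * A handler declared with an "internal" argument receives an untyped
 * pointer; it casts according to custom_class at the head of the struct.
 * Returns 1 if it would add an alternative path, 0 otherwise.
 */
static int
my_custom_plan_handler(void *internal_arg)
{
    customScanArg *arg = (customScanArg *) internal_arg;

    if (arg->custom_class != CUSTOM_CLASS_SCAN)
        return 0;               /* workload we do not handle: add no path */

    /* A real provider would build a CustomPath from arg->root,
     * arg->baserel and arg->rte, then call add_path(). */
    return 1;
}
```

The point of the single INTERNAL argument is visible here: extending the
interface to joins only means defining another struct (customHashJoinArg,
...) distinguished by its leading class field, without touching the
function prototype.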

> Perhaps there needs to be a stage between 'needs review' and 'ready for
> committer'?
>
It needs clarification of 'ready for committer'. I think interface
specification is the kind of task to be discussed with committers,
because the reviewer's preference/viewpoint is not always the same
as theirs.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: Andres Freund [mailto:andres@2ndquadrant.com]
> Sent: Friday, July 18, 2014 3:12 AM
> To: Shigeru Hanada
> Cc: Kaigai Kouhei(海外 浩平); Kohei KaiGai; Simon Riggs; Tom Lane; Stephen
> Frost; Robert Haas; PgHacker; Jim Mlodgenski; Peter Eisentraut
> Subject: Re: [HACKERS] [v9.5] Custom Plan API
>
> On 2014-07-16 10:43:08 +0900, Shigeru Hanada wrote:
> > Kaigai-san,
> >
> > 2014-07-15 21:37 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> > > Sorry, expected result of sanity-check test was not updated on
> > > renaming to pg_custom_plan_provider.
> > > The attached patch fixed up this point.
> >
> > I confirmed that all regression tests passed, so I marked the patch as
> > "Ready for committer".
>
> I personally don't see how this patch is 'ready for committer'. I realize
> that that state is sometimes used to denote that review needs to be
> "escalated", but it still seems premature.
>
> Unless I miss something there hasn't been any API level review of this?
> Also, aren't there several open items?
>
> Perhaps there needs to be a stage between 'needs review' and 'ready for
> committer'?
>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund                       http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > I haven't followed this at all, but I just skimmed over it and noticed
> > the CustomPlanMarkPos thingy; apologies if this has been discussed
> > before.  It seems a bit odd to me; why isn't it sufficient to have a
> > boolean flag in regular CustomPlan to indicate that it supports
> > mark/restore?
>
> Yeah, I thought that was pretty bogus too, but it's well down the list of
> issues that were there last time I looked at this ...
>
IIRC, CustomPlanMarkPos was suggested to keep the interface of
ExecSupportsMarkRestore(), which takes a plan node tag to determine
whether the node supports mark/restore.
As my original proposal did, a flag field in the CustomPlan structure
seems more straightforward to me, if we don't hesitate to change
ExecSupportsMarkRestore().

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




Re: [v9.5] Custom Plan API

From
Kohei KaiGai
Date:
2014-07-18 10:28 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> > I haven't followed this at all, but I just skimmed over it and noticed
>> > the CustomPlanMarkPos thingy; apologies if this has been discussed
>> > before.  It seems a bit odd to me; why isn't it sufficient to have a
>> > boolean flag in regular CustomPlan to indicate that it supports
>> > mark/restore?
>>
>> Yeah, I thought that was pretty bogus too, but it's well down the list of
>> issues that were there last time I looked at this ...
>>
> IIRC, CustomPlanMarkPos was suggested to keep the interface of
> ExecSupportsMarkRestore() that takes plannode tag to determine
> whether it support Mark/Restore.
> As my original proposition did, it seems to me a flag field in
> CustomPlan structure is straightforward, if we don't hesitate to
> change ExecSupportsMarkRestore().
>
The attached patch revises the above point.
It eliminates CustomPlanMarkPos and instead adds a flags field to the
CustomXXX structures, which tells the backend whether the custom plan
provider supports mark/restore positioning and backward scans.
This change requires ExecSupportsMarkRestore() to look at the contents
of the Path node, not only its node tag, so its declaration was also
changed to take a pointer to the Path node.
The only caller of this function right now is final_cost_mergejoin(),
which previously just passed the pathtype field of the Path node, so
this change does not cause any significant degradation.
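As a rough illustration of the flags-based design, here is a minimal
compilable sketch; all types, tags and flag names below are illustrative
stand-ins, not the real PostgreSQL definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins; the real definitions live in the backend headers. */
typedef enum NodeTag { T_SeqScan, T_IndexScan, T_Material, T_CustomScan } NodeTag;

#define CUSTOM_SUPPORT_MARK_RESTORE   0x0001    /* hypothetical flag bits */
#define CUSTOM_SUPPORT_BACKWARD_SCAN  0x0002

typedef struct Path {
    NodeTag pathtype;       /* tag of the plan node this path would produce */
} Path;

typedef struct CustomPath {
    Path         path;      /* common Path fields come first */
    unsigned int flags;     /* capability bits declared by the provider */
} CustomPath;

/*
 * With a flags field, ExecSupportsMarkRestore() must inspect the contents
 * of the Path node, not just a node tag, hence the changed declaration.
 */
static bool
ExecSupportsMarkRestore(const Path *pathnode)
{
    switch (pathnode->pathtype)
    {
        case T_IndexScan:
        case T_Material:
            return true;
        case T_CustomScan:
            return (((const CustomPath *) pathnode)->flags &
                    CUSTOM_SUPPORT_MARK_RESTORE) != 0;
        default:
            return false;
    }
}
```

A provider that supports mark/restore simply sets the bit in its
CustomPath, and final_cost_mergejoin() passes the whole Path instead of
only its pathtype field.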

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachment

Re: [v9.5] Custom Plan API

From
Kohei KaiGai
Date:
The attached patch is the rebased custom-plan API, with no functional
changes from the latest version, which added a flag field to the
custom-plan node to show whether it supports mark/restore or backward
scans.

Towards the upcoming commit-fest, let me summarize a brief overview
of this patch.

The purpose of the custom-plan interface implemented by this patch is
to allow extensions to provide alternative ways to scan (and
potentially join, and so on) relations, in addition to the built-in
logic.
If one or more extensions are installed as custom-plan providers, they
can offer the planner an alternative way to scan a relation using a
CustomPath node with its own cost estimation.
As usual, the planner will choose a particular path based on cost.
If the custom path is not chosen, it is simply discarded and nothing
changes.
Once a custom plan is chosen, the custom-plan provider behind the
custom-plan node is invoked during query execution and is responsible
for scanning the relation in its own way.
One expected use of this interface is the GPU acceleration I am also
working on.

The custom-plan provider is invoked via the function installed as its
handler, with an argument that packs all the information needed to
construct a custom-path node. In the case of a relation scan, a
customScanArg containing the PlannerInfo, RelOptInfo and RangeTblEntry
is supplied.
The function is registered using a new command:
  CREATE CUSTOM PLAN PROVIDER <name>
    FOR SCAN HANDLER <handler_function>;

According to the earlier discussion, the CustomXXX nodes are designed
to carry the extension's private fields in the manner of an
object-oriented language.
A CustomXXX node has a few common, minimally required fields, but no
private pointer. Instead, the extension declares its own
Path/Plan/PlanState structure that embeds the CustomXXX node at the
head of the structure declaration, and can put private fields in the
latter half of the structure.
The contrib/ctidscan module is a good example of how an extension can
utilize the interface.

Once a CustomPlan/CustomPlanState node is constructed, the rest of the
processing is the same as for other executor nodes: it is invoked at
executor startup, shutdown and during execution, and the corresponding
callback in its table of function pointers is called.
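The inheritance pattern described above can be sketched in plain C as
follows; the field names are hypothetical, borrowed loosely from the
ctidscan example:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the common CustomPath head; the real struct has more fields. */
typedef struct CustomPath {
    int         pathtype;   /* illustrative common field */
    const void *methods;    /* provider's table of function pointers */
} CustomPath;

/*
 * An extension embeds the CustomXXX node as the FIRST member of its own
 * structure and appends private fields after it.  Because the common part
 * sits at offset zero, a pointer to the extension's struct can be passed
 * wherever a CustomPath pointer is expected, and cast back by the provider.
 */
typedef struct CtidScanPath {
    CustomPath  cpath;      /* common part, must come first */
    int         ctid_quals; /* private field of this provider (hypothetical) */
} CtidScanPath;
```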

Thanks,

2014-07-23 10:47 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> 2014-07-18 10:28 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> > I haven't followed this at all, but I just skimmed over it and noticed
>>> > the CustomPlanMarkPos thingy; apologies if this has been discussed
>>> > before.  It seems a bit odd to me; why isn't it sufficient to have a
>>> > boolean flag in regular CustomPlan to indicate that it supports
>>> > mark/restore?
>>>
>>> Yeah, I thought that was pretty bogus too, but it's well down the list of
>>> issues that were there last time I looked at this ...
>>>
>> IIRC, CustomPlanMarkPos was suggested to keep the interface of
>> ExecSupportsMarkRestore() that takes plannode tag to determine
>> whether it support Mark/Restore.
>> As my original proposition did, it seems to me a flag field in
>> CustomPlan structure is straightforward, if we don't hesitate to
>> change ExecSupportsMarkRestore().
>>
> The attached patch revised the above point.
> It eliminates CustomPlanMarkPos, and adds flags field on CustomXXX
> structure to inform the backend whether the custom plan provider can
> support mark-restore position and backward scan.
> This change requires ExecSupportsMarkRestore() to reference
> contents of Path node, not only node-tag, so its declaration was also
> changed to take a pointer to Path node.
> The only caller of this function is final_cost_mergejoin() right now.
> It just gives pathtype field of Path node on its invocation, so this change
> does not lead significant degradation.
>
> Thanks,
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>



--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachment

Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Thu, Jul 17, 2014 at 3:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> I haven't followed this at all, but I just skimmed over it and noticed
>> the CustomPlanMarkPos thingy; apologies if this has been discussed
>> before.  It seems a bit odd to me; why isn't it sufficient to have a
>> boolean flag in regular CustomPlan to indicate that it supports
>> mark/restore?
>
> Yeah, I thought that was pretty bogus too, but it's well down the
> list of issues that were there last time I looked at this ...

I think the threshold question for this incarnation of the patch is
whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as
a way of installing new plan providers into the database.  If we are,
then I can go ahead and enumerate a long list of things that will need
to be fixed to make that code acceptable (such as adding pg_dump
support).  But if we're not, there's no point in spending any time on
that part of the patch.

I can see a couple of good reasons to think that this approach might
be reasonable:

- In some ways, a custom plan provider (really, at this point, a
custom scan provider) is very similar to a foreign data wrapper.  To
the guts of PostgreSQL, an FDW is a sort of black box that knows how
to scan some data not managed by PostgreSQL.  A custom plan provider
is similar, except that it scans data that *is* managed by PostgreSQL.

- There's also some passing resemblance between a custom plan provider
and an access method.  Access methods provide a set of APIs for fast
access to data via an index, while custom plan providers provide an
API for fast access to data via magic that the core system doesn't
(and need not) understand.  While access methods don't have associated
SQL syntax, they do include catalog structure, so perhaps this should
too, by analogy.

All that having been said, I'm having a hard time mustering up
enthusiasm for this way of doing things.  As currently constituted,
the pg_custom_plan_provider catalog contains only a name, a "class"
that is always 's' for scan, and a handler function OID.  Quite
frankly, that's a whole lot of nothing.  If we got rid of the
pg_catalog structure and just had something like
RegisterCustomPlanProvider(char *name, void (*)(customScanArg *)),
which could be invoked from _PG_init(), hundreds and hundreds of lines
of code could go away and we wouldn't lose any actual functionality;
you'd just list your custom plan providers in shared_preload_libraries
or local_preload_libraries instead of listing them in a system
catalog.  In fact, you might even have more functionality, because you
could load providers into particular sessions rather than system-wide,
which isn't possible with this design.
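A sketch of what the catalog-free registration could look like; the
registry below is a toy stand-in for whatever bookkeeping
RegisterCustomPlanProvider() would actually do, and customScanArg's
members are elided:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical argument block, per the customScanArg described earlier. */
typedef struct customScanArg { int dummy; } customScanArg;

typedef void (*custom_path_fn) (customScanArg *arg);

/* Toy in-process registry, populated from each module's _PG_init(). */
#define MAX_PROVIDERS 32
static struct {
    const char    *name;
    custom_path_fn fn;
} providers[MAX_PROVIDERS];
static int nproviders = 0;

static void
RegisterCustomPlanProvider(const char *name, custom_path_fn fn)
{
    assert(nproviders < MAX_PROVIDERS);
    providers[nproviders].name = name;
    providers[nproviders].fn = fn;
    nproviders++;
}

/* The planner would walk the registry while building paths for a relation,
 * letting each provider add a CustomPath (or decline) via add_path(). */
static void
invoke_custom_providers(customScanArg *arg)
{
    for (int i = 0; i < nproviders; i++)
        providers[i].fn(arg);
}

/* Sample provider callback, standing in for e.g. ctidscan's handler. */
static int sample_calls = 0;
static void
sample_provider(customScanArg *arg)
{
    (void) arg;
    sample_calls++;
}
```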

I think the underlying issue here really has to do with when custom
plan providers get invoked - what triggers that?  For foreign data
wrappers, we have some relations that are plain tables (relkind = 'r')
and no foreign data wrapper code is invoked.  We have others that are
flagged as foreign tables (relkind = 'f') and for those we look up the
matching FDW (via ftserver) and run the code.  Similarly, for an index
AM, we notice that the relation is an index (relkind = 'i') and then
consult relam to figure out which index AM we should invoke.  But as
KaiGai is conceiving this feature, it's quite different.  Rather than
applying only to particular relations, and being mutually exclusive
with other options that might apply to those relations, it applies to
*everything* in the database in addition to whatever other options may
be present.  The included ctidscan implementation is just as good an
example as PG-Strom: you inspect the query and see, based on the
operators present, whether there's any hope of accelerating things.
In other words, there's no user configuration - and also, not
irrelevantly, no persistent on-disk state the way you have for an
index, or even an FDW, which has on disk state to the extent that
there have to be catalog entries tying a particular FDW to a
particular table.

A lot of the previous discussion of this topic revolves around the
question of whether we can unify the use case that this patch is
targeting with other things - e.g. Citus's desire to store its data
files within the data directory while retaining control over data
access, thus not a perfect fit for FDWs; the desire to push joins down
to foreign servers; more generally, the desire to replace a join with
a custom plan that may or may not use access paths for the underlying
relations as subpaths.  I confess I'm not seeing a whole lot of
commonality with anything other than the custom-join-path idea, which
probably shares many of what I believe to be the relevant
characteristics of a custom scan as conceived by KaiGai: namely, that
all of the decisions about whether to inject a custom path in
particular circumstances are left up to the provider itself based on
inspection of the specific query, rather than being a result of user
configuration.

So, I'm tentatively in favor of stripping the DDL support out of this
patch and otherwise trying to boil it down to something that's really
minimal, but I'd like to hear what others think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I think the threshold question for this incarnation of the patch is
> whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as
> a way of installing new plan providers into the database.

I tend to agree with your conclusion that that's a whole lot of
infrastructure with very little return.  I don't see anything here
we shouldn't do via function hooks instead, and/or a "register" callback
from a dynamically loaded library.

Also, we tend to think (for good reason) that once something is embedded
at the SQL level it's frozen; we are much more willing to redesign C-level
APIs.  There is no possible way that it's a good idea for this stuff to
get frozen in its first iteration.
        regards, tom lane



Re: [v9.5] Custom Plan API

From
Kohei KaiGai
Date:
2014-08-23 0:39 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
> On Thu, Jul 17, 2014 at 3:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> I haven't followed this at all, but I just skimmed over it and noticed
>>> the CustomPlanMarkPos thingy; apologies if this has been discussed
>>> before.  It seems a bit odd to me; why isn't it sufficient to have a
>>> boolean flag in regular CustomPlan to indicate that it supports
>>> mark/restore?
>>
>> Yeah, I thought that was pretty bogus too, but it's well down the
>> list of issues that were there last time I looked at this ...
>
> I think the threshold question for this incarnation of the patch is
> whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as
> a way of installing new plan providers into the database.  If we are,
> then I can go ahead and enumerate a long list of things that will need
> to be fixed to make that code acceptable (such as adding pg_dump
> support).  But if we're not, there's no point in spending any time on
> that part of the patch.
>
Even though I'm the patch author, I agree with this approach.
In fact, the previous custom-plan interface I proposed during the v9.4
development cycle did not include DDL support to register the
custom-plan providers, and it worked fine.

One thing that was pointed out to me, and the reason why I implemented
DDL support, is that going through a C-language function also loads
the extension module implicitly. That is an advantage, but I'm not
sure whether it needs to be supported from the beginning.

> I can see a couple of good reasons to think that this approach might
> be reasonable:
>
> - In some ways, a custom plan provider (really, at this point, a
> custom scan provider) is very similar to a foreign data wrapper.  To
> the guts of PostgreSQL, an FDW is a sort of black box that knows how
> to scan some data not managed by PostgreSQL.  A custom plan provider
> is similar, except that it scans data that *is* managed by PostgreSQL.
>
> - There's also some passing resemblance between a custom plan provider
> and an access method.  Access methods provide a set of APIs for fast
> access to data via an index, while custom plan providers provide an
> API for fast access to data via magic that the core system doesn't
> (and need not) understand.  While access methods don't have associated
> SQL syntax, they do include catalog structure, so perhaps this should
> too, by analogy.
>
> All that having been said, I'm having a hard time mustering up
> enthusiasm for this way of doing things.  As currently constituted,
> the pg_custom_plan_provider catalog contains only a name, a "class"
> that is always 's' for scan, and a handler function OID.  Quite
> frankly, that's a whole lot of nothing.  If we got rid of the
> pg_catalog structure and just had something like
> RegisterCustomPlanProvider(char *name, void (*)(customScanArg *)),
> which could be invoked from _PG_init(), hundreds and hundreds of lines
> of code could go away and we wouldn't lose any actual functionality;
> you'd just list your custom plan providers in shared_preload_libraries
> or local_preload_libraries instead of listing them in a system
> catalog.  In fact, you might even have more functionality, because you
> could load providers into particular sessions rather than system-wide,
> which isn't possible with this design.
>
Indeed. It's an advantage of the approach without system catalog.


> I think the underlying issue here really has to do with when custom
> plan providers get invoked - what triggers that?  For foreign data
> wrappers, we have some relations that are plain tables (relkind = 'r')
> and no foreign data wrapper code is invoked.  We have others that are
> flagged as foreign tables (relkind = 'f') and for those we look up the
> matching FDW (via ftserver) and run the code.  Similarly, for an index
> AM, we notice that the relation is an index (relkind = 'i') and then
> consult relam to figure out which index AM we should invoke.  But as
> KaiGai is conceiving this feature, it's quite different.  Rather than
> applying only to particular relations, and being mutually exclusive
> with other options that might apply to those relations, it applies to
> *everything* in the database in addition to whatever other options may
> be present.  The included ctidscan implementation is just as good an
> example as PG-Strom: you inspect the query and see, based on the
> operators present, whether there's any hope of accelerating things.
> In other words, there's no user configuration - and also, not
> irrelevantly, no persistent on-disk state the way you have for an
> index, or even an FDW, which has on disk state to the extent that
> there have to be catalog entries tying a particular FDW to a
> particular table.
>
Yes, that's my point. In the case of an FDW or index AM, the query
planner can form some expectation of how the relevant executor node
will handle the given relation scan, based on the persistent state.
However, a custom plan is a black box to the query planner; it can
have no expectation of how the relation scan is handled, apart from
the cost value estimated by the custom-plan provider.
Thus, this interface is designed to invoke every registered
custom-plan provider at relation-scan planning time, to ask whether
it can offer an alternative way to scan.

We could probably add a shortcut to skip the invocation when a
custom-plan provider obviously cannot provide any alternative plan.
For example, we may add a flag to RegisterCustomPlanProvider() to
declare that this custom-plan provider works only on relkind = 'r',
so there is no need to invoke it for other relation types.

> A lot of the previous discussion of this topic revolves around the
> question of whether we can unify the use case that this patch is
> targeting with other things - e.g. Citus's desire to store its data
> files within the data directory while retaining control over data
> access, thus not a perfect fit for FDWs; the desire to push joins down
> to foreign servers; more generally, the desire to replace a join with
> a custom plan that may or may not use access paths for the underlying
> relations as subpaths.  I confess I'm not seeing a whole lot of
> commonality with anything other than the custom-join-path idea, which
> probably shares many of what I believe to be the relevant
> characteristics of a custom scan as conceived by KaiGai: namely, that
> all of the decisions about whether to inject a custom path in
> particular circumstances are left up to the provider itself based on
> inspection of the specific query, rather than being a result of user
> configuration.
>
> So, I'm tentatively in favor of stripping the DDL support out of this
> patch and otherwise trying to boil it down to something that's really
> minimal, but I'd like to hear what others think.
>
I'd like to follow this direction, and start stripping the DDL support.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Fri, Aug 22, 2014 at 9:48 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> One thing I was pointed out, it is the reason why I implemented
> DDL support, is that intermediation of c-language function also
> loads the extension module implicitly. It is an advantage, but
> not sure whether it shall be supported from the beginning.

That is definitely an advantage of the DDL-based approach, but I think
it's too much extra machinery for not enough real advantage.  Sounds
like we all agree, so ...

> I'd like to follow this direction, and start stripping the DDL support.

...please make it so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> > I'd like to follow this direction, and start stripping the DDL support.
> 
> ...please make it so.
>
The attached patch eliminates the DDL support.

Instead of the new CREATE CUSTOM PLAN PROVIDER statement, it adds an
internal function, register_custom_scan_provider, which takes a
custom-plan provider name and a callback function that adds an
alternative scan path (in the form of a CustomPath) while the query
planner is finding the cheapest path to scan the target relation.
The documentation is also revised according to the latest design.
Everything else keeps the previous design.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Attachment

Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Wed, Aug 27, 2014 at 6:51 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> > I'd like to follow this direction, and start stripping the DDL support.
>>
>> ...please make it so.
>>
> The attached patch eliminates DDL support.
>
> Instead of the new CREATE CUSTOM PLAN PROVIDER statement,
> it adds an internal function; register_custom_scan_provider
> that takes custom plan provider name and callback function
> to add alternative scan path (should have a form of CustomPath)
> during the query planner is finding out the cheapest path to
> scan the target relation.
> Also, documentation stuff is revised according to the latest
> design.
> Any other stuff keeps the previous design.

Comments:

1. There seems to be no reason for custom plan nodes to have MultiExec
support; I think this is an area where extensibility is extremely
unlikely to work out.  The MultiExec mechanism is really only viable
between closely-cooperating nodes, like Hash and HashJoin, or
BitmapIndexScan, BitmapAnd, BitmapOr, and BitmapHeapScan; and arguably
those things could have been written as a single, more complex node.
Are we really going to want to support a custom plan that can
substitute for a Hash or BitmapAnd node?  I really doubt that's very
useful.

2. This patch is still sort of on the fence about whether we're
implementing custom plans (of any type) or custom scans (thus, of some
particular relation).  I previously recommended that we confine
ourselves initially to the task of adding custom *scans* and leave the
question of other kinds of custom plan nodes to a future patch.  After
studying the latest patch, I'm inclined to suggest a slightly revised
strategy.  This patch is really adding THREE kinds of custom objects:
CustomPlanState, CustomPlan, and CustomPath. CustomPlanState inherits
from ScanState, so it is not really a generic CustomPlan, but
specifically a CustomScan; likewise, CustomPlan inherits from Scan,
and is therefore a CustomScan, not a CustomPlan.  But CustomPath is
different: it's just a Path.  Even if we only have the hooks to inject
CustomPaths that are effectively scans at this point, I think that
part of the infrastructure could be somewhat generic.  Perhaps
eventually we have CustomPath which can generate either CustomScan or
CustomJoin which in turn could generate CustomScanState and
CustomJoinState.

For now, I propose that we rename CustomPlan and CustomPlanState to
CustomScan and CustomScanState, because that's what they are; but that
we leave CustomPath as-is.  For ease of review, I also suggest
splitting this into a series of three patches: (1) add support for
CustomPath; (2) add support for CustomScan and CustomScanState; (3)
ctidscan.

3. Is it really a good idea to invoke custom scan providers for RTEs
of every type?  It's pretty hard to imagine that a custom scan
provider can do anything useful with, say, RTE_VALUES.  Maybe an
accelerated scan of RTE_CTE or RTE_SUBQUERY is practical somehow, but
even that feels like an awfully big stretch.  At least until clear use
cases emerge, I'd be inclined to restrict this to RTE_RELATION scans
where rte->relkind != RELKIND_FOREIGN_TABLE; that is, put the logic in
set_plain_rel_pathlist() rather than set_rel_pathlist().

(We might even want to consider whether the hook in
set_plain_rel_pathlist() ought to be allowed to inject a non-custom
plan; e.g. substitute a scan of relation B for a scan of relation A.
For example, imagine that B contains all rows from A that satisfy some
predicate. This could even be useful for foreign tables; e.g.
substitute a scan of a local copy of a foreign table for a reference
to that table.  But I put all of these ideas in parentheses because
they're only good ideas to the extent that they don't sidetrack us too
much.)
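The proposed restriction boils down to a small predicate at
path-generation time; the constants and struct layout here are
illustrative simplifications of the real range-table entry:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative relkind letters, mirroring pg_class.relkind. */
#define RELKIND_RELATION      'r'
#define RELKIND_FOREIGN_TABLE 'f'

typedef enum RTEKind { RTE_RELATION, RTE_SUBQUERY, RTE_VALUES, RTE_CTE } RTEKind;

typedef struct RangeTblEntry {
    RTEKind rtekind;
    char    relkind;        /* only meaningful for RTE_RELATION */
} RangeTblEntry;

/*
 * Only consult custom scan providers for plain relations, i.e. the
 * check that would live in set_plain_rel_pathlist() rather than
 * set_rel_pathlist().
 */
static bool
should_invoke_custom_providers(const RangeTblEntry *rte)
{
    return rte->rtekind == RTE_RELATION &&
           rte->relkind != RELKIND_FOREIGN_TABLE;
}
```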

4. Department of minor nitpicks.  You've got a random 'xs' in the
comments for ExecSupportsBackwardScan. And, in contrib/ctidscan,
ctidscan_path_methods, ctidscan_plan_methods, and
ctidscan_exec_methods can have static initializers; there's no need to
initialize them at run time in _PG_init().
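On the static-initializer nitpick: function addresses are address
constants in C, so a file-scope table can be initialized with no
run-time code in _PG_init(). A sketch, with a hypothetical method-table
shape:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical callback table, loosely shaped like ctidscan's tables. */
typedef struct CustomPathMethods {
    const char *CustomName;
    void      (*CreateCustomPlan) (void);
} CustomPathMethods;

static void
ctidscan_create_plan(void)
{
    /* would construct and return the CustomPlan node in the real module */
}

/*
 * Static initialization at file scope: no assignments are needed inside
 * _PG_init(), which then only has to register the provider.
 */
static const CustomPathMethods ctidscan_path_methods = {
    "ctidscan",             /* CustomName */
    ctidscan_create_plan,   /* CreateCustomPlan */
};
```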

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kohei KaiGai
Date:
2014-08-29 13:33 GMT-04:00 Robert Haas <robertmhaas@gmail.com>:
> On Wed, Aug 27, 2014 at 6:51 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>> > I'd like to follow this direction, and start stripping the DDL support.
>>>
>>> ...please make it so.
>>>
>> The attached patch eliminates DDL support.
>>
>> Instead of the new CREATE CUSTOM PLAN PROVIDER statement,
>> it adds an internal function; register_custom_scan_provider
>> that takes custom plan provider name and callback function
>> to add alternative scan path (should have a form of CustomPath)
>> during the query planner is finding out the cheapest path to
>> scan the target relation.
>> Also, documentation stuff is revised according to the latest
>> design.
>> Any other stuff keeps the previous design.
>
> Comments:
>
> 1. There seems to be no reason for custom plan nodes to have MultiExec
> support; I think this as an area where extensibility is extremely
> unlikely to work out.  The MultiExec mechanism is really only viable
> between closely-cooperating nodes, like Hash and HashJoin, or
> BitmapIndexScan, BitmapAnd, BitmapOr, and BitmapHeapScan; and arguably
> those things could have been written as a single, more complex node.
> Are we really going to want to support a custom plan that can
> substitute for a Hash or BitmapAnd node?  I really doubt that's very
> useful.
>
This is intended to allow a particular custom-scan provider to
exchange its internal data when multiple custom-scan nodes are
stacked.
So, it can be considered a facility to implement closely-cooperating
nodes, where both of them are managed by the same custom-scan
provider.
An example is a GPU-accelerated version of hash-join that takes an
underlying custom-scan node returning a hash table in a GPU-preferable
data structure, which should not be part of the row-by-row interface.
I believe it is valuable for some use cases, even though I couldn't
find a use case for it in the ctidscan example.

> 2. This patch is still sort of on the fence about whether we're
> implementing custom plans (of any type) or custom scans (thus, of some
> particular relation).  I previously recommended that we confine
> ourselves initially to the task of adding custom *scans* and leave the
> question of other kinds of custom plan nodes to a future patch.  After
> studying the latest patch, I'm inclined to suggest a slightly revised
> strategy.  This patch is really adding THREE kinds of custom objects:
> CustomPlanState, CustomPlan, and CustomPath. CustomPlanState inherits
> from ScanState, so it is not really a generic CustomPlan, but
> specifically a CustomScan; likewise, CustomPlan inherits from Scan,
> and is therefore a CustomScan, not a CustomPlan.  But CustomPath is
> different: it's just a Path.  Even if we only have the hooks to inject
> CustomPaths that are effectively scans at this point, I think that
> part of the infrastructure could be somewhat generic.  Perhaps
> eventually we have CustomPath which can generate either CustomScan or
> CustomJoin which in turn could generate CustomScanState and
> CustomJoinState.
>
The suggestion seems reasonable to me. The reason why CustomPlanState
inherits ScanState and CustomPlan inherits Scan is just convenience
for the implementation of extensions. Some useful internal APIs, like
ExecScan(), take a ScanState argument, so it was a better strategy to
choose Scan/ScanState instead of the bare Plan/PlanState.
Anyway, I'd like to follow the perspective that treats CustomScan as
one derivative of CustomPath. It is more flexible.

> For now, I propose that we rename CustomPlan and CustomPlanState to
> CustomScan and CustomScanState, because that's what they are; but that
> we leave CustomPath as-is.  For ease of review, I also suggest
> splitting this into a series of three patches: (1) add support for
> CustomPath; (2) add support for CustomScan and CustomScanState; (3)
> ctidscan.
>
OK, I'll do that.

> 3. Is it really a good idea to invoke custom scan providers for RTEs
> of every type?  It's pretty hard to imagine that a custom scan
> provider can do anything useful with, say, RTE_VALUES.  Maybe an
> accelerated scan of RTE_CTE or RTE_SUBQUERY is practical somehow, but
> even that feels like an awfully big stretch.  At least until clear use
> cases emerge, I'd be inclined to restrict this to RTE_RELATION scans
> where rte->relkind != RELKIND_FOREIGN_TABLE; that is, put the logic in
> set_plain_rel_pathlist() rather than set_rel_pathlist().
>
I agree. Indeed, it's not easy to imagine a use case for custom
logic on non-plain relations.

> (We might even want to consider whether the hook in
> set_plain_rel_pathlist() ought to be allowed to inject a non-custom
> plan; e.g. substitute a scan of relation B for a scan of relation A.
> For example, imagine that B contains all rows from A that satisfy some
> predicate. This could even be useful for foreign tables; e.g.
> substitute a scan of a local copy of a foreign table for a reference
> to that table.  But I put all of these ideas in parentheses because
> they're only good ideas to the extent that they don't sidetrack us too
> much.)
>
Hmm... It seems to me we would need other infrastructure to take a
substitute scan, because add_path() is called on a particular
RelOptInfo associated with relation A.
As long as the custom-scan provider "internally" redirects a request
to scan A to a substitute scan of B (taking care of all the other
details, like relation locks), I don't think we need to put any other
hooks outside of set_plain_rel_pathlist().

> 4. Department of minor nitpicks.  You've got a random 'xs' in the
> comments for ExecSupportsBackwardScan.
>
Sorry, I didn't type 'ctrl' well when I saved the source code on emacs...

> And, in contrib/ctidscan,
> ctidscan_path_methods, ctidscan_plan_methods, and
> ctidscan_exec_methods can have static initializers; there's no need to
> initialize them at run time in _PG_init().
>
It came from a discussion I had a long time ago during the patch
review of postgres_fdw. I suggested using a static table of
FdwRoutine, but it was pointed out that some compilers raise an
error/warning when function pointers appear in static
initializations.
I usually use GCC only, so I'm not sure whether this argument is
right or not, but note that postgres_fdw_handler() allocates the
FdwRoutine using palloc() and then fills in each function pointer.

Anyway, I'll start revising the patch according to comments 2, 3, and the
first half of 4. I'd also appreciate feedback on comment 1 and the latter
half of 4.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Sun, Aug 31, 2014 at 12:54 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> 2014-08-29 13:33 GMT-04:00 Robert Haas <robertmhaas@gmail.com>:
>> On Wed, Aug 27, 2014 at 6:51 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>>> > I'd like to follow this direction, and start stripping the DDL support.
>>>>
>>>> ...please make it so.
>>>>
>>> The attached patch eliminates DDL support.
>>>
>>> Instead of the new CREATE CUSTOM PLAN PROVIDER statement,
>>> it adds an internal function; register_custom_scan_provider
>>> that takes custom plan provider name and callback function
>>> to add alternative scan path (should have a form of CustomPath)
>>> during the query planner is finding out the cheapest path to
>>> scan the target relation.
>>> Also, documentation stuff is revised according to the latest
>>> design.
>>> Any other stuff keeps the previous design.
>>
>> Comments:
>>
>> 1. There seems to be no reason for custom plan nodes to have MultiExec
>> support; I think this as an area where extensibility is extremely
>> unlikely to work out.  The MultiExec mechanism is really only viable
>> between closely-cooperating nodes, like Hash and HashJoin, or
>> BitmapIndexScan, BitmapAnd, BitmapOr, and BitmapHeapScan; and arguably
>> those things could have been written as a single, more complex node.
>> Are we really going to want to support a custom plan that can
>> substitute for a Hash or BitmapAnd node?  I really doubt that's very
>> useful.
>>
> This is intended to allow a particular custom-scan provider to exchange
> its internal data when multiple custom-scan nodes are stacked.
> So, it can be considered a facility for implementing closely-cooperating
> nodes, both managed by the same custom-scan provider.
> An example is a GPU-accelerated version of hash-join on top of an
> underlying custom-scan node that returns a hash table in a GPU-friendly
> data structure, which should not have to go through the row-by-row
> interface.
> I believe it is valuable for some use cases, even though I couldn't find
> one for the ctidscan example.

Color me skeptical.  Please remove that part for now, and we can
revisit it when, and if, a plausible use case emerges.

>> 3. Is it really a good idea to invoke custom scan providers for RTEs
>> of every type?  It's pretty hard to imagine that a custom scan
>> provider can do anything useful with, say, RTE_VALUES.  Maybe an
>> accelerated scan of RTE_CTE or RTE_SUBQUERY is practical somehow, but
>> even that feels like an awfully big stretch.  At least until clear use
>> cases emerge, I'd be inclined to restrict this to RTE_RELATION scans
>> where rte->relkind != RELKIND_FOREIGN_TABLE; that is, put the logic in
>> set_plain_rel_pathlist() rather than set_rel_pathlist().
>>
> Agreed. Indeed, it is hard to imagine a use case for custom logic on
> non-plain relations.
>
>> (We might even want to consider whether the hook in
>> set_plain_rel_pathlist() ought to be allowed to inject a non-custom
>> plan; e.g. substitute a scan of relation B for a scan of relation A.
>> For example, imagine that B contains all rows from A that satisfy some
>> predicate. This could even be useful for foreign tables; e.g.
>> substitute a scan of a local copy of a foreign table for a reference
>> to that table.  But I put all of these ideas in parentheses because
>> they're only good ideas to the extent that they don't sidetrack us too
>> much.)
>>
> Hmm... It seems to me we would need additional infrastructure to support
> a substitute scan, because add_path() is called on the particular
> RelOptInfo associated with relation A.
> As long as the custom-scan provider "internally" redirects a request to
> scan A into a substitute scan of B (taking care of all the other details,
> such as relation locks), I don't think we need any hooks beyond the one
> in set_plain_rel_pathlist().

OK, I see.  So this would have to be implemented as some new kind of
path anyway.  It might be worth allowing custom paths for scanning a
foreign table as well as a plain table, though - so any RTE_RELATION
but not other types of RTE.

> This came up in a discussion long ago, during the review of postgres_fdw.
> I suggested using a static table of FdwRoutine, but someone pointed out
> that some compilers raise an error or warning on static initialization
> of function pointers. I usually use only GCC, so I cannot tell whether
> that claim is correct; in any case, postgres_fdw_handler() allocates the
> FdwRoutine with palloc() and then fills in each function pointer.

That's odd, because aset.c has used static initializers since forever,
and I'm sure someone would have complained by now if there were a
problem with that usage.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Sun, Aug 31, 2014 at 12:54 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> > 2014-08-29 13:33 GMT-04:00 Robert Haas <robertmhaas@gmail.com>:
> >> Comments:
> >>
> >> 1. There seems to be no reason for custom plan nodes to have
> >> MultiExec support; I think this as an area where extensibility is
> >> extremely unlikely to work out.  The MultiExec mechanism is really
> >> only viable between closely-cooperating nodes, like Hash and
> >> HashJoin, or BitmapIndexScan, BitmapAnd, BitmapOr, and
> >> BitmapHeapScan; and arguably those things could have been written as
> >> a single, more complex node.
> >> Are we really going to want to support a custom plan that can
> >> substitute for a Hash or BitmapAnd node?  I really doubt that's very
> >> useful.
> >>
> > This is intended to allow a particular custom-scan provider to exchange
> > its internal data when multiple custom-scan nodes are stacked.
> > So, it can be considered a facility for implementing closely-cooperating
> > nodes, both managed by the same custom-scan provider.
> > An example is a GPU-accelerated version of hash-join on top of an
> > underlying custom-scan node that returns a hash table in a GPU-friendly
> > data structure, which should not have to go through the row-by-row
> > interface.
> > I believe it is valuable for some use cases, even though I couldn't
> > find one for the ctidscan example.
> 
> Color me skeptical.  Please remove that part for now, and we can revisit
> it when, and if, a plausible use case emerges.
> 
Now I have removed the multi-exec portion from the patch set.

The existence of this interface affects query execution cost significantly,
so I'd like to revisit it as soon as possible. See also the EXPLAIN output
at the tail of this message.

> > This came up in a discussion long ago, during the review of postgres_fdw.
> > I suggested using a static table of FdwRoutine, but someone pointed out
> > that some compilers raise an error or warning on static initialization
> > of function pointers. I usually use only GCC, so I cannot tell whether
> > that claim is correct; in any case, postgres_fdw_handler() allocates the
> > FdwRoutine with palloc() and then fills in each function pointer.
> 
> That's odd, because aset.c has used static initializers since forever, and
> I'm sure someone would have complained by now if there were a problem with
> that usage.
> 
I recalled the discussion at that time. The GCC-specific idiom was not
static initialization itself, but static initialization with designated
field names, like:
  static CustomPathMethods   ctidscan_path_methods = {
      .CustomName = "ctidscan",
      .CreateCustomPlan = CreateCtidScanPlan,
      .TextOutCustomPath = TextOutCtidScanPath,
  };


Regarding the three attached patches:
[1] custom-path and hook
This adds the register_custom_path_provider() interface for registering a
custom-path entrypoint. The callbacks are invoked from
set_plain_rel_pathlist to offer an alternative scan path on regular
relations.
I should explain the terms in use. I call the path node a "custom-path";
it is the step prior to population of the plan node (such as custom-scan
and, potentially, custom-join and so on). The node object created by
CreateCustomPlan() is called a "custom-plan" because it is an abstraction
over all the potential custom-xxx nodes; custom-scan is the first of them.

[2] custom-scan node
This adds custom-scan node support. The custom-scan node is expected to
generate the contents of a particular relation or sub-plan according to
its custom logic.
A custom-scan provider needs to implement the callbacks of
CustomScanMethods and CustomExecMethods. Once a custom-scan node is
populated from a custom-path node, the backend calls these methods back
during the planning and execution stages.

[3] contrib/ctidscan
This adds logic to scan a base relation when the WHERE clause contains an
inequality expression on the ctid system column, which allows skipping
blocks that obviously do not need to be read.

During the refactoring, I noticed that a few interfaces were omissible.
The backend can tell which relation is the target of a custom-scan node
appearing in the plan tree when its scanrelid > 0. So I concluded that
ExplainCustomPlanTargetRel() and ExplainCustomPreScanNode() were
unnecessary, and removed them from the patch.

Please check the attached ones.

--------
Also, regarding the use case for the multi-exec interface:
Below is an EXPLAIN output from PG-Strom. It shows that the custom
GpuHashJoin has two sub-plans, GpuScan and MultiHash.
GpuHashJoin is stacked on the GpuScan. This is a case where these nodes
would use the multi-exec interface for more efficient data exchange
between the nodes.
GpuScan already keeps a data structure suitable for sending to and
receiving from GPU devices, constructed in a DMA-capable memory segment.
If we had to form a tuple, pass it through the row-by-row interface, and
then deform it, that would be a major performance degradation in this
use case.

postgres=# explain select * from t10 natural join t8 natural join t9 where x < 10;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Custom (GpuHashJoin)  (cost=10979.56..90064.15 rows=333 width=49)
   pseudo scan tlist: 1:(t10.bid), 3:(t10.aid), 4:<t10.x>, 2:<t8.data>, 5:[t8.aid], 6:[t9.bid]
   hash clause 1: ((t8.aid = t10.aid) AND (t9.bid = t10.bid))
   ->  Custom (GpuScan) on t10  (cost=10000.00..88831.26 rows=3333327 width=16)
         Host References: aid, bid, x
         Device References: x
         Device Filter: (x < 10::double precision)
   ->  Custom (MultiHash)  (cost=464.56..464.56 rows=1000 width=41)
         hash keys: aid, bid
         ->  Hash Join  (cost=60.06..464.56 rows=1000 width=41)
               Hash Cond: (t9.data = t8.data)
               ->  Index Scan using t9_pkey on t9  (cost=0.29..357.29 rows=10000 width=37)
               ->  Hash  (cost=47.27..47.27 rows=1000 width=37)
                     ->  Index Scan using t8_pkey on t8  (cost=0.28..47.27 rows=1000 width=37)
 Planning time: 0.810 ms
(15 rows)

--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Thu, Sep 4, 2014 at 7:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Regarding to the attached three patches:
> [1] custom-path and hook
> It adds register_custom_path_provider() interface for registration of
> custom-path entrypoint. Callbacks are invoked on set_plain_rel_pathlist
> to offer alternative scan path on regular relations.
> I may need to explain the terms in use. I calls the path-node custom-path
> that is the previous step of population of plan-node (like custom-scan
> and potentially custom-join and so on). The node object created by
> CreateCustomPlan() is called custom-plan because it is abstraction for
> all the potential custom-xxx node; custom-scan is the first of all.

I don't think it's a good thing that add_custom_path_type is declared
as void (*)(void *) rather than having a real type.  I suggest we add
the path-creation callback function to CustomPlanMethods instead, like
this:

void (*CreateCustomScanPath)(PlannerInfo *root, RelOptInfo *baserel,
RangeTblEntry *rte);

Then, register_custom_path_provider() can just take CustomPathMethods
* as an argument; and create_customscan_paths can just walk the list
of CustomPlanMethods objects and call CreateCustomScanPath for each
one where that is non-NULL.  This conflates the path generation
mechanism with the type of path getting generated a little bit, but I
don't see any real downside to that.  I don't see a reason why you'd
ever want two different providers to offer the same type of
custompath.

Don't the changes to src/backend/optimizer/plan/createplan.c belong in patch #2?

> [2] custom-scan node
> It adds custom-scan node support. The custom-scan node is expected to
> generate contents of a particular relation or sub-plan according to its
> custom-logic.
> Custom-scan provider needs to implement callbacks of CustomScanMethods
> and CustomExecMethods. Once a custom-scan node is populated from
> custom-path node, the backend calls back these methods in the planning
> and execution stage.

It looks to me like this patch is full of holdovers from its earlier
life as a more-generic CustomPlan node.  In particular, it contains
numerous defenses against the case where scanrelid != 0.  These are
confusingly written as scanrelid > 0, but I think really they're just
bogus altogether: if this is specifically a CustomScan, not a
CustomPlan, then the relid should always be filled in.  Please
consider what can be simplified here.

The comment in _copyCustomScan looks bogus to me.  I think we should
*require* a static method table.

In create_custom_plan, you write if (IsA(custom_plan, CustomScan)) { lots of
stuff; } else elog(ERROR, ...).  I think it would be clearer to write
if (!IsA(custom_plan, CustomScan)) elog(ERROR, ...); lots of stuff;

> Also, regarding to the use-case of multi-exec interface.
> Below is an EXPLAIN output of PG-Strom. It shows the custom GpuHashJoin has
> two sub-plans; GpuScan and MultiHash.
> GpuHashJoin is stacked on the GpuScan. It is a case when these nodes utilize
> multi-exec interface for more efficient data exchange between the nodes.
> GpuScan already keeps a data structure that is suitable to send to/recv from
> GPU devices and constructed on the memory segment being DMA available.
> If we have to form a tuple, pass it via row-by-row interface, then deform it,
> it will become a major performance degradation in this use case.
>
> postgres=# explain select * from t10 natural join t8 natural join t9 where x < 10;
>                                           QUERY PLAN
> -----------------------------------------------------------------------------------------------
>  Custom (GpuHashJoin)  (cost=10979.56..90064.15 rows=333 width=49)
>    pseudo scan tlist: 1:(t10.bid), 3:(t10.aid), 4:<t10.x>, 2:<t8.data>, 5:[t8.aid], 6:[t9.bid]
>    hash clause 1: ((t8.aid = t10.aid) AND (t9.bid = t10.bid))
>    ->  Custom (GpuScan) on t10  (cost=10000.00..88831.26 rows=3333327 width=16)
>          Host References: aid, bid, x
>          Device References: x
>          Device Filter: (x < 10::double precision)
>    ->  Custom (MultiHash)  (cost=464.56..464.56 rows=1000 width=41)
>          hash keys: aid, bid
>          ->  Hash Join  (cost=60.06..464.56 rows=1000 width=41)
>                Hash Cond: (t9.data = t8.data)
>                ->  Index Scan using t9_pkey on t9  (cost=0.29..357.29 rows=10000 width=37)
>                ->  Hash  (cost=47.27..47.27 rows=1000 width=37)
>                      ->  Index Scan using t8_pkey on t8  (cost=0.28..47.27 rows=1000 width=37)
>  Planning time: 0.810 ms
> (15 rows)

Why can't the Custom(GpuHashJoin) node build the hash table internally
instead of using a separate node?

Also, for this patch we are only considering custom scan.  Custom join
is another patch.  We don't need to provide infrastructure for that
patch in this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Thu, Sep 4, 2014 at 7:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Regarding to the attached three patches:
> > [1] custom-path and hook
> > It adds register_custom_path_provider() interface for registration 
> > of custom-path entrypoint. Callbacks are invoked on 
> > set_plain_rel_pathlist to offer alternative scan path on regular
> > relations.
> > I may need to explain the terms in use. I calls the path-node 
> > custom-path that is the previous step of population of plan-node 
> > (like custom-scan and potentially custom-join and so on). The node 
> > object created by
> > CreateCustomPlan() is called custom-plan because it is abstraction 
> > for all the potential custom-xxx node; custom-scan is the first of all.
> 
> I don't think it's a good thing that add_custom_path_type is declared 
> as void (*)(void *) rather than having a real type.  I suggest we add 
> the path-creation callback function to CustomPlanMethods instead, like
> this:
> 
> void (*CreateCustomScanPath)(PlannerInfo *root, RelOptInfo *baserel, 
> RangeTblEntry *rte);
> 
> Then, register_custom_path_provider() can just take CustomPathMethods
> * as an argument; and create_customscan_paths can just walk the list 
> of CustomPlanMethods objects and call CreateCustomScanPath for each 
> one where that is non-NULL.  This conflates the path generation 
> mechanism with the type of path getting generated a little bit, but I 
> don't see any real downside to that.  I don't see a reason why you'd 
> ever want two different providers to offer the same type of custompath.
> 
The design you suggested seems smarter than the original one.
I revised the first patch according to that design.

> Don't the changes to src/backend/optimizer/plan/createplan.c belong in 
> patch #2?
> 
The borderline between #1 and #2 is a little fuzzy. I moved most of that
portion into #1; however, the invocation of InitCustomScan (a callback in
CustomPlanMethods) from create_custom_plan() is still in #2.

> > [2] custom-scan node
> > It adds custom-scan node support. The custom-scan node is expected 
> > to generate contents of a particular relation or sub-plan according 
> > to its custom-logic.
> > Custom-scan provider needs to implement callbacks of 
> > CustomScanMethods and CustomExecMethods. Once a custom-scan node is 
> > populated from custom-path node, the backend calls back these 
> > methods in the planning and execution stage.
> 
> It looks to me like this patch is full of holdovers from its earlier 
> life as a more-generic CustomPlan node.  In particular, it contains 
> numerous defenses against the case where scanrelid != 0.  These are 
> confusingly written as scanrelid > 0, but I think really they're just bogus altogether:
> if this is specifically a CustomScan, not a CustomPlan, then the relid 
> should always be filled in.  Please consider what can be simplified here.
> 
OK, I revised it. Custom-scan now assumes it has a particular valid
relation to be scanned, so there is no code path with scanrelid == 0 at
this moment.

Let us revisit this scenario when custom-scan comes to replace relation
joins. In that case, a custom-scan will not be associated with a
particular base relation, so it will need to admit a custom-scan node
with scanrelid == 0.

> The comment in _copyCustomScan looks bogus to me.  I think we should
> *require* a static method table.
> 
OK, I fixed it to copy the pointer to the function table, not the table
itself.

> In create_custom_plan, you if (IsA(custom_plan, CustomScan)) { lots of 
> stuff; } else elog(ERROR, ...).  I think it would be clearer to write 
> if (!IsA(custom_plan, CustomScan)) elog(ERROR, ...); lots of stuff;
> 
Fixed.

> > Also, regarding to the use-case of multi-exec interface.
> > Below is an EXPLAIN output of PG-Strom. It shows the custom
> > GpuHashJoin has two sub-plans; GpuScan and MultiHash.
> > GpuHashJoin is stacked on the GpuScan. It is a case when these nodes
> > utilize multi-exec interface for more efficient data exchange between
> > the nodes.
> > GpuScan already keeps a data structure that is suitable to send
> > to/recv from GPU devices and constructed on the memory segment being
> > DMA available.
> > If we have to form a tuple, pass it via row-by-row interface, then
> > deform it, it will become a major performance degradation in this
> > use case.
> >
> > postgres=# explain select * from t10 natural join t8 natural join t9 where x < 10;
> >                                           QUERY PLAN
> > -----------------------------------------------------------------------------------------------
> >  Custom (GpuHashJoin)  (cost=10979.56..90064.15 rows=333 width=49)
> >    pseudo scan tlist: 1:(t10.bid), 3:(t10.aid), 4:<t10.x>, 2:<t8.data>, 5:[t8.aid], 6:[t9.bid]
> >    hash clause 1: ((t8.aid = t10.aid) AND (t9.bid = t10.bid))
> >    ->  Custom (GpuScan) on t10  (cost=10000.00..88831.26 rows=3333327 width=16)
> >          Host References: aid, bid, x
> >          Device References: x
> >          Device Filter: (x < 10::double precision)
> >    ->  Custom (MultiHash)  (cost=464.56..464.56 rows=1000 width=41)
> >          hash keys: aid, bid
> >          ->  Hash Join  (cost=60.06..464.56 rows=1000 width=41)
> >                Hash Cond: (t9.data = t8.data)
> >                ->  Index Scan using t9_pkey on t9  (cost=0.29..357.29 rows=10000 width=37)
> >                ->  Hash  (cost=47.27..47.27 rows=1000 width=37)
> >                      ->  Index Scan using t8_pkey on t8  (cost=0.28..47.27 rows=1000 width=37)
> >  Planning time: 0.810 ms
> > (15 rows)
> 
> Why can't the Custom(GpuHashJoin) node build the hash table internally 
> instead of using a separate node?
> 
It's possible; however, it would prevent inspecting the sub-plans with
EXPLAIN if we managed the inner plans internally. So, I'd like to keep a
separate node connected to the inner plan.

> Also, for this patch we are only considering custom scan.  Custom join 
> is another patch.  We don't need to provide infrastructure for that 
> patch in this one.
> 
OK, let me revisit it on the next stage, with functionalities above.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> Don't the changes to src/backend/optimizer/plan/createplan.c belong in
>> patch #2?
>>
> The borderline between #1 and #2 is little bit bogus. So, I moved most of
> portion into #1, however, invocation of InitCustomScan (that is a callback
> in CustomPlanMethod) in create_custom_plan() is still in #2.

Eh, create_custom_scan() certainly looks like it is in #1 from here,
or at least part of it is.  It calculates tlist and clauses and then
does nothing with them.  That clearly can't be the right division.

I think it would make sense to have create_custom_scan() compute tlist
and clauses first, and then pass those to CreateCustomPlan().  Then
you don't need a separate InitCustomScan() - which is misnamed anyway,
since it has nothing to do with ExecInitCustomScan().

> OK, I revised. Now custom-scan assumes it has a particular valid relation
> to be scanned, so no code path with scanrelid == 0 at this moment.
>
> Let us revisit this scenario when custom-scan replaces relation-joins.
> In this case, custom-scan will not be associated with a particular base-
> relation, thus it needs to admit a custom-scan node with scanrelid == 0.

Yeah, I guess the question there is whether we'll want to let CustomScan
have scanrelid == 0 or require that CustomJoin be used there instead.

>> Why can't the Custom(GpuHashJoin) node build the hash table internally
>> instead of using a separate node?
>>
> It's possible, however, it prevents to check sub-plans using EXPLAIN if we
> manage inner-plans internally. So, I'd like to have a separate node being
> connected to the inner-plan.

Isn't that just a matter of letting the EXPLAIN code print more stuff?
Why can't it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
> wrote:
> >> Don't the changes to src/backend/optimizer/plan/createplan.c belong
> >> in patch #2?
> >>
> > The borderline between #1 and #2 is little bit bogus. So, I moved most
> > of portion into #1, however, invocation of InitCustomScan (that is a
> > callback in CustomPlanMethod) in create_custom_plan() is still in #2.
> 
> Eh, create_custom_scan() certainly looks like it is in #1 from here, or
> at least part of it is.  It calculates tlist and clauses and then does
> nothing with them.  That clearly can't be the right division.
> 
> I think it would make sense to have create_custom_scan() compute tlist and
> clauses first, and then pass those to CreateCustomPlan().  Then you don't
> need a separate InitCustomScan() - which is misnamed anyway, since it has
> nothing to do with ExecInitCustomScan().
> 
The only reason I put a separate hook here is that create_custom_scan()
needs to know the exact size of the CustomScan node (including private
fields); it is also helpful for extensions to have their callback invoked
to initialize the node right after the common initialization.

If CustomPathMethods had a static field giving the exact size of the data
type inherited from CustomScan, it might be possible to eliminate
CreateCustomPlan(). One downside is that an extension would then need to
register a separate CustomPath method table for each custom-plan node to
be populated later. So my preference is the current design rather than
the static approach.

Regarding the naming, how about GetCustomScan() instead of
InitCustomScan()? That follows the convention of create_foreignscan_plan().


> > OK, I revised. Now custom-scan assumes it has a particular valid
> > relation to be scanned, so no code path with scanrelid == 0 at this moment.
> >
> > Let us revisit this scenario when custom-scan replaces relation-joins.
> > In this case, custom-scan will not be associated with a particular
> > base- relation, thus it needs to admit a custom-scan node with scanrelid
> == 0.
> 
> Yeah, I guess the question there is whether we'll want let CustomScan have
> scanrelid == 0 or require that CustomJoin be used there instead.
> 
Right now I cannot imagine a use case that requires an individual
CustomJoin node, because CustomScan with scanrelid == 0 (which actually
behaves like a custom-plan rather than a custom-scan) is sufficient.

If a CustomScan gets chosen instead of the built-in join logic, it will
look like a relation scan on a virtual relation composed of the two
underlying relations. The callbacks of the CustomScan are responsible for
joining the underlying relations; that is invisible to the core executor.

It seems to me that CustomScan with scanrelid == 0 is sufficient to
implement alternative logic for relation joins; we don't need an
individual node from the standpoint of the executor.

> >> Why can't the Custom(GpuHashJoin) node build the hash table
> >> internally instead of using a separate node?
> >>
> > It's possible, however, it prevents to check sub-plans using EXPLAIN
> > if we manage inner-plans internally. So, I'd like to have a separate
> > node being connected to the inner-plan.
> 
> Isn't that just a matter of letting the EXPLAIN code print more stuff?
> Why can't it?
> 
My GpuHashJoin takes multiple relations and loads them into a hash table.
On the other hand, a Plan node can have at most two underlying relations
(inner/outer). The outer side is occupied by the larger relation, so it
needs to make the multiple relations visible through the inner branch.
If CustomScan could have a list of multiple underlying plan nodes, like
an Append node, it could represent the structure above in a
straightforward way, but I'm not sure which is the better design.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Thu, Sep 11, 2014 at 8:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
>> wrote:
>> >> Don't the changes to src/backend/optimizer/plan/createplan.c belong
>> >> in patch #2?
>> >>
>> > The borderline between #1 and #2 is little bit bogus. So, I moved most
>> > of portion into #1, however, invocation of InitCustomScan (that is a
>> > callback in CustomPlanMethod) in create_custom_plan() is still in #2.
>>
>> Eh, create_custom_scan() certainly looks like it is in #1 from here, or
>> at least part of it is.  It calculates tlist and clauses and then does
>> nothing with them.  That clearly can't be the right division.
>>
>> I think it would make sense to have create_custom_scan() compute tlist and
>> clauses first, and then pass those to CreateCustomPlan().  Then you don't
>> need a separate InitCustomScan() - which is misnamed anyway, since it has
>> nothing to do with ExecInitCustomScan().
>>
> The only reason why I put separate hooks here is, create_custom_scan() needs
> to know exact size of the CustomScan node (including private fields), however,
> it is helpful for extensions to kick its callback to initialize the node
> next to the common initialization stuff.

Why does it need to know that?  I don't see that it's doing anything
that requires knowing the size of that node, and if it is, I think it
shouldn't be.  That should get delegated to the callback provided by
the custom plan provider.

> Regarding to the naming, how about GetCustomScan() instead of InitCustomScan()?
> It follows the manner in create_foreignscan_plan().

I guess that's a bit better, but come to think of it, I'd really like
to avoid baking in the assumption that the custom path provider has to
return any particular type of plan node.  A good start would be to
give it a name that doesn't imply that - e.g. PlanCustomPath().

>> > OK, I revised. Now custom-scan assumes it has a particular valid
>> > relation to be scanned, so no code path with scanrelid == 0 at this moment.
>> >
>> > Let us revisit this scenario when custom-scan replaces relation-joins.
>> > In this case, custom-scan will not be associated with a particular
>> > base- relation, thus it needs to admit a custom-scan node with scanrelid
>> == 0.
>>
>> Yeah, I guess the question there is whether we'll want let CustomScan have
>> scanrelid == 0 or require that CustomJoin be used there instead.
>>
> Right now, I cannot imagine a use case that requires individual CustomJoin
> node because CustomScan with scanrelid==0 (that performs like custom-plan
> rather than custom-scan in actually) is sufficient.
>
> If a CustomScan gets chosen instead of built-in join logics, it shall looks
> like a relation scan on the virtual one that is consists of two underlying
> relation. Callbacks of the CustomScan has a responsibility to join underlying
> relations; that is invisible from the core executor.
>
> It seems to me CustomScan with scanrelid==0 is sufficient to implement
> an alternative logic on relation joins, don't need an individual node
> from the standpoint of executor.

That's valid logic, but it's not the only way to do it.  If we have
CustomScan and CustomJoin, either of them will require some adaption
to handle this case.  We can either allow a custom scan that isn't
scanning any particular relation (i.e. scanrelid == 0), or we can
allow a custom join that has no children.  I don't know which way will
come out cleaner, and I think it's good to leave that decision to one
side for now.

>> >> Why can't the Custom(GpuHashJoin) node build the hash table
>> >> internally instead of using a separate node?
>> >>
>> > It's possible, however, it prevents to check sub-plans using EXPLAIN
>> > if we manage inner-plans internally. So, I'd like to have a separate
>> > node being connected to the inner-plan.
>>
>> Isn't that just a matter of letting the EXPLAIN code print more stuff?
>> Why can't it?
>>
> My GpuHashJoin takes multiple relations and loads them into a hash table.
> On the other hand, a Plan node can have at most two underlying relations
> (inner/outer). The outer side is occupied by the larger relation, so the
> node needs to make the multiple relations visible through its inner branch.
> If a CustomScan could have a list of multiple underlying plan nodes, like
> an Append node, it could represent the structure above in a straightforward
> way, but I'm uncertain which is the better design.

Right.  I think the key point is that it is *possible* to make this
work without a multiexec interface, and it seems like we're agreed
that it is.  Now perhaps we will decide that there is enough benefit
in having multiexec support that we want to do it anyway, but it's
clearly not a hard requirement, because it can be done without that in
the way you describe here.  Let's leave to the future the decision as
to how to proceed here; getting the basic thing done is hard enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Thu, Sep 11, 2014 at 8:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >> On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai
> >> <kaigai@ak.jp.nec.com>
> >> wrote:
> >> >> Don't the changes to src/backend/optimizer/plan/createplan.c
> >> >> belong in patch #2?
> >> >>
> >> > The borderline between #1 and #2 is little bit bogus. So, I moved
> >> > most of portion into #1, however, invocation of InitCustomScan
> >> > (that is a callback in CustomPlanMethod) in create_custom_plan() is
> still in #2.
> >>
> >> Eh, create_custom_scan() certainly looks like it is in #1 from here,
> >> or at least part of it is.  It calculates tlist and clauses and then
> >> does nothing with them.  That clearly can't be the right division.
> >>
> >> I think it would make sense to have create_custom_scan() compute
> >> tlist and clauses first, and then pass those to CreateCustomPlan().
> >> Then you don't need a separate InitCustomScan() - which is misnamed
> >> anyway, since it has nothing to do with ExecInitCustomScan().
> >>
> > The only reason why I put separate hooks here is, create_custom_scan()
> > needs to know exact size of the CustomScan node (including private
> > fields), however, it is helpful for extensions to kick its callback to
> > initialize the node next to the common initialization stuff.
> 
> Why does it need to know that?  I don't see that it's doing anything that
> requires knowing the size of that node, and if it is, I think it shouldn't
> be.  That should get delegated to the callback provided by the custom plan
> provider.
> 
Sorry, my explanation might have been confusing. create_custom_scan() does
not need to know the exact size of the CustomScan (or a structure derived
from it) precisely because of the two separate hooks: one at node allocation
time, the other at the tail of the initialization sequence.
If we had only one hook here, we would need a mechanism to inform
create_custom_scan() of the exact size of the CustomScan node, including the
private fields managed by the provider, in place of the first hook at node
allocation time. In that case, node allocation would have to be performed by
create_custom_scan() itself, so it would need to know the exact size of the
node to be allocated.

How should I implement the feature here? Is the combination of a static node
size and a callback at the tail really simpler than the existing design,
which takes two individual hooks in create_custom_scan()?

> > Regarding to the naming, how about GetCustomScan() instead of
> InitCustomScan()?
> > It follows the manner in create_foreignscan_plan().
> 
> I guess that's a bit better, but come to think of it, I'd really like to
> avoid baking in the assumption that the custom path provider has to return
> any particular type of plan node.  A good start would be to give it a name
> that doesn't imply that - e.g. PlanCustomPath().
> 
OK, I'll use this naming.

> >> > OK, I revised. Now custom-scan assumes it has a particular valid
> >> > relation to be scanned, so no code path with scanrelid == 0 at this
> moment.
> >> >
> >> > Let us revisit this scenario when custom-scan replaces relation-joins.
> >> > In this case, custom-scan will not be associated with a particular
> >> > base- relation, thus it needs to admit a custom-scan node with
> >> > scanrelid
> >> == 0.
> >>
> >> Yeah, I guess the question there is whether we'll want let CustomScan
> >> have scanrelid == 0 or require that CustomJoin be used there instead.
> >>
> > Right now, I cannot imagine a use case that requires individual
> > CustomJoin node because CustomScan with scanrelid==0 (that performs
> > like custom-plan rather than custom-scan in actually) is sufficient.
> >
> > If a CustomScan gets chosen instead of built-in join logics, it shall
> > looks like a relation scan on the virtual one that is consists of two
> > underlying relation. Callbacks of the CustomScan has a responsibility
> > to join underlying relations; that is invisible from the core executor.
> >
> > It seems to me CustomScan with scanrelid==0 is sufficient to implement
> > an alternative logic on relation joins, don't need an individual node
> > from the standpoint of executor.
> 
> That's valid logic, but it's not the only way to do it.  If we have CustomScan
> and CustomJoin, either of them will require some adaption to handle this
> case.  We can either allow a custom scan that isn't scanning any particular
> relation (i.e. scanrelid == 0), or we can allow a custom join that has no
> children.  I don't know which way will come out cleaner, and I think it's
> good to leave that decision to one side for now.
> 
Yep, I agree with you. It may not be productive to try to settle this design
topic right now. Let's assume that a CustomScan scans a particular relation
(scanrelid != 0) in the first revision.

> >> >> Why can't the Custom(GpuHashJoin) node build the hash table
> >> >> internally instead of using a separate node?
> >> >>
> >> > It's possible, however, it prevents to check sub-plans using
> >> > EXPLAIN if we manage inner-plans internally. So, I'd like to have a
> >> > separate node being connected to the inner-plan.
> >>
> >> Isn't that just a matter of letting the EXPLAIN code print more stuff?
> >> Why can't it?
> >>
> > My GpuHashJoin takes multiple relations to load them a hash-table.
> > On the other hand, Plan node can have two underlying relations at most
> > (inner/outer). Outer-side is occupied by the larger relation, so it
> > needs to make multiple relations visible using inner-branch.
> > If CustomScan can has a list of multiple underlying plan-nodes, like
> > Append node, it can represent the structure above in straightforward
> > way, but I'm uncertain which is the better design.
> 
> Right.  I think the key point is that it is *possible* to make this work
> without a multiexec interface, and it seems like we're agreed that it is.
> Now perhaps we will decide that there is enough benefit in having multiexec
> support that we want to do it anyway, but it's clearly not a hard requirement,
> because it can be done without that in the way you describe here.  Let's
> leave to the future the decision as to how to proceed here; getting the
> basic thing done is hard enough.
> 
OK, let's postpone the discussion of custom-join support.
Either approach (1. multi-exec support, or 2. multiple subplans as in
Append) is sufficient for this purpose; the multi-exec interface is a means
of implementation, not a goal.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Mon, Sep 15, 2014 at 8:38 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> > The only reason why I put separate hooks here is, create_custom_scan()
>> > needs to know exact size of the CustomScan node (including private
>> > fields), however, it is helpful for extensions to kick its callback to
>> > initialize the node next to the common initialization stuff.
>>
>> Why does it need to know that?  I don't see that it's doing anything that
>> requires knowing the size of that node, and if it is, I think it shouldn't
>> be.  That should get delegated to the callback provided by the custom plan
>> provider.
>>
> Sorry, my explanation might be confusable. The create_custom_scan() does not
> need to know the exact size of the CustomScan (or its inheritance) because of
> the two separated hooks; one is node allocation time, the other is the tail
> of the series of initialization.
> If we have only one hook here, we need to have a mechanism to informs
> create_custom_scan() an exact size of the CustomScan node; including private
> fields managed by the provider, instead of the first hook on node allocation
> time. In this case, node allocation shall be processed by create_custom_scan()
> and it has to know exact size of the node to be allocated.
>
> How do I implement the feature here? Is the combination of static node size
> and callback on the tail more simple than the existing design that takes two
> individual hooks on create_custom_scan()?

I still don't get it.  Right now, the logic in create_custom_scan(),
which I think should really be create_custom_plan() or
create_plan_from_custom_path(), basically looks like this:

1. call hook function CreateCustomPlan
2. compute values for tlist and clauses
3. pass those values to hook function InitCustomScan()
4. call copy_path_costsize

What I think we should do is:

1. compute values for tlist and clauses
2. pass those values to hook function PlanCustomPath(), which will return a Plan
3. call copy_path_costsize

create_custom_scan() does not need to allocate the node.  You don't
need the node to be allocated before computing tlist and clauses.
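The single-callback flow described above can be modeled in plain C: the core computes the common values, and the provider's callback both allocates and returns the node, so the core never needs to know the node's size. This is a simplified, self-contained sketch — the struct and function names (MyCustomPlan, my_plan_custom_path) are illustrative stand-ins, not the real PostgreSQL definitions:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for planner structures. */
typedef struct Plan { int plan_tag; double cost; } Plan;
typedef struct CustomPath {
    /* Provider-supplied callback: allocates and returns its own Plan. */
    struct Plan *(*PlanCustomPath)(struct CustomPath *path,
                                   int tlist, int clauses);
    double path_cost;
} CustomPath;

/* A provider's private node type: embeds Plan, adds private fields.
 * The core never needs sizeof(MyCustomPlan). */
typedef struct MyCustomPlan { Plan base; int private_state; } MyCustomPlan;

static Plan *
my_plan_custom_path(CustomPath *path, int tlist, int clauses)
{
    (void) path;
    MyCustomPlan *cp = calloc(1, sizeof(MyCustomPlan));
    cp->base.plan_tag = 42;             /* pretend NodeTag */
    cp->private_state = tlist + clauses;
    return &cp->base;                   /* upcast: Plan is first member */
}

/* Core-side logic, mirroring the three steps above. */
static Plan *
create_customscan_plan(CustomPath *path)
{
    int tlist = 10, clauses = 3;        /* 1. compute tlist and clauses */
    Plan *plan = path->PlanCustomPath(path, tlist, clauses);
                                        /* 2. provider allocates + fills */
    plan->cost = path->path_cost;       /* 3. copy_path_costsize analogue */
    return plan;
}
```

Since the provider performs the allocation, a second "allocation time" hook becomes unnecessary — which is the point of Robert's proposal.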

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Mon, Sep 15, 2014 at 8:38 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >> > The only reason why I put separate hooks here is,
> >> > create_custom_scan() needs to know exact size of the CustomScan
> >> > node (including private fields), however, it is helpful for
> >> > extensions to kick its callback to initialize the node next to the
> common initialization stuff.
> >>
> >> Why does it need to know that?  I don't see that it's doing anything
> >> that requires knowing the size of that node, and if it is, I think it
> >> shouldn't be.  That should get delegated to the callback provided by
> >> the custom plan provider.
> >>
> > Sorry, my explanation might be confusable. The create_custom_scan()
> > does not need to know the exact size of the CustomScan (or its
> > inheritance) because of the two separated hooks; one is node
> > allocation time, the other is the tail of the series of initialization.
> > If we have only one hook here, we need to have a mechanism to informs
> > create_custom_scan() an exact size of the CustomScan node; including
> > private fields managed by the provider, instead of the first hook on
> > node allocation time. In this case, node allocation shall be processed
> > by create_custom_scan() and it has to know exact size of the node to be
> allocated.
> >
> > How do I implement the feature here? Is the combination of static node
> > size and callback on the tail more simple than the existing design
> > that takes two individual hooks on create_custom_scan()?
> 
> I still don't get it.  Right now, the logic in create_custom_scan(), which
> I think should really be create_custom_plan() or
> create_plan_from_custom_path(), basically looks like this:
> 
> 1. call hook function CreateCustomPlan
> 2. compute values for tlist and clauses
> 3. pass those values to hook function InitCustomScan() 4. call
> copy_path_costsize
> 
> What I think we should do is:
> 
> 1. compute values for tlist and clauses
> 2. pass those values to hook function PlanCustomPath(), which will return
> a Plan 3. call copy_path_costsize
> 
> create_custom_scan() does not need to allocate the node.  You don't need
> the node to be allocated before computing tlist and clauses.
> 
Thanks, I get the point now.
I'll revise the patch according to the suggestion above.


It seems to me we can apply a similar approach to ExecInitCustomScan().
The current implementation does the following:
1. call CreateCustomScanState() to allocate a CustomScanState node
2. common initialization of the CustomScanState fields, but not the private
   fields
3. call BeginCustomScan() to initialize the remaining pieces and begin
   execution.

If BeginCustomScan() were redefined to accept the values for the common
initialization portions and to return a CustomScanState node, we might be
able to eliminate the CreateCustomScanState() hook.

Unlike the create_custom_scan() case, more values are involved in the common
initialization: the expression trees of the tlist and quals, the scan and
result tuple slots, the projection info, and the relation handle. That may
clutter the interface specification.
In addition, BeginCustomScan() would have to belong to CustomScanMethods,
not CustomExecMethods, and I'm uncertain whether that is the natural
location. (An aside: they may not need to be separate tables, since a
CustomScan always populates a CustomScanState, unlike the relationship
between CustomPath and CustomScan.)

What is your opinion on applying this approach to ExecInitCustomScan() as
well?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> > >> Why does it need to know that?  I don't see that it's doing
> > >> anything that requires knowing the size of that node, and if it is,
> > >> I think it shouldn't be.  That should get delegated to the callback
> > >> provided by the custom plan provider.
> > >>
> > > Sorry, my explanation might be confusable. The create_custom_scan()
> > > does not need to know the exact size of the CustomScan (or its
> > > inheritance) because of the two separated hooks; one is node
> > > allocation time, the other is the tail of the series of initialization.
> > > If we have only one hook here, we need to have a mechanism to
> > > informs
> > > create_custom_scan() an exact size of the CustomScan node; including
> > > private fields managed by the provider, instead of the first hook on
> > > node allocation time. In this case, node allocation shall be
> > > processed by create_custom_scan() and it has to know exact size of
> > > the node to be
> > allocated.
> > >
> > > How do I implement the feature here? Is the combination of static
> > > node size and callback on the tail more simple than the existing
> > > design that takes two individual hooks on create_custom_scan()?
> >
> > I still don't get it.  Right now, the logic in create_custom_scan(),
> > which I think should really be create_custom_plan() or
> > create_plan_from_custom_path(), basically looks like this:
> >
> > 1. call hook function CreateCustomPlan 2. compute values for tlist and
> > clauses 3. pass those values to hook function InitCustomScan() 4. call
> > copy_path_costsize
> >
> > What I think we should do is:
> >
> > 1. compute values for tlist and clauses 2. pass those values to hook
> > function PlanCustomPath(), which will return a Plan 3. call
> > copy_path_costsize
> >
> > create_custom_scan() does not need to allocate the node.  You don't
> > need the node to be allocated before computing tlist and clauses.
> >
> Thanks, I could get the point.
> I'll revise the patch according to the suggestion above.
> 
I have now revised this portion of the patches.
create_custom_plan() was modified to call the "PlanCustomPath" callback
right after the initialization of the tlist and clauses.

It is probably the same as what you suggested.

> It seems to me, we can also apply similar manner on ExecInitCustomScan().
> The current implementation doing is:
> 1. call CreateCustomScanState() to allocate a CustomScanState node 2.
> common initialization of the fields on CustomScanState, but not private
>    fields.
> 3. call BeginCustomScan() to initialize remaining stuffs and begin
> execution.
> 
> If BeginCustomScan() is re-defined to accept values for common
> initialization portions and to return a CustomScanState node, we may be
> able to eliminate the CreateCustomScanState() hook.
> 
> Unlike create_custom_scan() case, it takes more number of values for common
> initialization portions; expression tree of tlist and quals, scan and result
> tuple-slot, projection info and relation handler. It may mess up the
> interface specification.
> In addition, BeginCustomScan() has to belong to CustomScanMethods, not
> CustomexecMethods. I'm uncertain whether it is straightforward location.
> (a whisper: It may not need to be separate tables. CustomScan always
> populates CustomScanState, unlike relationship between CustomPath and
> CustomScan.)
> 
> How about your opinion to apply the above manner on ExecInitCustomScan()
> also?
> 
I have kept the existing implementation around ExecInitCustomScan() for now.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Wed, Sep 17, 2014 at 7:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> At this moment, I revised the above portion of the patches.
> create_custom_plan() was modified to call "PlanCustomPath" callback
> next to the initialization of tlist and clauses.
>
> It's probably same as what you suggested.

create_custom_plan() is mis-named.  It's actually only applicable to
the custom-scan case, because it's triggered by create_plan_recurse()
getting a path node with a T_CustomScan pathtype.  Now, we could
change that; although in general create_plan_recurse() dispatches on
pathtype, we could make CustomPath an exception; the top of that
function could say if (IsA(best_path, CustomPath)) { /* do custom
stuff */ }.  But the problem with that idea is that, when the custom
path is specifically a custom scan, rather than a join or some other
thing, you want to do all of the same processing that's in
create_scan_plan().

So I think what should happen is that create_plan_recurse() should
handle T_CustomScan the same way it handles T_SeqScan, T_IndexScan, et
al: by calling create_scan_plan().  The switch inside that function
can then call a function create_customscan_plan() if it sees
T_CustomScan.  And that function will be simpler than the
create_custom_plan() that you have now, and it will be named
correctly, too.
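The dispatch structure Robert describes — create_plan_recurse() funneling T_CustomScan through create_scan_plan() alongside the built-in scan types — can be sketched as a simplified, self-contained C model (the function bodies and return values here are placeholders, not the real planner code):

```c
#include <assert.h>
#include <string.h>

/* Simplified node tags (not the real PostgreSQL NodeTag values). */
typedef enum { T_SeqScan, T_IndexScan, T_CustomScan } PathType;

static const char *create_seqscan_plan(void)    { return "SeqScan"; }
static const char *create_customscan_plan(void) { return "CustomScan"; }

/* create_scan_plan() analogue: shared scan processing, then dispatch. */
static const char *
create_scan_plan(PathType t)
{
    /* ...common tlist/qual handling for every scan type would go here... */
    switch (t)
    {
        case T_SeqScan:    return create_seqscan_plan();
        case T_CustomScan: return create_customscan_plan();
        default:           return "other scan";
    }
}

/* create_plan_recurse() analogue: T_CustomScan takes the same route as
 * the built-in scans, so it inherits all of their common logic. */
static const char *
create_plan_recurse(PathType t)
{
    switch (t)
    {
        case T_SeqScan:
        case T_IndexScan:
        case T_CustomScan:
            return create_scan_plan(t);
    }
    return "join or other";
}
```

The benefit of this shape is that create_customscan_plan() only has to handle what is genuinely custom; everything shared with T_SeqScan and friends stays in create_scan_plan().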

In ExplainNode(), I think sname should be set to "Custom Scan", not
"Custom".  And further down, the custom_name should be printed as
"Custom Plan Provider" not just "Custom".

setrefs.c has remaining handling for the scanrelid = 0 case; please remove that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Wed, Sep 17, 2014 at 7:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > At this moment, I revised the above portion of the patches.
> > create_custom_plan() was modified to call "PlanCustomPath" callback
> > next to the initialization of tlist and clauses.
> >
> > It's probably same as what you suggested.
> 
> create_custom_plan() is mis-named.  It's actually only applicable to the
> custom-scan case, because it's triggered by create_plan_recurse() getting
> a path node with a T_CustomScan pathtype.  Now, we could change that;
> although in general create_plan_recurse() dispatches on pathtype, we could
> make CustomPath an exception; the top of that function could say if
> (IsA(best_path, CustomPath)) { /* do custom stuff */ }.  But the problem
> with that idea is that, when the custom path is specifically a custom scan,
> rather than a join or some other thing, you want to do all of the same
> processing that's in create_scan_plan().
> 
> So I think what should happen is that create_plan_recurse() should handle
> T_CustomScan the same way it handles T_SeqScan, T_IndexScan, et
> al: by calling create_scan_plan().  The switch inside that function can
> then call a function create_customscan_plan() if it sees T_CustomScan.  And
> that function will be simpler than the
> create_custom_plan() that you have now, and it will be named correctly,
> too.
> 
Fixed according to your suggestion. It seems to me create_customscan_plan()
became simpler than before.
It will probably also minimize the amount of special-case handling if a
CustomScan with scanrelid==0 replaces the built-in join plans in a future
version.

> In ExplainNode(), I think sname should be set to "Custom Scan", not "Custom".
> And further down, the custom_name should be printed as "Custom Plan
> Provider" not just "Custom".
> 
Fixed. I added an additional regression test to check the EXPLAIN output in
a non-text format.

> setrefs.c has remaining handling for the scanrelid = 0 case; please remove
> that.
> 
Sorry, I removed it, and checked the patch again to ensure there are no
similar portions remaining.

Thanks for your reviewing.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Attachment

Re: [v9.5] Custom Plan API

From
Thom Brown
Date:
On 29 September 2014 09:48, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> On Wed, Sep 17, 2014 at 7:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> > At this moment, I revised the above portion of the patches.
>> > create_custom_plan() was modified to call "PlanCustomPath" callback
>> > next to the initialization of tlist and clauses.
>> >
>> > It's probably same as what you suggested.
>>
>> create_custom_plan() is mis-named.  It's actually only applicable to the
>> custom-scan case, because it's triggered by create_plan_recurse() getting
>> a path node with a T_CustomScan pathtype.  Now, we could change that;
>> although in general create_plan_recurse() dispatches on pathtype, we could
>> make CustomPath an exception; the top of that function could say if
>> (IsA(best_path, CustomPath)) { /* do custom stuff */ }.  But the problem
>> with that idea is that, when the custom path is specifically a custom scan,
>> rather than a join or some other thing, you want to do all of the same
>> processing that's in create_scan_plan().
>>
>> So I think what should happen is that create_plan_recurse() should handle
>> T_CustomScan the same way it handles T_SeqScan, T_IndexScan, et
>> al: by calling create_scan_plan().  The switch inside that function can
>> then call a function create_customscan_plan() if it sees T_CustomScan.  And
>> that function will be simpler than the
>> create_custom_plan() that you have now, and it will be named correctly,
>> too.
>>
> Fixed, according to what you suggested. It seems to me create_customscan_plan()
> became more simplified than before.
> Probably, it will minimize the portion of special case handling if CustomScan
> with scanrelid==0 replaces built-in join plan in the future version.
>
>> In ExplainNode(), I think sname should be set to "Custom Scan", not "Custom".
>> And further down, the custom_name should be printed as "Custom Plan
>> Provider" not just "Custom".
>>
> Fixed. I added an additional regression test to check EXPLAIN output
> if not a text format.
>
>> setrefs.c has remaining handling for the scanrelid = 0 case; please remove
>> that.
>>
> Sorry, I removed it, and checked the patch again to ensure here is no similar
> portions.
>
> Thanks for your reviewing.

pgsql-v9.5-custom-scan.part-2.v11.patch

+GetSpecialCustomVar(CustomPlanState *node,
+                    Var *varnode,
+                    PlanState **child_ps);

This doesn't seem to strictly match the actual function:

+GetSpecialCustomVar(PlanState *ps, Var *varnode, PlanState **child_ps)

-- 
Thom



Re: [v9.5] Custom Plan API

From
Kohei KaiGai
Date:
2014-09-29 20:26 GMT+09:00 Thom Brown <thom@linux.com>:
> On 29 September 2014 09:48, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>> On Wed, Sep 17, 2014 at 7:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>> > At this moment, I revised the above portion of the patches.
>>> > create_custom_plan() was modified to call "PlanCustomPath" callback
>>> > next to the initialization of tlist and clauses.
>>> >
>>> > It's probably same as what you suggested.
>>>
>>> create_custom_plan() is mis-named.  It's actually only applicable to the
>>> custom-scan case, because it's triggered by create_plan_recurse() getting
>>> a path node with a T_CustomScan pathtype.  Now, we could change that;
>>> although in general create_plan_recurse() dispatches on pathtype, we could
>>> make CustomPath an exception; the top of that function could say if
>>> (IsA(best_path, CustomPath)) { /* do custom stuff */ }.  But the problem
>>> with that idea is that, when the custom path is specifically a custom scan,
>>> rather than a join or some other thing, you want to do all of the same
>>> processing that's in create_scan_plan().
>>>
>>> So I think what should happen is that create_plan_recurse() should handle
>>> T_CustomScan the same way it handles T_SeqScan, T_IndexScan, et
>>> al: by calling create_scan_plan().  The switch inside that function can
>>> then call a function create_customscan_plan() if it sees T_CustomScan.  And
>>> that function will be simpler than the
>>> create_custom_plan() that you have now, and it will be named correctly,
>>> too.
>>>
>> Fixed, according to what you suggested. It seems to me create_customscan_plan()
>> became more simplified than before.
>> Probably, it will minimize the portion of special case handling if CustomScan
>> with scanrelid==0 replaces built-in join plan in the future version.
>>
>>> In ExplainNode(), I think sname should be set to "Custom Scan", not "Custom".
>>> And further down, the custom_name should be printed as "Custom Plan
>>> Provider" not just "Custom".
>>>
>> Fixed. I added an additional regression test to check EXPLAIN output
>> if not a text format.
>>
>>> setrefs.c has remaining handling for the scanrelid = 0 case; please remove
>>> that.
>>>
>> Sorry, I removed it, and checked the patch again to ensure here is no similar
>> portions.
>>
>> Thanks for your reviewing.
>
> pgsql-v9.5-custom-scan.part-2.v11.patch
>
> +GetSpecialCustomVar(CustomPlanState *node,
> +                    Var *varnode,
> +                    PlanState **child_ps);
>
> This doesn't seem to strictly match the actual function:
>
> +GetSpecialCustomVar(PlanState *ps, Var *varnode, PlanState **child_ps)
>
It's more convenient if the first argument is a PlanState, because
GetSpecialCustomVar() is called for every suspicious special var-node that
might be managed by a custom-plan provider.
If we had to ensure the first argument is a CustomPlanState on the caller
side, it would make the function's invocation more complicated.
Also, the callback portion is invoked only when the PlanState is a
CustomPlanState, so it is natural for the callback interface to take a
CustomPlanState argument.
Do we need to match the prototype of the wrapper function with that of the
callback?
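The trade-off in question — a wrapper taking the generic PlanState while the provider's callback sees a CustomPlanState — can be illustrated with a minimal C model (type and member names follow the thread, but the structures are simplified stand-ins, not the real PostgreSQL declarations):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified node tagging. */
typedef enum { T_PlanState, T_CustomPlanState } NodeTag;
typedef struct PlanState { NodeTag tag; } PlanState;

typedef struct CustomPlanState CustomPlanState;
struct CustomPlanState
{
    PlanState ps;                       /* first member: safe to downcast */
    int (*GetSpecialCustomVar)(CustomPlanState *cps, int varno);
};

/* Wrapper takes the generic PlanState, so callers need no cast; the
 * wrapper itself filters out nodes not managed by a provider. */
static int
GetSpecialCustomVar(PlanState *ps, int varno)
{
    if (ps == NULL || ps->tag != T_CustomPlanState)
        return -1;                      /* not a custom-plan node */
    CustomPlanState *cps = (CustomPlanState *) ps;
    return cps->GetSpecialCustomVar(cps, varno);
}

/* A demo provider callback, doubling the var number. */
static int demo_cb(CustomPlanState *cps, int varno)
{
    (void) cps;
    return varno * 2;
}
```

Robert's answer in the thread is that the wrapper and the callback prototypes should nevertheless match; the sketch above merely shows why the mismatch was convenient for callers.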

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Mon, Sep 29, 2014 at 9:04 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> Do we need to match the prototype of wrapper function with callback?

Yes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Mon, Sep 29, 2014 at 9:04 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> > Do we need to match the prototype of wrapper function with callback?
> 
> Yes.
> 
OK, I fixed up the part-2 patch to make the declaration of
GetSpecialCustomVar() match the corresponding callback.

Also, some noise introduced into the part-3 patch by git-pull was removed.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

Re: [v9.5] Custom Plan API

From
Merlin Moncure
Date:
On Tue, Jul 8, 2014 at 6:55 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> * Syntax also reflects what the command does more. New syntax to
>   define custom plan provider is:
>     CREATE CUSTOM PLAN PROVIDER <cpp_name>
>       FOR <cpp_class> HANDLER <cpp_function>;

-1 on 'cpp' prefix.  I don't see acronyms used in the syntax
documentation and cpp will make people reflexively think 'c++'.  How
about  <provider_name> and <provider_function>?

merlin



Re: [v9.5] Custom Plan API

From
Kohei KaiGai
Date:
2014-10-02 0:41 GMT+09:00 Merlin Moncure <mmoncure@gmail.com>:
> On Tue, Jul 8, 2014 at 6:55 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> * Syntax also reflects what the command does more. New syntax to
>>   define custom plan provider is:
>>     CREATE CUSTOM PLAN PROVIDER <cpp_name>
>>       FOR <cpp_class> HANDLER <cpp_function>;
>
> -1 on 'cpp' prefix.  I don't see acronyms used in the syntax
> documentation and cpp will make people reflexively think 'c++'.  How
> about  <provider_name> and <provider_function>?
>
That is no longer living code. I already eliminated the SQL syntax portion
in favor of an internal interface (register_custom_path_provider) that
registers the callbacks at extension load time.
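Load-time registration of this kind can be sketched as a small, self-contained C model. The function name register_custom_path_provider comes from the message above (the patch under discussion — the API that eventually shipped may differ), and everything else here (CustomPathMethods layout, _PG_init_demo) is an illustrative stand-in:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROVIDERS 8

/* A provider's callback table: consulted while building paths. */
typedef struct CustomPathMethods
{
    const char *name;
    void (*add_paths)(void);            /* invoked during planning */
} CustomPathMethods;

static const CustomPathMethods *providers[MAX_PROVIDERS];
static int nproviders = 0;

/* Called by an extension at load time to register its callbacks. */
static void
register_custom_path_provider(const CustomPathMethods *methods)
{
    if (nproviders < MAX_PROVIDERS)
        providers[nproviders++] = methods;
}

/* Extension side: what a real module would do in _PG_init(). */
static int paths_added = 0;
static void demo_add_paths(void) { paths_added++; }
static const CustomPathMethods demo_methods = { "demo", demo_add_paths };

static void
_PG_init_demo(void)
{
    register_custom_path_provider(&demo_methods);
}

/* Planner side: consult every registered provider when planning. */
static void
consider_custom_paths(void)
{
    for (int i = 0; i < nproviders; i++)
        providers[i]->add_paths();
}
```

The design point is that no DDL or system catalog is needed: loading the shared library (e.g. via shared_preload_libraries) is what makes the provider known to the planner.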

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>



Re: [v9.5] Custom Plan API

From
Thom Brown
Date:
On 30 September 2014 07:27, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> On Mon, Sep 29, 2014 at 9:04 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> > Do we need to match the prototype of wrapper function with callback?
>
> Yes.
>
OK, I fixed up the patch part-2, to fit declaration of GetSpecialCustomVar()
with corresponding callback.

Also, a noise in the part-3 patch, by git-pull, was removed.

FYI, patch v12 part 2 no longer applies cleanly.

--
Thom

Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> FYI, patch v12 part 2 no longer applies cleanly.
>
Thanks. I rebased the patch set against the latest master branch.
The attached v13 can be applied to master.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Attachment

Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> FYI, patch v12 part 2 no longer applies cleanly.
>>
> Thanks. I rebased the patch set according to the latest master branch.
> The attached v13 can be applied to the master.

I've committed parts 1 and 2 of this, without the documentation, and
with some additional cleanup.  I am not sure that this feature is
sufficiently non-experimental that it deserves to be documented, but
if we're thinking of doing that then the documentation needs a lot
more work.  I think part 3 of the patch is mostly useful as a
demonstration of how this API can be used, and is not something we
probably want to commit.  So I'm not planning, at this point, to spend
any more time on this patch series, and will mark it Committed in the
CF app.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >> FYI, patch v12 part 2 no longer applies cleanly.
> >>
> > Thanks. I rebased the patch set according to the latest master branch.
> > The attached v13 can be applied to the master.
> 
> I've committed parts 1 and 2 of this, without the documentation, and with
> some additional cleanup.  I am not sure that this feature is sufficiently
> non-experimental that it deserves to be documented, but if we're thinking
> of doing that then the documentation needs a lot more work.  I think part
> 3 of the patch is mostly useful as a demonstration of how this API can be
> used, and is not something we probably want to commit.  So I'm not planning,
> at this point, to spend any more time on this patch series, and will mark
> it Committed in the CF app.
> 
Thanks for your great help.

Hanada-san and I have discussed a further enhancement of this interface
that allows a join to be replaced by a custom scan; it could probably be
utilized by an extension that scans a materialized view on the fly instead
of executing the join. We will submit a design proposal for this
enhancement later.

Best regards,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: [v9.5] Custom Plan API

From
Amit Kapila
Date:
On Sat, Nov 8, 2014 at 4:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >> FYI, patch v12 part 2 no longer applies cleanly.
> >>
> > Thanks. I rebased the patch set according to the latest master branch.
> > The attached v13 can be applied to the master.
>
> I've committed parts 1 and 2 of this, without the documentation, and
> with some additional cleanup.  

A few observations/questions related to this commit:

1.
@@ -5546,6 +5568,29 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
  colinfo = deparse_columns_fetch(var->varno, dpns);
  attnum = var->varattno;
  }
+ else if (IS_SPECIAL_VARNO(var->varno) &&
+ IsA(dpns->planstate, CustomScanState) &&
+ (expr = GetSpecialCustomVar((CustomScanState *) dpns->planstate,
+ var, &child_ps)) != NULL)
+ {
+ deparse_namespace save_dpns;
+
+ if (child_ps)
+ push_child_plan(dpns, child_ps, &save_dpns);
+ /*
+ * Force parentheses because our caller probably assumed a Var is a
+ * simple expression.
+ */
+ if (!IsA(expr, Var))
+ appendStringInfoChar(buf, '(');
+ get_rule_expr((Node *) expr, context, true);
+ if (!IsA(expr, Var))
+ appendStringInfoChar(buf, ')');
+
+ if (child_ps)
+ pop_child_plan(dpns, &save_dpns);
+ return NULL;
+ }

a. It seems the Assert for netlevelsup is missing in this block.
b. The comment below in get_variable() could be improved w.r.t. the
handling for CustomScanState.  The comment indicates that if varno is
OUTER_VAR, INNER_VAR, or INDEX_VAR, all of them are handled similarly,
which has now changed slightly for CustomScanState.

/*
 * Try to find the relevant RTE in this rtable.  In a plan tree, it's
 * likely that varno is OUTER_VAR or INNER_VAR, in which case we must dig
 * down into the subplans, or INDEX_VAR, which is resolved similarly.  Also
 * find the aliases previously assigned for this RTE.
 */

2.
+void
+register_custom_path_provider(CustomPathMethods *cpp_methods)
{
..
}

Shouldn't there be an unregister function corresponding to the above
register function?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Sat, Nov 8, 2014 at 4:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
> wrote:
> > >> FYI, patch v12 part 2 no longer applies cleanly.
> > >>
> > > Thanks. I rebased the patch set according to the latest master branch.
> > > The attached v13 can be applied to the master.
> >
> > I've committed parts 1 and 2 of this, without the documentation, and
> > with some additional cleanup.
> 
> Few observations/questions related to this commit:
> 
> 1.
> @@ -5546,6 +5568,29 @@ get_variable(Var *var, int levelsup, bool istoplevel,
> deparse_context *context)
>   colinfo = deparse_columns_fetch(var->varno, dpns);
>   attnum = var->varattno;
>   }
> + else if (IS_SPECIAL_VARNO(var->varno) && IsA(dpns->planstate,
> + CustomScanState) && (expr = GetSpecialCustomVar((CustomScanState *)
> + dpns->planstate, var, &child_ps)) != NULL) { deparse_namespace
> + save_dpns;
> +
> + if (child_ps)
> + push_child_plan(dpns, child_ps, &save_dpns);
> + /*
> + * Force parentheses because our caller probably assumed a Var is a
> + * simple expression.
> + */
> + if (!IsA(expr, Var))
> + appendStringInfoChar(buf, '(');
> + get_rule_expr((Node *) expr, context, true); if (!IsA(expr, Var))
> + appendStringInfoChar(buf, ')');
> +
> + if (child_ps)
> + pop_child_plan(dpns, &save_dpns);
> + return NULL;
> + }
> 
> a. It seems Assert for netlevelsup is missing in this loop.
>
Indeed, this if-block does not have assertion unlike other special-varno.

> b. Below comment in function get_variable can be improved w.r.t handling
> for CustomScanState.  The comment indicates that if varno is OUTER_VAR or
> INNER_VAR or INDEX_VAR, it handles all of them similarly which seems to
> be slightly changed for CustomScanState.
> 
> /*
>  * Try to find the relevant RTE in this rtable.  In a plan tree, it's
>  * likely that varno is OUTER_VAR or INNER_VAR, in which case we must dig
>  * down into the subplans, or INDEX_VAR, which is resolved similarly. Also
>  * find the aliases previously assigned for this RTE.
>  */
> 
I added a short comment noting that only the extension knows the mapping
between these special varnos and the underlying expressions; thus, the
core queries the provider for the expression tied to this special Var node.
Does that make sense?

> 2.
> +void
> +register_custom_path_provider(CustomPathMethods *cpp_methods)
> {
> ..
> }
> 
> Shouldn't there be unregister function corresponding to above register
> function?
>
Even though it would not be difficult to implement, in what situation
would unregistering make more sense than an enable_xxxx_scan GUC parameter
added by the extension itself?
I initially thought a prepared statement containing a custom-scan node
would be problematic if its provider got unregistered / unloaded; however,
internal_unload_library() actually does nothing.  So it would be at least
harmless if we implemented it.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Attachment

Re: [v9.5] Custom Plan API

From
Amit Kapila
Date:
On Mon, Nov 10, 2014 at 4:18 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >
> > Few observations/questions related to this commit:
> >
> > 1.
> > @@ -5546,6 +5568,29 @@ get_variable(Var *var, int levelsup, bool istoplevel,
> > deparse_context *context)
> >   colinfo = deparse_columns_fetch(var->varno, dpns);
> >   attnum = var->varattno;
> >   }
> > + else if (IS_SPECIAL_VARNO(var->varno) && IsA(dpns->planstate,
> > + CustomScanState) && (expr = GetSpecialCustomVar((CustomScanState *)
> > + dpns->planstate, var, &child_ps)) != NULL) { deparse_namespace
> > + save_dpns;
> > +
> > + if (child_ps)
> > + push_child_plan(dpns, child_ps, &save_dpns);
> > + /*
> > + * Force parentheses because our caller probably assumed a Var is a
> > + * simple expression.
> > + */
> > + if (!IsA(expr, Var))
> > + appendStringInfoChar(buf, '(');
> > + get_rule_expr((Node *) expr, context, true); if (!IsA(expr, Var))
> > + appendStringInfoChar(buf, ')');
> > +
> > + if (child_ps)
> > + pop_child_plan(dpns, &save_dpns);
> > + return NULL;
> > + }
> >
> > a. It seems Assert for netlevelsup is missing in this loop.
> >
> Indeed, this if-block does not have assertion unlike other special-varno.
>

Similar handling is required in get_name_for_var_field().
Another point I wanted to clarify: in get_name_for_var_field(), all
cases other than the new one added for CustomScanState call
get_name_for_var_field() recursively to get the field name, whereas for
CustomScanState it calls get_rule_expr().  That doesn't look problematic
in general, but wouldn't it still be better to get the name the way the
other cases do, unless there is a special need for CustomScanState?

>
> > 2.
> > +void
> > +register_custom_path_provider(CustomPathMethods *cpp_methods)
> > {
> > ..
> > }
> >
> > Shouldn't there be unregister function corresponding to above register
> > function?
> >
> Even though it is not difficult to implement, what situation will make
> sense to unregister rather than enable_xxxx_scan GUC parameter added by
> extension itself?

I thought that, in general, if the user has an API to register custom path
methods, there should be some way to unregister them; the user might also
need to register different custom path methods after unregistering the
previous ones.  I think we should see what Robert or others have to say
about this point before trying to provide such an API.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Mon, Nov 10, 2014 at 6:55 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I thought that in general if user has the API to register the custom path
> methods, it should have some way to unregister them and also user might
> need to register some different custom path methods after unregistering
> the previous one's.  I think we should see what Robert or others have to
> say about this point before trying to provide such an API.

I wouldn't bother.  As KaiGai says, if you want to shut the
functionality off, the provider itself can provide a GUC.  Also, we
really have made no effort to ensure that loadable modules can be
safely unloaded, or hooked functions safely-unhooked.
ExecutorRun_hook is a good example.  Typical of hook installation is
this:
        prev_ExecutorRun = ExecutorRun_hook;
        ExecutorRun_hook = pgss_ExecutorRun;

Well, if multiple extensions use this hook, then there's no hope of
unloading them except in reverse order of installation.  We
essentially end up creating a singly-linked list of hook users, but
with the next-pointers stored in arbitrarily-named, likely-static
variables owned by the individual extensions, so that nobody can
actually traverse it.  This might be worth fixing as part of a
concerted campaign to make UNLOAD work, but unless somebody's really
going to do that I see little reason to hold this to a higher standard
than we apply elsewhere.  The ability to remove extensions from this
hook won't be valuable by itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Amit Kapila
Date:
On Mon, Nov 10, 2014 at 6:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Nov 10, 2014 at 6:55 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I thought that in general if user has the API to register the custom path
> > methods, it should have some way to unregister them and also user might
> > need to register some different custom path methods after unregistering
> > the previous one's.  I think we should see what Robert or others have to
> > say about this point before trying to provide such an API.
>
> I wouldn't bother.  As KaiGai says, if you want to shut the
> functionality off, the provider itself can provide a GUC.  Also, we
> really have made no effort to ensure that loadable modules can be
> safely unloaded, or hooked functions safely-unhooked.
> ExecutorRun_hook is a good example.  Typical of hook installation is
> this:
>
>         prev_ExecutorRun = ExecutorRun_hook;
>         ExecutorRun_hook = pgss_ExecutorRun;
>

In this case, the extension takes care of registering and unregistering
the hook: in _PG_init() it registers the hook, and in _PG_fini() it
unregisters it.  So if, for custom scans, core PostgreSQL provides an
API to register the methods, shouldn't it also provide an API to
unregister them?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Tue, Nov 11, 2014 at 12:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Nov 10, 2014 at 6:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Nov 10, 2014 at 6:55 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> > I thought that in general if user has the API to register the custom
>> > path
>> > methods, it should have some way to unregister them and also user might
>> > need to register some different custom path methods after unregistering
>> > the previous one's.  I think we should see what Robert or others have to
>> > say about this point before trying to provide such an API.
>>
>> I wouldn't bother.  As KaiGai says, if you want to shut the
>> functionality off, the provider itself can provide a GUC.  Also, we
>> really have made no effort to ensure that loadable modules can be
>> safely unloaded, or hooked functions safely-unhooked.
>> ExecutorRun_hook is a good example.  Typical of hook installation is
>> this:
>>
>>         prev_ExecutorRun = ExecutorRun_hook;
>>         ExecutorRun_hook = pgss_ExecutorRun;
>>
>
> In this case, Extension takes care of register and unregister for
> hook.  In _PG_init(), it register the hook and _PG_fini() it
> unregisters the same.

The point is that there's nothing you can do in _PG_fini() that will
work correctly.  If it does ExecutorRun_hook = prev_ExecutorRun, that's
fine if it's the most-recently-installed hook.  But if it isn't, then
doing so corrupts the list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I've committed parts 1 and 2 of this, without the documentation, and
> with some additional cleanup.  I am not sure that this feature is
> sufficiently non-experimental that it deserves to be documented, but
> if we're thinking of doing that then the documentation needs a lot
> more work.  I think part 3 of the patch is mostly useful as a
> demonstration of how this API can be used, and is not something we
> probably want to commit.  So I'm not planning, at this point, to spend
> any more time on this patch series, and will mark it Committed in the
> CF app.

I've done some preliminary cleanup on this patch, but I'm still pretty
desperately unhappy about some aspects of it, in particular the way that
it gets custom scan providers directly involved in setrefs.c,
finalize_primnode, and replace_nestloop_params processing.  I don't
want any of that stuff exported outside the core, as freezing those
APIs would be a very nasty restriction on future planner development.
What's more, it doesn't seem like doing that creates any value for
custom-scan providers, only a requirement for extra boilerplate code
for them to provide.

ISTM that we could avoid that by borrowing the design used for FDW
plans, namely that any expressions you would like planner post-processing
services for should be stuck into a predefined List field (fdw_exprs
for the ForeignScan case, perhaps custom_exprs for the CustomScan case?).
This would let us get rid of the SetCustomScanRef and FinalizeCustomScan
callbacks as well as simplify the API contract for PlanCustomPath.

I'm also wondering why this patch didn't follow the FDW lead in terms of
expecting private data to be linked from specialized "private" fields.
The design as it stands (with an expectation that CustomPaths, CustomPlans
etc would be larger than the core code knows about) is not awful, but it
seems just randomly different from the FDW precedent, and I don't see a
good argument why it should be.  If we undid that we could get rid of
CopyCustomScan callbacks, and perhaps also TextOutCustomPath and
TextOutCustomScan (though I concede there might be some argument to keep
the latter two anyway for debugging reasons).

Lastly, I'm pretty unconvinced that the GetSpecialCustomVar mechanism
added to ruleutils.c is anything but dead weight that we'll have to
maintain forever.  It seems somewhat unlikely that anyone will figure
out how to use it, or indeed that it can be used for anything very
interesting.  I suppose the argument for it is that you could stick
"custom vars" into the tlist of a CustomScan plan node, but you cannot,
at least not without a bunch of infrastructure that isn't there now;
in particular how would such an expression ever get matched by setrefs.c
to higher-level plan tlists?  I think we should rip that out and wait
to see a complete use-case before considering putting it back.

Comments?
        regards, tom lane

PS: with no documentation it's arguable that the entire patch is just
dead weight.  I'm not very happy that it went in without any.



Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Thu, Nov 20, 2014 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I've done some preliminary cleanup on this patch, but I'm still pretty
> desperately unhappy about some aspects of it, in particular the way that
> it gets custom scan providers directly involved in setrefs.c,
> finalize_primnode, and replace_nestloop_params processing.  I don't
> want any of that stuff exported outside the core, as freezing those
> APIs would be a very nasty restriction on future planner development.
> What's more, it doesn't seem like doing that creates any value for
> custom-scan providers, only a requirement for extra boilerplate code
> for them to provide.
>
> ISTM that we could avoid that by borrowing the design used for FDW
> plans, namely that any expressions you would like planner post-processing
> services for should be stuck into a predefined List field (fdw_exprs
> for the ForeignScan case, perhaps custom_exprs for the CustomScan case?).
> This would let us get rid of the SetCustomScanRef and FinalizeCustomScan
> callbacks as well as simplify the API contract for PlanCustomPath.

Ah, that makes sense.  I'm not sure I really understand what's so bad
about the current system, but I have no issue with revising it for
consistency.

> I'm also wondering why this patch didn't follow the FDW lead in terms of
> expecting private data to be linked from specialized "private" fields.
> The design as it stands (with an expectation that CustomPaths, CustomPlans
> etc would be larger than the core code knows about) is not awful, but it
> seems just randomly different from the FDW precedent, and I don't see a
> good argument why it should be.  If we undid that we could get rid of
> CopyCustomScan callbacks, and perhaps also TextOutCustomPath and
> TextOutCustomScan (though I concede there might be some argument to keep
> the latter two anyway for debugging reasons).

OK.

> Lastly, I'm pretty unconvinced that the GetSpecialCustomVar mechanism
> added to ruleutils.c is anything but dead weight that we'll have to
> maintain forever.  It seems somewhat unlikely that anyone will figure
> out how to use it, or indeed that it can be used for anything very
> interesting.  I suppose the argument for it is that you could stick
> "custom vars" into the tlist of a CustomScan plan node, but you cannot,
> at least not without a bunch of infrastructure that isn't there now;
> in particular how would such an expression ever get matched by setrefs.c
> to higher-level plan tlists?  I think we should rip that out and wait
> to see a complete use-case before considering putting it back.

I thought this was driven by a suggestion from you, but maybe KaiGai
can comment.

> PS: with no documentation it's arguable that the entire patch is just
> dead weight.  I'm not very happy that it went in without any.

As I said, I wasn't sure we wanted to commit to the API enough to
document it, and by the time you get done whacking the stuff above
around, the documentation KaiGai wrote for it (which was also badly in
need of editing by a native English speaker) would have been mostly
obsolete anyway.  But I'm willing to put some effort into it once you
get done rearranging the furniture, if that's helpful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I've committed parts 1 and 2 of this, without the documentation, and
> > with some additional cleanup.  I am not sure that this feature is
> > sufficiently non-experimental that it deserves to be documented, but
> > if we're thinking of doing that then the documentation needs a lot
> > more work.  I think part 3 of the patch is mostly useful as a
> > demonstration of how this API can be used, and is not something we
> > probably want to commit.  So I'm not planning, at this point, to spend
> > any more time on this patch series, and will mark it Committed in the
> > CF app.
>
> I've done some preliminary cleanup on this patch, but I'm still pretty
> desperately unhappy about some aspects of it, in particular the way that
> it gets custom scan providers directly involved in setrefs.c,
> finalize_primnode, and replace_nestloop_params processing.  I don't want
> any of that stuff exported outside the core, as freezing those APIs would
> be a very nasty restriction on future planner development.
> What's more, it doesn't seem like doing that creates any value for
> custom-scan providers, only a requirement for extra boilerplate code for
> them to provide.
>
> ISTM that we could avoid that by borrowing the design used for FDW plans,
> namely that any expressions you would like planner post-processing services
> for should be stuck into a predefined List field (fdw_exprs for the
> ForeignScan case, perhaps custom_exprs for the CustomScan case?).
> This would let us get rid of the SetCustomScanRef and FinalizeCustomScan
> callbacks as well as simplify the API contract for PlanCustomPath.
>
If the core backend can know which field contains expression nodes that
are processed by the custom-scan provider rather than by the core,
FinalizeCustomScan might be removable.  However, removing SetCustomScanRef
would make a significant use case I intend unavailable.
When the tlist contains a complicated expression node (and thus one that
takes many CPU cycles) and the custom-scan provider has the capability to
compute that expression externally, the SetCustomScanRef hook allows it to
replace the complicated expression node with a simple Var node that
references the externally computed value.

Because only the custom-scan provider knows how this "pseudo" Var node
maps to the original expression, the core needs to call the hook to
assign the correct varno/varattno.  We expect the provider to assign a
special varno, like OUTER_VAR, which is then resolved with
GetSpecialCustomVar.

One other idea is that the core backend could have a facility to translate
the relationship between the original expression and the pseudo Var node,
according to map information given by the custom-scan provider.

> I'm also wondering why this patch didn't follow the FDW lead in terms of
> expecting private data to be linked from specialized "private" fields.
> The design as it stands (with an expectation that CustomPaths, CustomPlans
> etc would be larger than the core code knows about) is not awful, but it
> seems just randomly different from the FDW precedent, and I don't see a
> good argument why it should be.  If we undid that we could get rid of
> CopyCustomScan callbacks, and perhaps also TextOutCustomPath and
> TextOutCustomScan (though I concede there might be some argument to keep
> the latter two anyway for debugging reasons).
>
Yep, the original proposal last year followed the FDW manner: it had a
custom_private field to store the custom-scan provider's private data.
However, I was advised to change it to the current form, because the old
one added a couple of routines to encode / decode Bitmapset, which might
lead to other encode / decode routines for other data types.

I'm neutral on this design choice.  Whichever one people accept is fine
with me.

> Lastly, I'm pretty unconvinced that the GetSpecialCustomVar mechanism added
> to ruleutils.c is anything but dead weight that we'll have to maintain
> forever.  It seems somewhat unlikely that anyone will figure out how to
> use it, or indeed that it can be used for anything very interesting.  I
> suppose the argument for it is that you could stick "custom vars" into the
> tlist of a CustomScan plan node, but you cannot, at least not without a
> bunch of infrastructure that isn't there now; in particular how would such
> an expression ever get matched by setrefs.c to higher-level plan tlists?
> I think we should rip that out and wait to see a complete use-case before
> considering putting it back.
>
As I described above, as long as the core backend has a facility to
manage the relationship between a pseudo Var node and a complicated
expression node, I think we can remove this callback.

> PS: with no documentation it's arguable that the entire patch is just dead
> weight.  I'm not very happy that it went in without any.
>
I agree with this.  Would it be a good idea to write up a wiki page to
brush up the documentation draft?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Thu, Nov 20, 2014 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I've done some preliminary cleanup on this patch, but I'm still pretty
> > desperately unhappy about some aspects of it, in particular the way
> > that it gets custom scan providers directly involved in setrefs.c,
> > finalize_primnode, and replace_nestloop_params processing.  I don't
> > want any of that stuff exported outside the core, as freezing those
> > APIs would be a very nasty restriction on future planner development.
> > What's more, it doesn't seem like doing that creates any value for
> > custom-scan providers, only a requirement for extra boilerplate code
> > for them to provide.
> >
> > ISTM that we could avoid that by borrowing the design used for FDW
> > plans, namely that any expressions you would like planner
> > post-processing services for should be stuck into a predefined List
> > field (fdw_exprs for the ForeignScan case, perhaps custom_exprs for the
> CustomScan case?).
> > This would let us get rid of the SetCustomScanRef and
> > FinalizeCustomScan callbacks as well as simplify the API contract for
> PlanCustomPath.
> 
> Ah, that makes sense.  I'm not sure I really understand what's so bad about
> the current system, but I have no issue with revising it for consistency.
> 
> > I'm also wondering why this patch didn't follow the FDW lead in terms
> > of expecting private data to be linked from specialized "private" fields.
> > The design as it stands (with an expectation that CustomPaths,
> > CustomPlans etc would be larger than the core code knows about) is not
> > awful, but it seems just randomly different from the FDW precedent,
> > and I don't see a good argument why it should be.  If we undid that we
> > could get rid of CopyCustomScan callbacks, and perhaps also
> > TextOutCustomPath and TextOutCustomScan (though I concede there might
> > be some argument to keep the latter two anyway for debugging reasons).
> 
> OK.
> 
So, the existing form shall be revised as follows?

* CustomScan shall no longer be the base type of a custom data type managed
  by the extension; a private data field shall be used instead.
* That also eliminates the CopyCustomScan and TextOutCustomPath/TextOutCustomScan
  callbacks.
* Expression nodes that are processed by the extension rather than by the
  core backend shall be connected to a special field, like fdw_exprs in FDW.
* The translation between a pseudo Var node and the original expression
  node shall be reported to the core backend, instead of using
  SetCustomScanRef and GetSpecialCustomVar.

> > Lastly, I'm pretty unconvinced that the GetSpecialCustomVar mechanism
> > added to ruleutils.c is anything but dead weight that we'll have to
> > maintain forever.  It seems somewhat unlikely that anyone will figure
> > out how to use it, or indeed that it can be used for anything very
> > interesting.  I suppose the argument for it is that you could stick
> > "custom vars" into the tlist of a CustomScan plan node, but you
> > cannot, at least not without a bunch of infrastructure that isn't
> > there now; in particular how would such an expression ever get matched
> > by setrefs.c to higher-level plan tlists?  I think we should rip that
> > out and wait to see a complete use-case before considering putting it
> back.
> 
> I thought this was driven by a suggestion from you, but maybe KaiGai can
> comment.
> 
> > PS: with no documentation it's arguable that the entire patch is just
> > dead weight.  I'm not very happy that it went in without any.
> 
> As I said, I wasn't sure we wanted to commit to the API enough to document
> it, and by the time you get done whacking the stuff above around, the
> documentation KaiGai wrote for it (which was also badly in need of editing
> by a native English speaker) would have been mostly obsolete anyway.  But
> I'm willing to put some effort into it once you get done rearranging the
> furniture, if that's helpful.
>
For people's convenience, I'd like to set up a wiki page hosting a draft
of the SGML documentation, for easy updates by native English speakers.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: [v9.5] Custom Plan API

From
Tom Lane
Date:
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
>> I've done some preliminary cleanup on this patch, but I'm still pretty
>> desperately unhappy about some aspects of it, in particular the way that
>> it gets custom scan providers directly involved in setrefs.c,
>> finalize_primnode, and replace_nestloop_params processing.  I don't want
>> any of that stuff exported outside the core, as freezing those APIs would
>> be a very nasty restriction on future planner development.

> If core backend can know which field contains expression nodes but
> processed by custom-scan provider, FinalizedCustomScan might be able
> to rid. However, rid of SetCustomScanRef makes unavailable a significant
> use case I intend.
> In case when tlist contains complicated expression node (thus it takes
> many cpu cycles) and custom-scan provider has a capability to compute
> this expression node externally, SetCustomScanRef hook allows to replace
> this complicate expression node by a simple Var node that references
> a value being externally computed.

That's a fine goal to have, but this is not a solution that works for
any except trivial cases.  The problem is that that complicated expression
isn't going to be in the CustomScan's tlist in the first place unless you
have a one-node plan.  As soon as you have a join, for example, the
planner is going to delay calculation of anything more complex than a
plain Var to above the join.  Aggregation, GROUP BY, etc would also defeat
such an optimization.

This gets back to the remarks I made earlier about it not being possible
to do anything very interesting in a plugin of this nature.  You really
need cooperation from other places in the planner if you want to do things
like pushing down calculations into an external provider.  And it's not
at all clear how that would even work, let alone how we might make it
implementable as a plugin rather than core code.

Also, even if we could think of a way to do this from a CustomScan plugin,
that would fail to cover some very significant use-cases for pushing
down complex expressions, for example:
* retrieving values of expensive functions from expression indexes;
* pushing down expensive functions into FDWs so they can be done remotely.
And I'm also worried that once we've exported and thereby frozen the APIs
in this area, we'd be operating with one hand tied behind our backs in
solving those use-cases.  So I'm not very excited about pursuing the
problem in this form.

So I remain of the opinion that we should get the CustomScan stuff out
of setrefs processing, and also that having EXPLAIN support for such
special variables is premature.  It's possible that after the dust
settles we'd wind up with additions to ruleutils that look exactly like
what's in this patch ... but I'd bet against that.
        regards, tom lane



Re: [v9.5] Custom Plan API

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> As I said, I wasn't sure we wanted to commit to the API enough to
> document it, and by the time you get done whacking the stuff above
> around, the documentation KaiGai wrote for it (which was also badly in
> need of editing by a native English speaker) would have been mostly
> obsolete anyway.  But I'm willing to put some effort into it once you
> get done rearranging the furniture, if that's helpful.

I thought of another API change we should consider.  It's weird that
CustomPathMethods includes CreateCustomScanPath, because that's not
a method you apply to a CustomPath, it's what creates them in the first
place.  I'm inclined to think that we should get rid of that and
register_custom_path_provider() altogether and just provide a function
hook variable equivalent to create_customscan_paths, which providers can
link into in the usual way.  The register_custom_path_provider mechanism
might have some use if we were also going to provide deregister-by-name
functionality, but as you pointed out upthread, that's not likely to ever
be worth doing.

The hook function might better be named something like
editorialize_on_relation_paths, since in principle it could screw around
with the Paths already made by the core code, not just add CustomPaths.
There's an analogy to get_relation_info_hook, which is meant to let
plugins editorialize on the relation's index list.  So maybe
set_plain_rel_pathlist_hook?
        regards, tom lane



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> >> I've done some preliminary cleanup on this patch, but I'm still
> >> pretty desperately unhappy about some aspects of it, in particular
> >> the way that it gets custom scan providers directly involved in
> >> setrefs.c, finalize_primnode, and replace_nestloop_params processing.
> >> I don't want any of that stuff exported outside the core, as freezing
> >> those APIs would be a very nasty restriction on future planner
> development.
>
> > If core backend can know which field contains expression nodes but
> > processed by custom-scan provider, FinalizedCustomScan might be able
> > to rid. However, rid of SetCustomScanRef makes unavailable a
> > significant use case I intend.
> > In case when tlist contains complicated expression node (thus it takes
> > many cpu cycles) and custom-scan provider has a capability to compute
> > this expression node externally, SetCustomScanRef hook allows to
> > replace this complicate expression node by a simple Var node that
> > references a value being externally computed.
>
> That's a fine goal to have, but this is not a solution that works for any
> except trivial cases.  The problem is that that complicated expression
> isn't going to be in the CustomScan's tlist in the first place unless you
> have a one-node plan.  As soon as you have a join, for example, the planner
> is going to delay calculation of anything more complex than a plain Var
> to above the join.  Aggregation, GROUP BY, etc would also defeat such an
> optimization.
>
> This gets back to the remarks I made earlier about it not being possible
> to do anything very interesting in a plugin of this nature.  You really
> need cooperation from other places in the planner if you want to do things
> like pushing down calculations into an external provider.  And it's not
> at all clear how that would even work, let alone how we might make it
> implementable as a plugin rather than core code.
>
> Also, even if we could think of a way to do this from a CustomScan plugin,
> that would fail to cover some very significant use-cases for pushing down
> complex expressions, for example:
> * retrieving values of expensive functions from expression indexes;
> * pushing down expensive functions into FDWs so they can be done remotely.
> And I'm also worried that once we've exported and thereby frozen the APIs
> in this area, we'd be operating with one hand tied behind our backs in solving
> those use-cases.  So I'm not very excited about pursuing the problem in
> this form.
>
I can understand your concern: this works only for a one-node plan, and
it may need additional interaction between the core and the extension to
push down complicated expressions.
So, right now, I have to admit to dropping this hook for this purpose.

On the other hand, I was thinking of using similar, but not identical,
functionality to implement join replacement by custom-scan. I'd like to
hear your comments prior to patch submission.

Let's assume a custom-scan provider that runs on a materialized view
(or something like an in-memory query cache) instead of a join.
In this case, a reasonable design is to fetch a tuple from the
materialized view and put it on the ecxt_scantuple of the ExprContext
prior to evaluation of the qualifier or tlist, unlike a usual join,
which takes two slots: ecxt_innertuple and ecxt_outertuple.
It follows that each varnode has to reference ecxt_scantuple, rather
than ecxt_innertuple or ecxt_outertuple.
The tuple in ecxt_scantuple contains attributes coming from both
relations, so we need to track the relationship between a varattno of
the scanned tuple and the source relation it came from.
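As an illustrative sketch of the single-slot design (plain int arrays standing in for TupleTableSlots, all names invented for the example), qual and tlist evaluation reads every column from the one combined scan tuple, regardless of which source relation the column originally came from:

```c
#include <assert.h>

/* Stand-in for ExprContext: one combined scan slot instead of the
 * inner/outer pair a real join node would use. */
typedef struct ExprContext
{
    const int *ecxt_scantuple;   /* combined tuple from the materialized view */
    int        natts;
} ExprContext;

/* A Var on the pseudo scan reads attribute `varattno` (1-based) from
 * ecxt_scantuple, never from an inner or outer slot. */
static int
eval_scan_var(const ExprContext *econtext, int varattno)
{
    assert(varattno >= 1 && varattno <= econtext->natts);
    return econtext->ecxt_scantuple[varattno - 1];
}
```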

I intended to use the SetCustomScanRef callback to adjust the varno
and varattno of varnodes that reference the custom-scan node, as
set_join_references() does.
It does not mean replacing a general expression with a varnode,
just remapping varno/varattno.

> So I remain of the opinion that we should get the CustomScan stuff out of
> setrefs processing, and also that having EXPLAIN support for such special
> variables is premature.  It's possible that after the dust settles we'd
> wind up with additions to ruleutils that look exactly like what's in this
> patch ... but I'd bet against that.
>
So, I can agree to dropping SetCustomScanRef and GetSpecialCustomVar.
However, some alternative way to implement the varno/varattno
remapping will be needed soon.
What do you think?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




Re: [v9.5] Custom Plan API

From
Tom Lane
Date:
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> Let assume a custom-scan provider that runs on a materialized-view
> (or, something like a query cache in memory) instead of join.
> In this case, a reasonable design is to fetch a tuple from the
> materialized-view then put it on the ecxt_scantuple of ExprContext
> prior to evaluation of qualifier or tlist, unlike usual join takes
> two slots - ecxt_innertuple and ecxt_outertuple.
> Also, it leads individual varnode has to reference exct_scantuple,
> neither ecxt_innertuple nor ecxt_outertuple.

OK, that's possibly a reasonable way to do it at runtime.  You don't
*have* to do it that way of course.  It would be only marginally
less efficient to reconstruct two tuples that match the shapes of the
original join inputs.

> I intended to use the SetCustomScanRef callback to adjust varno
> and varattno of the varnode that references the custom-scan node;
> as if set_join_references() doing.

I think this is really fundamentally misguided.  setrefs.c has no
business doing anything "interesting" like making semantically important
substitutions; those decisions need to be made much earlier.  An example
in the context of your previous proposal is that getting rid of expensive
functions without any adjustment of cost estimates is just wrong; and
I don't mean that you forgot to have your setrefs.c hook hack up the
Plan's cost fields.  The cost estimates need to change at the Path stage,
or the planner might not even select the right path at all.

I'm not sure where would be an appropriate place to deal with the kind of
thing you're thinking about here.  But I'm really not happy with the
concept of exposing the guts of setrefs.c in order to enable
single-purpose kluges like this.  We have fairly general problems to solve
in this area, and we should be working on solving them, not on freezing
relevant planner APIs to support marginally-useful external plugins.
        regards, tom lane



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> > Let assume a custom-scan provider that runs on a materialized-view
> > (or, something like a query cache in memory) instead of join.
> > In this case, a reasonable design is to fetch a tuple from the
> > materialized-view then put it on the ecxt_scantuple of ExprContext
> > prior to evaluation of qualifier or tlist, unlike usual join takes two
> > slots - ecxt_innertuple and ecxt_outertuple.
> > Also, it leads individual varnode has to reference exct_scantuple,
> > neither ecxt_innertuple nor ecxt_outertuple.
>
> OK, that's possibly a reasonable way to do it at runtime.  You don't
> *have* to do it that way of course.  It would be only marginally less
> efficient to reconstruct two tuples that match the shapes of the original
> join inputs.
>
> > I intended to use the SetCustomScanRef callback to adjust varno and
> > varattno of the varnode that references the custom-scan node; as if
> > set_join_references() doing.
>
> I think this is really fundamentally misguided.  setrefs.c has no business
> doing anything "interesting" like making semantically important
> substitutions; those decisions need to be made much earlier.  An example
> in the context of your previous proposal is that getting rid of expensive
> functions without any adjustment of cost estimates is just wrong; and I
> don't mean that you forgot to have your setrefs.c hook hack up the Plan's
> cost fields.  The cost estimates need to change at the Path stage, or the
> planner might not even select the right path at all.
>
Because we currently have no functionality to register a custom-scan
path in place of a join, I had to show another use scenario...

> I'm not sure where would be an appropriate place to deal with the kind of
> thing you're thinking about here.  But I'm really not happy with the concept
> of exposing the guts of setrefs.c in order to enable single-purpose kluges
> like this.  We have fairly general problems to solve in this area, and we
> should be working on solving them, not on freezing relevant planner APIs
> to support marginally-useful external plugins.
>
From my standpoint, varnode remapping for joins is a higher priority
than complicated expression nodes. As long as the core backend handles
this job, yes, I think a hook in setrefs.c is not mandatory.
It also means the job of resolving special varnodes in EXPLAIN moves
from the extension to the core, so GetSpecialCustomVar can be dropped.

Let me explain my current idea.
The CustomScan node will have a field that holds varnode mapping
information, constructed by the custom-scan provider in
create_customscan_plan if it wants. It is probably a list of varnodes.
If it exists, setrefs.c changes its behavior and updates the
varno/varattno of each varnode according to this mapping, as
set_join_references() does based on an indexed_tlist.
To reference ecxt_scantuple, INDEX_VAR is the best choice for the varno
of these varnodes, with the index into the varnode mapping list as the
varattno. This can also be used to produce EXPLAIN output, instead of
the GetSpecialCustomVar hook.
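A minimal sketch of that remapping is below. Arrays stand in for the List of varnodes and the Var struct is reduced to the two fields being rewritten; INDEX_VAR's value matches the backend's definition, but everything else is simplified for illustration:

```c
#include <assert.h>

#define INDEX_VAR (-3)          /* special varno, as in the backend */

/* Simplified Var: just the fields the remapping touches. */
typedef struct Var
{
    int varno;                  /* range-table index of source relation */
    int varattno;               /* attribute number in that relation */
} Var;

/*
 * Remap `var` against the provider's varnode mapping list, the way
 * set_join_references() remaps Vars via an indexed_tlist: a match
 * becomes (INDEX_VAR, position-in-list), pointing into ecxt_scantuple.
 */
static void
set_customscan_var_refs(Var *var, const Var *varmap, int nmap)
{
    for (int i = 0; i < nmap; i++)
    {
        if (varmap[i].varno == var->varno &&
            varmap[i].varattno == var->varattno)
        {
            var->varno = INDEX_VAR;
            var->varattno = i + 1;  /* 1-based list index */
            return;
        }
    }
}
```

EXPLAIN can then run the same list in reverse: given (INDEX_VAR, n), print the n-th original Var from the mapping.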

So, the steps to take may be:
(1) Add custom_private, custom_exprs, ... instead of self-defined data
    types based on CustomXXX.
(2) Get rid of the SetCustomScanRef and GetSpecialCustomVar hooks for the
    current custom-"scan" support.
(3) Integrate the above varnode mapping feature with the upcoming join
    replacement by custom-scan support.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



Re: [v9.5] Custom Plan API

From
Tom Lane
Date:
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> Let me explain the current idea of mine.
> CustomScan node will have a field that hold varnode mapping information
> that is constructed by custom-scan provider on create_customscan_plan,
> if they want. It is probably a list of varnode.
> If exists, setrefs.c changes its behavior; that updates varno/varattno of
> varnode according to this mapping, as if set_join_references() does
> based on indexed_tlist.
> To reference exct_scantuple, INDEX_VAR will be a best choice for varno
> of these varnodes, and index of the above varnode mapping list will
> be varattno. It can be utilized to make EXPLAIN output, instead of
> GetSpecialCustomVar hook.

> So, steps to go may be:
> (1) Add custom_private, custom_exprs, ... instead of self defined data
>     type based on CustomXXX.
> (2) Rid of SetCustomScanRef and GetSpecialCustomVar hook for the current
>     custom-"scan" support.
> (3) Integration of above varnode mapping feature within upcoming join
>     replacement by custom-scan support.

Well ... I still do not find this interesting, because I don't believe
that CustomScan is a solution to anything interesting.  It's difficult
enough to solve problems like expensive-function pushdown within the
core code; why would we tie one hand behind our backs by insisting that
they should be solved by extensions?  And as I mentioned before, we do
need solutions to these problems in the core, regardless of CustomScan.

I think that a useful way to go at this might be to think first about
how to make use of expensive functions that have been cached in indexes,
and then see how the solution to that might translate to pushing down
expensive functions into FDWs and CustomScans.  If you start with the
CustomScan aspect of it then you immediately find yourself trying to
design APIs to divide up the solution, which is premature when you
don't even know what the solution is.

The rough idea I'd had about this is that while canvassing a relation's
indexes (in get_relation_info), we could create a list of precomputed
expressions that are available from indexes, then run through the
query tree and replace any matching subexpressions with some Var-like
nodes (or maybe better PlaceHolderVar-like nodes) that indicate that
"we can get this expression for free if we read the right index".
If we do read the right index, such an expression reduces to a Var in
the finished plan tree; if not, it reverts to the original expression.
(Some thought would need to be given to the semantics when the index's
table is underneath an outer join --- that may just mean that we can't
necessarily replace every textually-matching subexpression, only those
that are not above an outer join.)  One question mark here is how to do
the "replace any matching subexpressions" bit without O(lots) processing
cost in big queries.  But that's probably just a SMOP.  The bigger issue
I fear is that the planner is not currently structured to think that
evaluation cost of expressions in the SELECT list has anything to do
with which Path it should pick.  That is tied to the handwaving I've
been doing for awhile now about converting all the upper-level planning
logic into generate-and-compare-Paths style; we certainly cannot ignore
tlist eval costs while making those decisions.  So at least for those
upper-level Paths, we'd have to have a notion of what tlist we expect
that plan level to compute, and charge appropriate evaluation costs.

So there's a lot of work there and I don't find that CustomScan looks
like a solution to any of it.  CustomScan and FDWs could benefit from
this work, in that we'd now have a way to deal with the concept that
expensive functions (and aggregates, I hope) might be computed at
the bottom scan level.  But it's folly to suppose that we can make it
work just by hacking some arms-length extension code without any
fundamental planner changes.
        regards, tom lane



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> > Let me explain the current idea of mine.
> > CustomScan node will have a field that hold varnode mapping
> > information that is constructed by custom-scan provider on
> > create_customscan_plan, if they want. It is probably a list of varnode.
> > If exists, setrefs.c changes its behavior; that updates varno/varattno
> > of varnode according to this mapping, as if set_join_references() does
> > based on indexed_tlist.
> > To reference exct_scantuple, INDEX_VAR will be a best choice for varno
> > of these varnodes, and index of the above varnode mapping list will be
> > varattno. It can be utilized to make EXPLAIN output, instead of
> > GetSpecialCustomVar hook.
>
> > So, steps to go may be:
> > (1) Add custom_private, custom_exprs, ... instead of self defined data
> >     type based on CustomXXX.
> > (2) Rid of SetCustomScanRef and GetSpecialCustomVar hook for the current
> >     custom-"scan" support.
> > (3) Integration of above varnode mapping feature within upcoming join
> >     replacement by custom-scan support.
>
> Well ... I still do not find this interesting, because I don't believe that
> CustomScan is a solution to anything interesting.  It's difficult enough
> to solve problems like expensive-function pushdown within the core code;
> why would we tie one hand behind our backs by insisting that they should
> be solved by extensions?  And as I mentioned before, we do need solutions
> to these problems in the core, regardless of CustomScan.
>
I'd like to split the "anything interesting" into two portions.
As you pointed out, the feature to push down complicated expressions
may need a rather large effort (more than the remaining two commit-fests,
at least); however, what the feature to replace a join with a custom-scan
requires is similar to the job of set_join_references(), because it never
involves translation between varnodes and general expressions.

Also, from my standpoint, a simple join replacement by custom-scan has
higher priority; join acceleration in v9.5 makes sense even if the full
functionality of pushing down general expressions is not supported yet.

> I think that a useful way to go at this might be to think first about how
> to make use of expensive functions that have been cached in indexes, and
> then see how the solution to that might translate to pushing down expensive
> functions into FDWs and CustomScans.  If you start with the CustomScan
> aspect of it then you immediately find yourself trying to design APIs to
> divide up the solution, which is premature when you don't even know what
> the solution is.
>
Yes, it also seems to me that the remaining two commit-fests are a
rather tight schedule in which to reach consensus on the overall design
and implement it. I'd like to focus on the simpler portion first.

> The rough idea I'd had about this is that while canvassing a relation's
> indexes (in get_relation_info), we could create a list of precomputed
> expressions that are available from indexes, then run through the query
> tree and replace any matching subexpressions with some Var-like nodes (or
> maybe better PlaceHolderVar-like nodes) that indicate that "we can get this
> expression for free if we read the right index".
> If we do read the right index, such an expression reduces to a Var in the
> finished plan tree; if not, it reverts to the original expression.
> (Some thought would need to be given to the semantics when the index's table
> is underneath an outer join --- that may just mean that we can't necessarily
> replace every textually-matching subexpression, only those that are not
> above an outer join.) One question mark here is how to do the "replace
> any matching subexpressions" bit without O(lots) processing cost in big
> queries.  But that's probably just a SMOP.  The bigger issue I fear is that
> the planner is not currently structured to think that evaluation cost of
> expressions in the SELECT list has anything to do with which Path it should
> pick.  That is tied to the handwaving I've been doing for awhile now about
> converting all the upper-level planning logic into
> generate-and-compare-Paths style; we certainly cannot ignore tlist eval
> costs while making those decisions.  So at least for those upper-level Paths,
> we'd have to have a notion of what tlist we expect that plan level to compute,
> and charge appropriate evaluation costs.
>
Let me investigate the planner code further before commenting on this...

> So there's a lot of work there and I don't find that CustomScan looks like
> a solution to any of it.  CustomScan and FDWs could benefit from this work,
> in that we'd now have a way to deal with the concept that expensive functions
> (and aggregates, I hope) might be computed at the bottom scan level.  But
> it's folly to suppose that we can make it work just by hacking some
> arms-length extension code without any fundamental planner changes.
>
Indeed, I don't think it is a good idea to start with this harder portion.
Let's focus on just the varno/varattno remapping needed to replace a join
with a custom-scan, as an immediate target.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>




Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Mon, Nov 24, 2014 at 6:57 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Indeed, I don't think it is a good idea to start from this harder portion.
> Let's focus on just varno/varattno remapping to replace join relation by
> custom-scan, as an immediate target.

We still need something like this for FDWs, as well.  The potential
gains there are enormous.  Anything we do had better fit in nicely
with that, rather than looking like a separate hack.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On Mon, Nov 24, 2014 at 6:57 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Indeed, I don't think it is a good idea to start from this harder portion.
> > Let's focus on just varno/varattno remapping to replace join relation
> > by custom-scan, as an immediate target.
> 
> We still need something like this for FDWs, as well.  The potential gains
> there are enormous.  Anything we do had better fit in nicely with that,
> rather than looking like a separate hack.
> 
Today, I had a talk with Hanada-san to clarify which parts of the two
features can be common and how to implement them. We concluded that both
features can share most of the infrastructure.
Let me give an introduction to join replacement by foreign-/custom-scan below.

The overall design is to inject a foreign-/custom-scan node in place of
the built-in join logic (based on the estimated cost). From the viewpoint
of the core backend, it looks like a sub-query scan that performs a
relation join internally.

What we need to do is below:

(1) Add a hook in add_paths_to_joinrel()
This gives extensions (including FDW drivers and custom-scan providers) a
chance to add alternative paths for a particular join of relations, using
ForeignScanPath or CustomScanPath, if they can run instead of the built-in
ones.

(2) Inform the core backend of the varno/varattno mapping
One thing we need to pay attention to is that a foreign-/custom-scan node
running in place of a built-in join node must return a mixture of values
coming from both relations. When an FDW driver fetches a remote record (or
a record computed by an external computing resource), the most reasonable
approach is to store it in the ecxt_scantuple of the ExprContext, then
kick off projection with varnodes that reference this slot.
This needs an infrastructure that tracks the relationship between each
original varnode and its alternative varno/varattno. We think they should
be mapped to INDEX_VAR and a virtual attribute number, to reference
ecxt_scantuple naturally; this infrastructure is quite helpful for both
ForeignScan and CustomScan. We'd like to add a List *fdw_varmap /
*custom_varmap field to both plan nodes. It contains the original Var
nodes, mapped by position according to the list index (e.g., the first
varnode becomes varno=INDEX_VAR and varattno=1).

(3) Reverse mapping for EXPLAIN
For EXPLAIN support, the above varnodes on the pseudo relation scan need
to be resolved. All we need to do is initialize dpns->inner_tlist in
set_deparse_planstate() according to the above mapping.

(4) The case of scanrelid == 0
To skip opening/closing (foreign) tables, we need a marker telling the
backend not to initialize the scan node according to the table definition,
but according to the pseudo varnode list.
As the earlier custom-scan patch did, scanrelid == 0 is a straightforward
marker showing that the scan node is not bound to a particular real
relation. So we also need to add special-case handling around the
foreign-/custom-scan code.

We expect the above changes are small enough to implement basic join
push-down functionality (not involving external computation of complicated
expression nodes), yet valuable enough to support in v9.5.

Please comment on the proposal above.
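A rough sketch of step (1) follows. The hook name, signature, and stand-in types here are hypothetical (the committed form, if any, may differ); it only illustrates how a provider could offer a replacement path after the core builds the built-in join paths:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for planner structures and join classification. */
typedef struct PlannerInfo PlannerInfo;
typedef struct RelOptInfo { int npaths; } RelOptInfo;
typedef enum JoinType { JOIN_INNER, JOIN_LEFT, JOIN_FULL } JoinType;

/* Hypothetical hook: called at the end of add_paths_to_joinrel(), so an
 * FDW or custom-scan provider can offer a path replacing the join. */
typedef void (*join_pathlist_hook_type) (PlannerInfo *root,
                                         RelOptInfo *joinrel,
                                         RelOptInfo *outerrel,
                                         RelOptInfo *innerrel,
                                         JoinType jointype);
static join_pathlist_hook_type join_pathlist_hook = NULL;

/* Provider side: only claim joins it can actually run itself. */
static void
my_join_paths(PlannerInfo *root, RelOptInfo *joinrel,
              RelOptInfo *outerrel, RelOptInfo *innerrel,
              JoinType jointype)
{
    if (jointype == JOIN_INNER)
        joinrel->npaths++;      /* stands in for add_path(joinrel, ...) */
}

/* Core side: tail of add_paths_to_joinrel(). */
static void
add_paths_to_joinrel(PlannerInfo *root, RelOptInfo *joinrel,
                     RelOptInfo *outerrel, RelOptInfo *innerrel,
                     JoinType jointype)
{
    joinrel->npaths++;          /* built-in nestloop/merge/hash paths */
    if (join_pathlist_hook)
        join_pathlist_hook(root, joinrel, outerrel, innerrel, jointype);
}
```

The alternative path then competes on estimated cost like any other, so the built-in join still wins whenever the provider's path is not cheaper.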

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 7 November 2014 at 22:46, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>> FYI, patch v12 part 2 no longer applies cleanly.
>>>
>> Thanks. I rebased the patch set according to the latest master branch.
>> The attached v13 can be applied to the master.
>
> I've committed parts 1 and 2 of this, without the documentation, and
> with some additional cleanup.  I am not sure that this feature is
> sufficiently non-experimental that it deserves to be documented, but
> if we're thinking of doing that then the documentation needs a lot
> more work.  I think part 3 of the patch is mostly useful as a
> demonstration of how this API can be used, and is not something we
> probably want to commit.  So I'm not planning, at this point, to spend
> any more time on this patch series, and will mark it Committed in the
> CF app.


I'm very concerned about the state of this feature. No docs, no
examples, and therefore, no testing. This standard of code is much
less than I've been taught is the minimum standard on this project.

There are zero docs, even in README. Experimental feature, or not,
there MUST be documentation somewhere, in some form, even if that is
just on the Wiki. Otherwise how will it ever be used sufficiently to
allow it to be declared fully usable?

The example contrib module was not committed and I am advised no longer works.

After much effort in persuading academic contacts to begin using the
feature for open source research it now appears pretty much unusable.

This is supposed to be an open project. Whoever takes responsibility
here, please ensure that those things are resolved, quickly.

We're on a time limit because any flaws in the API need to be ironed
out before it's too late and we have to decide either to remove the API
because it's flaky, or commit to supporting it in production for 9.5.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> On 7 November 2014 at 22:46, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
> wrote:
> >>> FYI, patch v12 part 2 no longer applies cleanly.
> >>>
> >> Thanks. I rebased the patch set according to the latest master branch.
> >> The attached v13 can be applied to the master.
> >
> > I've committed parts 1 and 2 of this, without the documentation, and
> > with some additional cleanup.  I am not sure that this feature is
> > sufficiently non-experimental that it deserves to be documented, but
> > if we're thinking of doing that then the documentation needs a lot
> > more work.  I think part 3 of the patch is mostly useful as a
> > demonstration of how this API can be used, and is not something we
> > probably want to commit.  So I'm not planning, at this point, to spend
> > any more time on this patch series, and will mark it Committed in the
> > CF app.
> 
> 
> I'm very concerned about the state of this feature. No docs, no examples,
> and therefore, no testing. This standard of code is much less than I've
> been taught is the minimum standard on this project.
> 
> There are zero docs, even in README. Experimental feature, or not, there
> MUST be documentation somewhere, in some form, even if that is just on the
> Wiki. Otherwise how it will ever be used sufficiently to allow it to be
> declared fully usable?
> 
The reason the documentation portion was not committed is, sorry, the
quality of the documentation from the standpoint of a native English
speaker.
I'm now writing up documentation according to the latest code base;
please wait several days and help to improve it.

> The example contrib module was not committed and I am advised no longer
> works.
> 
May I submit the contrib/ctidscan module again as an example?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 27 November 2014 at 10:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> The reason why documentation portion was not yet committed is, sorry, it
> is due to quality of documentation from the standpoint of native English
> speaker.
> Now, I'm writing up a documentation stuff according to the latest code base,
> please wait for several days and help to improve.

Happy to help with that.

Please post to the Wiki first so we can edit it communally.


>> The example contrib module was not committed and I am advised no longer
>> works.
>>
> May I submit the contrib/ctidscan module again for an example?

Yes please. We have other contrib modules that exist as tests, so this
seems reasonable to me.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Tue, Nov 25, 2014 at 3:44 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Today, I had a talk with Hanada-san to clarify which can be a common portion
> of them and how to implement it. Then, we concluded both of features can be
> shared most of the infrastructure.
> Let me put an introduction of join replacement by foreign-/custom-scan below.
>
> Its overall design intends to inject foreign-/custom-scan node instead of
> the built-in join logic (based on the estimated cost). From the viewpoint of
> core backend, it looks like a sub-query scan that contains relations join
> internally.
>
> What we need to do is below:
>
> (1) Add a hook add_paths_to_joinrel()
> It gives extensions (including FDW drivers and custom-scan providers) chance
> to add alternative paths towards a particular join of relations, using
> ForeignScanPath or CustomScanPath, if it can run instead of the built-in ones.
>
> (2) Inform the core backend of the varno/varattno mapping
> One thing we need to pay attention to is that a foreign-/custom-scan node
> running instead of a built-in join node must return a mixture of values
> coming from both relations. When an FDW driver fetches a remote record (or a
> record computed by an external computing resource), the most reasonable way
> is to store it in the ecxt_scantuple of the ExprContext, then kick projection
> with varnodes that reference this slot.
> This needs an infrastructure that tracks the relationship between the
> original varnode and the alternative varno/varattno. We thought it should be
> mapped to INDEX_VAR and a virtual attribute number, so it references
> ecxt_scantuple naturally; this infrastructure is quite helpful for both
> ForeignScan and CustomScan.
> We'd like to add a List *fdw_varmap / *custom_varmap variable to both plan
> nodes. It contains a list of the original Var nodes, each mapped to the
> position given by its list index (e.g., the first varnode becomes
> varno=INDEX_VAR and varattno=1).
>
> (3) Reverse mapping on EXPLAIN
> For EXPLAIN support, the varnodes above on the pseudo relation scan need to
> be resolved. All we need to do is initialize dpns->inner_tlist in
> set_deparse_planstate() according to the above mapping.
>
> (4) Case of scanrelid == 0
> To skip opening/closing (foreign) tables, we need a mark that tells the
> backend not to initialize the scan node according to the table definition,
> but according to the pseudo varnode list.
> As the earlier custom-scan patch did, scanrelid == 0 is a straightforward
> mark showing that the scan node is not bound to a particular real relation.
> So it also needs special-case handling around the foreign-/custom-scan code.
>
> We expect the above changes are small enough to implement basic join
> push-down functionality (one that does not involve external computation of
> complicated expression nodes), yet valuable to support in v9.5.
>
> Please comment on the proposition above.

I don't really have any technical comments on this design right at the
moment, but I think it's an important area where PostgreSQL needs to
make some progress sooner rather than later, so I hope that we can get
something committed in time for 9.5.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
> -----Original Message-----
> From: Simon Riggs [mailto:simon@2ndQuadrant.com]
> Sent: Thursday, November 27, 2014 8:48 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Robert Haas; Thom Brown; Kohei KaiGai; Tom Lane; Alvaro Herrera; Shigeru
> Hanada; Stephen Frost; Andres Freund; PgHacker; Jim Mlodgenski; Peter
> Eisentraut
> Subject: Re: [HACKERS] [v9.5] Custom Plan API
> 
> On 27 November 2014 at 10:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> 
> > The reason why the documentation portion was not yet committed is, sorry,
> > due to the quality of the documentation from the standpoint of a native
> > English speaker.
> > Now I'm writing up the documentation according to the latest code base;
> > please wait several days and help to improve it.
> 
> Happy to help with that.
> 
> Please post to the Wiki first so we can edit it communally.
> 
Simon, 

I tried to describe how the custom-scan provider interacts with the core
backend, and the expectations for individual callbacks, here:
 https://wiki.postgresql.org/wiki/CustomScanInterface

I'd like to see what kind of description should be added, from a third
person's viewpoint.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 27 November 2014 at 20:48, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 27 November 2014 at 10:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
>> The reason why the documentation portion was not yet committed is, sorry,
>> due to the quality of the documentation from the standpoint of a native
>> English speaker.
>> Now I'm writing up the documentation according to the latest code base;
>> please wait several days and help to improve it.
>
> Happy to help with that.
>
> Please post to the Wiki first so we can edit it communally.

I've corrected a spelling mistake, but it reads OK at the moment.


>>> The example contrib module was not committed, and I am advised it no
>>> longer works.
>>>
>> May I submit the contrib/ctidscan module again for an example?
>
> Yes please. We have other contrib modules that exist as tests, so this
> seems reasonable to me.

I can't improve the docs without the example code. Is that available now?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
Simon,

> > Yes please. We have other contrib modules that exist as tests, so this
> > seems reasonable to me.
> 
> I can't improve the docs without the example code. Is that available now?
>
Please wait a few days. The ctidscan module has not been adjusted for the
latest interface yet.

--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: Simon Riggs [mailto:simon@2ndQuadrant.com]
> Sent: Sunday, December 07, 2014 12:37 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Robert Haas; Thom Brown; Kohei KaiGai; Tom Lane; Alvaro Herrera; Shigeru
> Hanada; Stephen Frost; Andres Freund; PgHacker; Jim Mlodgenski; Peter
> Eisentraut
> Subject: Re: [HACKERS] [v9.5] Custom Plan API
> 
> On 27 November 2014 at 20:48, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On 27 November 2014 at 10:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >
> >> The reason why documentation portion was not yet committed is, sorry,
> >> it is due to quality of documentation from the standpoint of native
> >> English speaker.
> >> Now, I'm writing up a documentation stuff according to the latest
> >> code base, please wait for several days and help to improve.
> >
> > Happy to help with that.
> >
> > Please post to the Wiki first so we can edit it communally.
> 
> I've corrected a spelling mistake, but it reads OK at moment.
> 
> 
> >>> The example contrib module was not committed and I am advised no
> >>> longer works.
> >>>
> >> May I submit the contrib/ctidscan module again for an example?
> >
> > Yes please. We have other contrib modules that exist as tests, so this
> > seems reasonable to me.
> 
> I can't improve the docs without the example code. Is that available now?
> 
> --
>  Simon Riggs                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

Re: [v9.5] Custom Plan API

From
Jim Nasby
Date:
On 12/6/14, 5:21 PM, Kouhei Kaigai wrote:
>>> Yes please. We have other contrib modules that exist as tests, so this
>>> seems reasonable to me.
>>
>> I can't improve the docs without the example code. Is that available now?
>>
> Please wait for a few days. The ctidscan module is not adjusted for the
> latest interface yet.

I've made some minor edits, with an emphasis on not changing the original
intent. Each section was saved as a separate edit, so if anyone objects to
something, just revert the relevant change. Once the code is available, more
editing can be done.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [v9.5] Custom Plan API

From
Simon Riggs
Date:
On 7 December 2014 at 08:21, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> Please wait for a few days. The ctidscan module is not adjusted for the
> latest interface yet.

I am in many ways a patient man. At this point it is 12 days since my
request for a working example.

Feedback I am receiving is that the API is unusable. That could be
because it is impenetrable, or because it is unusable. I'm not sure it
matters which.

We need a working example to ensure that the API meets the needs of a
wide section of users and if it does not, to give other users a chance
to request changes to the API so that it becomes usable. The window
for such feedback is approaching zero very quickly now and we need
action.

Thanks

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [v9.5] Custom Plan API

From
Kouhei Kaigai
Date:
Simon,

The sample code is here: https://github.com/kaigai/ctidscan

The code itself and its regression tests show how it works and how it
interacts with the core backend.

However, its source-code comments are not updated and the SGML documentation
is not ready yet, because of my schedule in the earlier half of December.
I'll try to add the above material as a contrib-module patch, but it will
take a few more days.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: Simon Riggs [mailto:simon@2ndQuadrant.com]
> Sent: Tuesday, December 09, 2014 12:24 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Robert Haas; Thom Brown; Kohei KaiGai; Tom Lane; Alvaro Herrera; Shigeru
> Hanada; Stephen Frost; Andres Freund; PgHacker; Jim Mlodgenski; Peter
> Eisentraut
> Subject: Re: [HACKERS] [v9.5] Custom Plan API
> 
> On 7 December 2014 at 08:21, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> 
> > Please wait for a few days. The ctidscan module is not adjusted for
> > the latest interface yet.
> 
> I am in many ways a patient man. At this point it is 12 days since my request
> for a working example.
> 
> Feedback I am receiving is that the API is unusable. That could be because
> it is impenetrable, or because it is unusable. I'm not sure it matters which.
> 
> We need a working example to ensure that the API meets the needs of a wide
> section of users and if it does not, to give other users a chance to request
> changes to the API so that it becomes usable. The window for such feedback
> is approaching zero very quickly now and we need action.
> 
> Thanks
> 
> --
>  Simon Riggs                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

Re: [v9.5] Custom Plan API

From
Robert Haas
Date:
On Tue, Dec 9, 2014 at 3:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Feedback I am receiving is that the API is unusable. That could be
> because it is impenetrable, or because it is unusable. I'm not sure it
> matters which.

It would be nice to hear what someone is trying to use it for and what
problems that person is encountering.  Without that, it's pretty much
impossible for anyone to fix anything.

As for sample code, KaiGai had a working example, which of course got
broken when Tom changed the API, but it didn't look to me like Tom's
changes would have made anything impossible that was possible before.
I'm frankly kind of astonished by the tenor of this entire
conversation; there is certainly plenty of code in the backend that is
less self-documenting than this is; and KaiGai did already put up a
wiki page with documentation as you requested.  From his response, it
sounds like he has updated the ctidscan code, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company