Thread: [v9.5] Custom Plan API
Prior to the development cycle towards v9.5, I'd like to reopen the discussion of the custom-plan interface. Even though we had lots of discussion during the last three commit-fests, several issues are still open. So, I'd like to clarify the direction of the implementation prior to the first commit-fest.

(1) DDL support and system catalog

Simon suggested that a DDL command should be supported to track the custom-plan providers being installed, and to avoid nonsense hook calls unless it is an obvious case in which a custom-plan provider can help. It also makes sense to give installed extensions a chance to be loaded. (In the previous design, I assumed modules are loaded by the LOAD command or the *_preload_libraries parameters.)

I tried to implement the following syntax:

  CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;

It records a particular function as the entrypoint of a custom-plan provider; that function is then called when the planner tries to find the best path to scan or join relations. This function takes an argument (of INTERNAL type) that packs the information needed to construct and register an alternative scan/join path, like PlannerInfo, RelOptInfo and so on.

(*) The data structure below is supplied in the case of a scan path.

  typedef struct {
      uint32          custom_class;
      PlannerInfo    *root;
      RelOptInfo     *baserel;
      RangeTblEntry  *rte;
  } customScanArg;

This function, usually implemented in C, can construct a custom object derived from the CustomPath type, which contains a set of function pointers, including functions that populate further objects derived from CustomPlan or CustomPlanState, as I did in the patch towards v9.4 development.

Properties of individual custom-plan providers are recorded in the pg_custom_plan system catalog. Right now its definition is quite simple: only a superuser can create / drop custom-plan providers, and its definition does not belong to a particular namespace.
Because of this assumption (only superusers can touch it), I don't apply the database ACL mechanism here. What other characteristics should it have?

(2) Static functions to be exported

Tom was concerned that the custom-plan API needs several key functions to be callable by extensions, although these are declared as static functions; thus, they would look like part of the interface. Once people think these are stable functions that can be used across version upgrades, it may become a barrier to future improvement of the core code. Is that the right understanding?

One solution is to write a clear notice, like: "these external functions are not stable interfaces, so extensions should not assume they will be available in future versions". Nevertheless, more stable functions are kinder to authors of extensions. So, I tried a few approaches.

First of all, we categorized the functions into three categories:
 (A) Functions that walk a plan/expression tree recursively.
 (B) Functions that modify the internal state of the core backend.
 (C) Functions that are commonly used but live in a particular source file.

Although the number of functions is not so large, (A) and (B) must have entrypoints callable from extensions. If they are unavailable, an extension needs to maintain a copied version of the code with its own small enhancements, and that burden is similar to just forking the tree.
Examples of (A): create_plan_recurse, set_plan_refs, ...
Examples of (B): fix_expr_common, ...
On the other hand, (C) functions are helpful if available, but they are not a mandatory requirement for implementation.

Our first trial, following Tom's suggestion, was to investigate a common walker function on the plan tree, as we now have for expression trees. We expected that we could give function pointers for the key routines to extensions, instead of exporting the static functions.
However, it didn't work well, because the existing recursive calls do various kinds of work for each plan-node type, so they don't fit the structure of a walker function, which applies a uniform operation to each node. Note that I assumed the following walker function, which applies plan_walker or expr_walker to the underlying plan/expression trees:

  bool plan_tree_walker(Plan *plan,
                        bool (*plan_walker) (),
                        bool (*expr_walker) (),
                        void *context)

Please tell me if this is different from your idea; I'll reconsider it.

Next, I tried another approach that passes function pointers for the (A) and (B) functions as part of the custom-plan interface. It is workable at least; however, its interface definition seems to have no advantage over the original approach. For example, below is the definition of the callback in setrefs.c:

+ void   (*SetCustomPlanRef)(PlannerInfo *root,
+                            CustomPlan *custom_plan,
+                            int rtoffset,
+                            Plan *(*fn_set_plan_refs)(PlannerInfo *root,
+                                                      Plan *plan,
+                                                      int rtoffset),
+                            void (*fn_fix_expr_common)(PlannerInfo *root,
+                                                       Node *node));

Since an extension needs at least set_plan_refs() and fix_expr_common(), I added function pointers for them. But this definition has to be updated whenever those functions change; it does not seem to be a proper way to smooth the impact of future internal changes.

So, I'd like to find a good common ground to solve this matter. One idea is the first, simple solution: the core PostgreSQL code is developed independently of out-of-tree modules, so we don't care about the stability of the declarations of internal functions, even if they are exported to multiple source files. (I believe that is our usual manner.)
Another idea is a refactoring of the core backend to consolidate routines per plan node, rather than per processing stage. For example, createplan.c contains most of the code commonly needed to create a plan, in addition to code for individual plan nodes.
If functions like create_seqscan_plan() were relocated into separate source files, the routines to be exported would become clear. One expected disadvantage is that this refactoring makes back-patching more complicated.

Do you have any other ideas to implement it well?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Kohei KaiGai [mailto:kaigai@kaigai.gr.jp]
> Sent: Tuesday, April 29, 2014 10:07 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Tom Lane; Andres Freund; Robert Haas; Simon Riggs; PgHacker; Stephen
> Frost; Shigeru Hanada; Jim Mlodgenski; Peter Eisentraut
> Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
> >> Yeah. I'm still not exactly convinced that custom-scan will ever
> >> allow independent development of new plan types (which, with all due
> >> respect to Robert, is what it was being sold as last year in Ottawa).
> >> But I'm not opposed in principle to committing it, if we can find a
> >> way to have a cleaner API for things like setrefs.c. It seems like
> >> late-stage planner processing in general is an issue for this patch
> >> (createplan.c and subselect.c are also looking messy). EXPLAIN isn't
> >> too great either.
> >>
> >> I'm not sure exactly what to do about those cases, but I wonder
> >> whether things would get better if we had the equivalent of
> >> expression_tree_walker/mutator capability for plan nodes. The state
> >> of affairs in setrefs and subselect, at least, is a bit reminiscent
> >> of the bad old days when we had lots of different bespoke code for
> >> traversing expression trees.
> >>
> > Hmm. If we have something like expression_tree_walker/mutator for plan
> > nodes, we can pass a walker/mutator function's pointer instead of
> > exposing static functions that take on the recursive jobs.
> > If a custom-plan provider (that has sub-plans) got a callback with a
> > walker/mutator pointer, all it has to do for its sub-plans is call
> > this new plan-tree walking support routine with the supplied
> > walker/mutator. It seems to me a simpler design than what I did.
> >
> I tried to code similar walker/mutator functions on the plan-node tree;
> however, these routines could not be implemented simply enough, because
> the jobs of the walker/mutator functions are not uniform, so the caller
> side must also have large switch-case branches.
>
> I picked up setrefs.c for my investigation.
> set_plan_refs() applies fix_scan_list() to the expression trees
> appearing in a plan node if it is derived from Scan; however, it also
> applies set_join_references() for subclasses of Join, or
> set_dummy_tlist_references() for some other plan nodes.
> This implies that the walker/mutator functions for Plan nodes have to
> apply different operations according to the type of Plan node. I'm not
> certain how many different forms are needed.
> (In addition, set_plan_refs() usually performs like a walker, but often
> performs as a mutator if trivial subquery....)
>
> I'm expecting a function like the one below. It allows a plan_walker
> function to be called for each plan node, and also allows an expr_walker
> function to be called for each expression node on the plan node.
>
> bool
> plan_tree_walker(Plan *plan,
>                  bool (*plan_walker) (),
>                  bool (*expr_walker) (),
>                  void *context)
>
> I'd like to see if some other form should implement this routine.
>
> One alternative idea to give a custom-plan provider a chance to handle
> its subplans is to give it function pointers (1) to handle recursion of
> the plan tree and (2) to set up the backend's internal state.
> In the case of setrefs.c, set_plan_refs() and fix_expr_common() are the
> minimum necessity for extensions. It also removes the need to export
> static functions.
>
> How about your thoughts?
> --
> KaiGai Kohei <kaigai@kaigai.gr.jp>
On 7 May 2014 02:05, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Prior to the development cycle towards v9.5, I'd like to reopen
> the discussion of custom-plan interface. Even though we had lots
> of discussion during the last three commit-fests, several issues
> are still under discussion. So, I'd like to clarify direction of
> the implementation, prior to the first commit-fest.
>
> (1) DDL support and system catalog
>
> Simon suggested that DDL command should be supported to track custom-
> plan providers being installed, and to avoid nonsense hook calls
> if it is an obvious case that custom-plan provider can help. It also
> makes sense to give a chance to load extensions once installed.
> (In the previous design, I assumed modules are loaded by LOAD command
> or *_preload_libraries parameters).
>
> I tried to implement the following syntax:
>
> CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;

Thank you for exploring that thought and leading the way on this research. I've been thinking about this also.

What I think we need is a declarative form that expresses the linkage between base table(s) and related data structures that can be used to optimize a query, while still providing accurate results.

In other DBMSs, we have concepts such as a JoinIndex or a MatView which allow some kind of lookaside behaviour. Just for clarity, a concrete example is Oracle's Materialized Views, which can be set using ENABLE QUERY REWRITE so that the MatView can be used as an alternative path for a query. We already have this concept in PostgreSQL, where an index can be used to perform an IndexOnlyScan rather than accessing the heap itself.

We have considerable evidence that the idea of alternate data structures results in performance gains.
* KaiGai's work - https://wiki.postgresql.org/wiki/PGStrom
* http://www.postgresql.org/message-id/52C59858.9090500@garret.ru
* http://citusdata.github.io/cstore_fdw/
* University of Manchester - exploring GPUs as part of the AXLE project
* Barcelona SuperComputer Centre - exploring FPGAs, as part of the AXLE project
* Some other authors have also cited gains using GPU technology in databases

So I would like to have a mechanism that provides a *generic* Lookaside for a table or foreign table.

Tom and Kevin have previously expressed that MatViews would represent a planning problem, in the general case. One way to solve that planning issue is to link structures directly together, in the same way that an index and a table are linked. We can then process the lookaside in the same way we handle a partial index - check prerequisites and, if usable, calculate a cost for the alternate path. We need not add planning time other than to the tables that might benefit from it.

Roughly, I'm thinking of this...

  CREATE LOOKASIDE ON foo
    TO foo_mat_view;

and also this...

  CREATE LOOKASIDE ON foo
    TO foo_as_a_foreign_table;  /* e.g. PGStrom */

This would allow the planner to consider alternate plans for foo_mv during set_plain_rel_pathlist(), similarly to the way it considers index paths, in one of the common cases where the mat view covers just one table.

This concept is similar to ENABLE QUERY REWRITE in Oracle, but this thought goes much further, to include any generic user-defined data structure or foreign table.

Do we need this? For MVs, we *might* be able to deduce that the MV is rewritable for "foo", but that is not deducible for Foreign Tables, by current definition, so I prefer the explicit definition of objects that are linked - since doing this for indexes is already familiar to people.

Having an explicit linkage between data structures allows us to enhance an existing application by transparently adding new structures, just as we already do with indexes.
Specifically, we allow more than one lookaside structure on any one table.

Forget the exact name, that's not important. But I think the requirements here are...

* Explicit definition that we are attaching an alternate path onto a table (conceptually similar to adding an index)

* Ability to check that the alternate path is viable (similar to the way we validate use of partial indexes prior to usage): checks on columns (SELECT), rows (WHERE), aggregations (GROUP)

* Ability to consider access cost for both the normal table and the alternate path (like an index) - this allows the alternate path to *not* be chosen when we are performing some operation for which it is sub-optimal (for whatever reason)

* There may be some need to define operator classes that are implemented via the alternate path

which works for single tables, but a later requirement would then be

* allowing the join of one or more tables to be replaced with a single lookaside

Hopefully, we won't need a "Custom Plan" at all, just the ability to lookaside when useful.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> On 7 May 2014 02:05, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Prior to the development cycle towards v9.5, I'd like to reopen the
> > discussion of custom-plan interface. Even though we had lots of
> > discussion during the last three commit-fests, several issues are
> > still under discussion. So, I'd like to clarify direction of the
> > implementation, prior to the first commit-fest.
> >
> > (1) DDL support and system catalog
> >
> > Simon suggested that DDL command should be supported to track custom-
> > plan providers being installed, and to avoid nonsense hook calls if it
> > is an obvious case that custom-plan provider can help. It also makes
> > sense to give a chance to load extensions once installed.
> > (In the previous design, I assumed modules are loaded by LOAD command
> > or *_preload_libraries parameters).
> >
> > I tried to implement the following syntax:
> >
> > CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>;
>
> Thank you for exploring that thought and leading the way on this research.
> I've been thinking about this also.
>
> What I think we need is a declarative form that expresses the linkage between
> base table(s) and a related data structures that can be used to optimize
> a query, while still providing accurate results.
>
> In other DBMS, we have concepts such as a JoinIndex or a MatView which allow
> some kind of lookaside behaviour. Just for clarity, a concrete example is
> Oracle's Materialized Views which can be set using ENABLE QUERY REWRITE
> so that the MatView can be used as an alternative path for a query. We do
> already have this concept in PostgreSQL, where an index can be used to
> perform an IndexOnlyScan rather than accessing the heap itself.
>
> We have considerable evidence that the idea of alternate data structures
> results in performance gains.
> * KaiGai's work - https://wiki.postgresql.org/wiki/PGStrom
> * http://www.postgresql.org/message-id/52C59858.9090500@garret.ru
> * http://citusdata.github.io/cstore_fdw/
> * University of Manchester - exploring GPUs as part of the AXLE project
> * Barcelona SuperComputer Centre - exploring FPGAs, as part of the AXLE
> project
> * Some other authors have also cited gains using GPU technology in databases
>
> So I would like to have a mechanism that provides a *generic* Lookaside
> for a table or foreign table.
>
> Tom and Kevin have previously expressed that MatViews would represent a
> planning problem, in the general case. One way to solve that planning issue
> is to link structures directly together, in the same way that an index and
> a table are linked. We can then process the lookaside in the same way we
> handle a partial index - check prerequisites and if usable, calculate a
> cost for the alternate path.
> We need not add planning time other than to the tables that might benefit
> from that.
>
> Roughly, I'm thinking of this...
>
> CREATE LOOKASIDE ON foo
> TO foo_mat_view;
>
> and also this...
>
> CREATE LOOKASIDE ON foo
> TO foo_as_a_foreign_table /* e.g. PGStrom */
>
> This would allow the planner to consider alternate plans for foo_mv during
> set_plain_rel_pathlist() similarly to the way it considers index paths,
> in one of the common cases that the mat view covers just one table.
>
> This concept is similar to ENABLE QUERY REWRITE in Oracle, but this thought
> goes much further, to include any generic user-defined data structure or
> foreign table.
>
Let me clarify: this mechanism allows adding alternative scan/join paths including built-in ones, not only custom enhanced plan/exec nodes, doesn't it? Probably, it is a variation of the above proposition if we install a handler function that proposes built-in path nodes in response to a request for a scan/join.

> Do we need this?
> For MVs, we *might* be able to deduce that the MV is
> rewritable for "foo", but that is not deducible for Foreign Tables, by
> current definition, so I prefer the explicit definition of objects that
> are linked - since doing this for indexes is already familiar to people.
>
> Having an explicit linkage between data structures allows us to enhance
> an existing application by transaparently adding new structures, just as
> we already do with indexes. Specifically, that we allow more than one
> lookaside structure on any one table.
>
Not only an alternative data structure, but also an alternative method to scan/join the same data structure is important, isn't it?

> Forget the exact name, thats not important. But I think the requirements
> here are...
>
> * Explicit definition that we are attaching an alternate path onto a table
> (conceptually similar to adding an index)
>
I think the syntax should allow "tables", not only a particular table. It would inform the core planner that this lookaside/custom-plan (the name is not important; anyway, this feature...) can provide an alternative path for the set of relations being considered. That allows us to reduce the number of function calls at the planning stage.

> * Ability to check that the alternate path is viable (similar to the way
> we validate use of partial indexes prior to usage)
> Checks on columns(SELECT), rows(WHERE), aggregations(GROUP)
>
I don't deny it... but do you expect this feature in the initial version?

> * Ability to consider access cost for both normal table and alternate path
> (like an index) - this allows the alternate path to *not* be chosen when
> we are performing some operation that is sub-optimal (for whatever reason).
>
That is the usual job of the existing planner, isn't it?
> * There may be some need to define operator classes that are implemented
> via the alternate path
>
> which works for single tables, but a later requirement would then be
>
> * allows the join of one or more tables to be replaced with a single lookaside
>
It's a higher priority for me, and I guess it is the same for MatView usage.

> Hopefully, we won't need a "Custom Plan" at all, just the ability to
> lookaside when useful.
>
Probably, lookaside is a special case of the scenarios that a custom plan can provide. I also think it is an attractive use case if we can redirect a particular complicated join into a MatView reference. So, it makes sense to bundle a handler function that replaces a join with a MatView reference.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 7 May 2014 08:17, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Let me clarify. This mechanism allows to add alternative scan/join paths
> including built-in ones, not only custom enhanced plan/exec node, isn't it?
> Probably, it is a variation of above proposition if we install a handler
> function that proposes built-in path nodes towards the request for scan/join.

Yes, I am looking for a way to give you the full extent of your requirements, within the Postgres framework. I have time and funding to assist you in achieving this in a general way that all may make use of.

> Not only alternative data structure, alternative method to scan/join towards
> same data structure is also important, isn't it?

Agreed. My proposal is that if the planner allows the lookaside to an FDW then we pass the query for full execution on the FDW. That means that the scan, aggregate and join could take place via the FDW. i.e.
"Custom Plan" == lookaside + FDW

Or, put another way, if we add Lookaside then we can just plug in the PGStrom FDW directly and we're done. And everybody else's FDW will work as well, so Citus etc. will not need to recode.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> -----Original Message-----
> From: Simon Riggs [mailto:simon@2ndQuadrant.com]
> Sent: Wednesday, May 07, 2014 5:02 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Tom Lane; Robert Haas; Andres Freund; PgHacker; Stephen Frost; Shigeru
> Hanada; Jim Mlodgenski; Peter Eisentraut; Kohei KaiGai
> Subject: Re: [v9.5] Custom Plan API
>
> On 7 May 2014 08:17, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
> > Let me clarify. This mechanism allows to add alternative scan/join
> > paths including built-in ones, not only custom enhanced plan/exec node,
> > isn't it?
> > Probably, it is a variation of above proposition if we install a
> > handler function that proposes built-in path nodes towards the request
> > for scan/join.
>
> Yes, I am looking for a way to give you the full extent of your requirements,
> within the Postgres framework. I have time and funding to assist you in
> achieving this in a general way that all may make use of.
>
> > Not only alternative data structure, alternative method to scan/join
> > towards same data structure is also important, isn't it?
>
> Agreed. My proposal is that if the planner allows the lookaside to an FDW
> then we pass the query for full execution on the FDW. That means that the
> scan, aggregate and join could take place via the FDW. i.e.
> "Custom Plan" == lookaside + FDW
>
> Or put another way, if we add Lookaside then we can just plug in the pgstrom
> FDW directly and we're done. And everybody else's FDW will work as well,
> so Citus etc. will not need to recode.
>
Hmm. That sounds to me like you intend to make the FDW a central facility hosting pluggable plan/exec stuff. Even though we have several things to clarify, I also think it's a direction worth investigating.

Let me list the things to be clarified / developed, in no particular order:

* Join replacement by FDW; We still don't have consensus about join replacement by FDW. Probably, it will be designed primarily for the remote-join implementation; however, the things to do are similar.
We may need to revisit Hanada-san's proposition from the past.

* Lookaside for ANY relations; I want the planner to try GPU-scan for any relation once installed, to reduce the user's administration cost. It needs lookaside to allow specifying a particular foreign server, not a foreign table, and then create a ForeignScan node that is not associated with a particular foreign table.

* ForeignScan node that is not associated with a particular foreign table. Once we try to apply a ForeignScan node instead of Sort or Aggregate, the existing FDW implementation needs to be improved. These nodes scan a materialized relation (generated on the fly); however, the existing FDW code assumes a ForeignScan node is always associated with a particular foreign table. We need to eliminate this restriction.

* FDW method for MultiExec. In case we can stack multiple ForeignScan nodes, it's helpful to support exchanging scanned tuples in their own data format. Let's assume two ForeignScan nodes are stacked. One performs like Sort, another performs like Scan. If they internally handle a column-oriented data format, TupleTableSlot is not the best way for data exchange.

* Lookaside on INSERT/UPDATE/DELETE. Probably, it can be implemented using the writable FDW feature. Not a big issue, but don't forget it...

How about your opinion?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 7 May 2014 10:06, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Let me list up the things to be clarified / developed randomly.
>
> * Join replacement by FDW; We still don't have consensus about join
> replacement by FDW. Probably, it will be designed to remote-join
> implementation primarily, however, things to do is similar. We may need
> to revisit the Hanada-san's proposition in the past.

Agreed. We need to push down joins into FDWs and we need to push down aggregates also, so they can be passed to FDWs. I'm planning to look at aggregate push-down.

> * Lookaside for ANY relations; I want planner to try GPU-scan for any
> relations once installed, to reduce user's administration cost.
> It needs lookaside allow to specify a particular foreign-server, not
> foreign-table, then create ForeignScan node that is not associated with
> a particular foreign-table.

IMHO we would not want to add indexes to every column, on every table, nor would we wish to use lookaside for all tables. It is a good thing to be able to add optimizations for individual tables. GPUs are not good for everything; it is good to be able to leverage their strengths, yet avoid their weaknesses.

If you do want that, you can write an Event Trigger that automatically adds a lookaside for any table.

> * ForeignScan node that is not associated with a particular foreign-table.
> Once we try to apply ForeignScan node instead of Sort or Aggregate,
> existing FDW implementation needs to be improved. These nodes scan on a
> materialized relation (generated on the fly), however, existing FDW code
> assumes ForeignScan node is always associated with a particular
> foreign-table. We need to eliminate this restriction.

I don't think we need to do that, given the above.

> * FDW method for MultiExec. In case when we can stack multiple ForeignScan
> nodes, it's helpful to support to exchange scanned tuples in their own
> data format. Let's assume two ForeignScan nodes are stacked.
> One performs
> like Sort, another performs like Scan. If they internally handle column-
> oriented data format, TupleTableSlot is not a best way for data exchange.

I agree TupleTableSlot may not be the best way for bulk data movement. We probably need to look at buffering/bulk movement between executor nodes in general, which would be of benefit for the FDW case also. This would be a problem even for Custom Scans as originally presented, so I don't see much change there.

> * Lookaside on the INSERT/UPDATE/DELETE. Probably, it can be implemented
> using writable FDW feature. Not a big issue, but don't forget it...

Yes, possible.

I hope these ideas make sense. This is early days and there may be other ideas and much detail yet to come.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> Agreed. My proposal is that if the planner allows the lookaside to an
> FDW then we pass the query for full execution on the FDW. That means
> that the scan, aggregate and join could take place via the FDW. i.e.
> "Custom Plan" == lookaside + FDW

How about we get that working for FDWs to begin with and then we can come back to this idea..? We're pretty far from join-pushdown or aggregate-pushdown to FDWs, last I checked, and having those would be a massive win for everyone using FDWs.

Thanks,

Stephen
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> IMHO we would not want to add indexes to every column, on every table,
> nor would we wish to use lookaside for all tables. It is a good thing
> to be able to add optimizations for individual tables. GPUs are not
> good for everything; it is good to be able to leverage their
> strengths, yet avoid their weaknesses.

It's the optimizer's job to figure out which path to pick though, based on which will have the lowest cost.

> If do you want that, you can write an Event Trigger that automatically
> adds a lookaside for any table.

This sounds terribly ugly and like we're pushing optimization decisions on to the user instead of just figuring out what the best answer is.

> I agree TupleTableSlot may not be best way for bulk data movement. We
> probably need to look at buffering/bulk movement between executor
> nodes in general, which would be of benefit for the FDW case also.
> This would be a problem even for Custom Scans as originally presented
> also, so I don't see much change there.

Being able to do bulk movement would be useful, but (as I proposed months ago) being able to do asynchronous returns would be extremely useful also, when you consider FDWs and Append() - the main point there being that you want to keep the FDWs busy and working in parallel.

Thanks,

Stephen
On 7 May 2014 17:43, Stephen Frost <sfrost@snowman.net> wrote:
> * Simon Riggs (simon@2ndQuadrant.com) wrote:
>> IMHO we would not want to add indexes to every column, on every table,
>> nor would we wish to use lookaside for all tables. It is a good thing
>> to be able to add optimizations for individual tables. GPUs are not
>> good for everything; it is good to be able to leverage their
>> strengths, yet avoid their weaknesses.
>
> It's the optimizer's job to figure out which path to pick though, based
> on which will have the lowest cost.

Of course. I'm not suggesting otherwise.

>> If do you want that, you can write an Event Trigger that automatically
>> adds a lookaside for any table.
>
> This sounds terribly ugly and like we're pushing optimization decisions
> on to the user instead of just figuring out what the best answer is.

I'm proposing that we use a declarative approach, just like we do when we say CREATE INDEX. The idea is that we only consider a lookaside when a lookaside has been declared. Same as when we add an index, the optimizer considers whether to use that index. What we don't want to happen is that the optimizer considers a GIN plan, even when a GIN index is not available.

I'll explain it more at the developer meeting. It probably sounds a bit weird at first.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 7 May 2014 17:43, Stephen Frost <sfrost@snowman.net> wrote:
> > It's the optimizer's job to figure out which path to pick though, based
> > on which will have the lowest cost.
>
> Of course. I'm not suggesting otherwise.
>
> >> If do you want that, you can write an Event Trigger that automatically
> >> adds a lookaside for any table.
> >
> > This sounds terribly ugly and like we're pushing optimization decisions
> > on to the user instead of just figuring out what the best answer is.
>
> I'm proposing that we use a declarative approach, just like we do when
> we say CREATE INDEX.

There are quite a few trade-offs when it comes to indexes though. I'm trying to figure out when you wouldn't want to use a GPU, if it's available to you and the cost model says it's faster? To me, that's kind of like saying you want a declarative approach for when to use a HashJoin.

> The idea is that we only consider a lookaside when a lookaside has
> been declared. Same as when we add an index, the optimizer considers
> whether to use that index. What we don't want to happen is that the
> optimizer considers a GIN plan, even when a GIN index is not
> available.

Yes, I understood your proposal - I just don't agree with it. ;) For MatViews and/or Indexes, there are trade-offs to be had as they relate to disk space, insert speed, etc.

Thanks,

Stephen
> > Let me list up the things to be clarified / developed randomly.
> >
> > * Join replacement by FDW; We still don't have consensus about join
> >   replacement by FDW. Probably, it will be designed for remote-join
> >   implementation primarily; however, the things to do are similar. We may
> >   need to revisit Hanada-san's proposition from the past.
>
> Agreed. We need to push down joins into FDWs and we need to push down
> aggregates also, so they can be passed to FDWs. I'm planning to look at
> aggregate push down.
>
Probably, it's a helpful feature.

> > * Lookaside for ANY relations; I want the planner to try GPU-scan for any
> >   relation once installed, to reduce the user's administration cost.
> >   It needs lookaside to allow specifying a particular foreign-server, not a
> >   foreign-table, then create a ForeignScan node that is not associated with
> >   a particular foreign-table.
>
> IMHO we would not want to add indexes to every column, on every table, nor
> would we wish to use lookaside for all tables. It is a good thing to be
> able to add optimizations for individual tables. GPUs are not good for
> everything; it is good to be able to leverage their strengths, yet avoid
> their weaknesses.
>
> If you do want that, you can write an Event Trigger that automatically adds
> a lookaside for any table.
>
It may be a solution if we try to replace a scan on a relation by a ForeignScan; in other words, a case where we can describe a 1:1 relationship between a table and a foreign-table that is alternatively scanned.

Is it possible to fit a case where a ForeignScan replaces a built-in Join plan? I don't think it is realistic to set up lookaside configuration for all possible combinations of joins in advance.

I have an idea: if lookaside accepts a function, foreign-server or some other subjective entity as an alternative path, it will be able to create paths on the fly, not only preconfigured foreign-tables.
This idea will take two forms of DDL commands:

  CREATE LOOKASIDE <name> ON <target relation>
      TO <alternative table/matview/foreign table/...>;

  CREATE LOOKASIDE <name> ON <target relation>
      EXECUTE <path generator function>;

The things to do internally are the same. The TO-form kicks a built-in routine, instead of a user-defined function, to add alternative scan/join paths according to the supplied table/matview/foreign table and so on.

> > * ForeignScan node that is not associated with a particular foreign-table.
> >   Once we try to apply a ForeignScan node instead of Sort or Aggregate, the
> >   existing FDW implementation needs to be improved. These nodes scan a
> >   materialized relation (generated on the fly); however, existing FDW code
> >   assumes a ForeignScan node is always associated with a particular
> >   foreign-table. We need to eliminate this restriction.
>
> I don't think we need to do that, given the above.
>
It becomes a problem if a ForeignScan is chosen as the alternative path of a Join.

The target-list of a Join node is determined according to the query form on the fly, so we cannot expect a particular TupleDesc in advance. Once we try to apply a ForeignScan instead of a Join node, it has to have a TupleDesc that depends on the set of joined relations.

I think it is a more straightforward approach to allow a ForeignScan that is not associated with a particular (cataloged) relation.

> > * FDW method for MultiExec. In case we can stack multiple ForeignScan
> >   nodes, it's helpful to support exchanging scanned tuples in their own
> >   data format. Let's assume two ForeignScan nodes are stacked. One
> >   performs like Sort, another performs like Scan. If they internally
> >   handle a column-oriented data format, TupleTableSlot is not the best
> >   way for data exchange.
>
> I agree TupleTableSlot may not be the best way for bulk data movement. We
> probably need to look at buffering/bulk movement between executor nodes
> in general, which would be of benefit for the FDW case also.
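The MultiExec idea above is that two stacked nodes agree on a bulk, column-oriented exchange format instead of passing one TupleTableSlot at a time. A minimal C sketch of that idea follows; the ColumnChunk struct and the producer/consumer functions are invented for illustration and are not any proposed PostgreSQL API:

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_ROWS 4

/* Hypothetical column-oriented exchange format: one chunk carries up to
 * CHUNK_ROWS values per column, instead of one row per call. */
typedef struct ColumnChunk {
    int    nrows;
    int    col_a[CHUNK_ROWS];   /* column "a", all rows of the chunk */
    double col_b[CHUNK_ROWS];   /* column "b", all rows of the chunk */
} ColumnChunk;

/* Lower node: fill the next chunk (returns rows produced, 0 at end). */
static int produce_chunk(ColumnChunk *chunk, int *cursor, int total)
{
    int n = 0;
    while (n < CHUNK_ROWS && *cursor < total) {
        chunk->col_a[n] = *cursor;
        chunk->col_b[n] = *cursor * 0.5;
        (*cursor)++;
        n++;
    }
    chunk->nrows = n;
    return n;
}

/* Upper node: consume a whole chunk per call (here: sum column "a"),
 * rather than fetching one row at a time. */
static long consume_all(int total)
{
    ColumnChunk chunk;
    int cursor = 0;
    long sum = 0;
    while (produce_chunk(&chunk, &cursor, total) > 0)
        for (int i = 0; i < chunk.nrows; i++)
            sum += chunk.col_a[i];
    return sum;
}
```

The point of the sketch is only the calling convention: the exchange cost is paid once per chunk, not once per row, which is what makes a columnar format between stacked nodes attractive.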
> This would be a problem even for Custom Scans as originally presented also,
> so I don't see much change there.
>
Yes. It is the reason why my Custom Scan proposition supports the MultiExec method.

> > * Lookaside on the INSERT/UPDATE/DELETE. Probably, it can be implemented
> >   using the writable FDW feature. Not a big issue, but don't forget it...
>
> Yes, possible.
>
> I hope these ideas make sense. This is early days and there may be other
> ideas and much detail yet to come.
>
I'd like to agree with the general direction. My biggest concern towards FDW is transparency for the application. If lookaside allows redirecting a reference towards a scan/join on regular relations to a ForeignScan (as an alternative execution method), there is no strong reason to stick with custom-plan. However, the existing ForeignScan node does not support working without a particular foreign table. It may become a restriction if we try to replace a Join node by a ForeignScan, and that is my worry. (It may even be solved during the join-replacement-by-FDW work.)

One other point I noticed:

* SubPlan support; if an extension supports its own special logic to join relations, but doesn't want to support the various methods to scan relations, it is natural to leverage the built-in scan logic (like SeqScan, ...). I want ForeignScan to support having SubPlans if the FDW driver has the capability. I believe it can be implemented according to the existing manner, but we need to expose several static functions that handle the plan-tree recursively.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 8 May 2014 01:49, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> > * ForeignScan node that is not associated with a particular foreign-table. >> > Once we try to apply ForeignScan node instead of Sort or Aggregate, >> existing >> > FDW implementation needs to be improved. These nodes scan on a >> materialized >> > relation (generated on the fly), however, existing FDW code assumes >> > ForeignScan node is always associated with a particular foreign-table. >> > We need to eliminate this restriction. >> >> I don't think we need to do that, given the above. >> > It makes a problem if ForeignScan is chosen as alternative path of Join. > > The target-list of Join node are determined according to the query form > on the fly, so we cannot expect a particular TupleDesc to be returned > preliminary. Once we try to apply ForeignScan instead of Join node, it > has to have its TupleDesc depending on a set of joined relations. > > I think, it is more straightforward approach to allow ForeignScan that > is not associated to a particular (cataloged) relations. From your description, my understanding is that you would like to stream data from 2 standard tables to the GPU, then perform a join on the GPU itself. I have been told that is not likely to be useful because of the data transfer overheads. Or did I misunderstand, and that this is intended to get around the current lack of join pushdown into FDWs? Can you be specific about the actual architecture you wish for, so we can understand how to generalise that into an API? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 7 May 2014 18:39, Stephen Frost <sfrost@snowman.net> wrote:
> * Simon Riggs (simon@2ndQuadrant.com) wrote:
>> On 7 May 2014 17:43, Stephen Frost <sfrost@snowman.net> wrote:
>> > It's the optimizer's job to figure out which path to pick though, based
>> > on which will have the lowest cost.
>>
>> Of course. I'm not suggesting otherwise.
>>
>> >> If you do want that, you can write an Event Trigger that automatically
>> >> adds a lookaside for any table.
>> >
>> > This sounds terribly ugly and like we're pushing optimization decisions
>> > on to the user instead of just figuring out what the best answer is.
>>
>> I'm proposing that we use a declarative approach, just like we do when
>> we say CREATE INDEX.
>
> There's quite a few trade-offs when it comes to indexes though. I'm
> trying to figure out when you wouldn't want to use a GPU, if it's
> available to you and the cost model says it's faster? To me, that's
> kind of like saying you want a declarative approach for when to use a
> HashJoin.

I'm proposing something that is like an index, not like a plan node.

The reason that proposal is being made is that we need to consider data structure, data location and processing details.

* In the case of Mat Views, if there is no Mat View, then we can't use it - we can't replace that with just any mat view instead.

* GPUs and other special processing units have finite data transfer rates, so other people have proposed that they retain data on the GPU/SPU - so we want to do a lookaside only for situations where the data is already prepared to handle a lookaside.

* The other cases I cited of in-memory data structures are all pre-arranged items with structures suited to processing particular types of query.

Given that I count 4-5 beneficial use cases for this index-like lookaside, it seems worth investing time in.

It appears that KaiGai wishes for something else in addition to this concept, so there may be some confusion from that.
I'm sure it will take a while to really understand all the ideas and possibilities. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
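The index-like behaviour being argued over here, where the planner considers a lookaside only for relations it has been declared on, then costs it against the built-in path like any other candidate, can be sketched in miniature. All names below (the registry, `declare_lookaside`, `choose_path`, the cost numbers) are invented for illustration, not any proposed implementation:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical registry: a lookaside is only considered by the planner
 * when one has been declared for that relation, just as an index is only
 * considered when it exists. */
typedef struct Lookaside {
    const char *relname;                  /* relation it is declared on */
    double (*cost_fn)(double base_cost);  /* cost of the alternative path */
} Lookaside;

static Lookaside registry[8];
static int n_registered = 0;

/* "CREATE LOOKASIDE ... ON relname EXECUTE cost_fn" in miniature. */
static void declare_lookaside(const char *relname, double (*cost_fn)(double))
{
    registry[n_registered].relname = relname;
    registry[n_registered].cost_fn = cost_fn;
    n_registered++;
}

/* Planner side: pick the cheaper of the built-in path and, if declared,
 * the lookaside path.  Returns the chosen cost. */
static double choose_path(const char *relname, double builtin_cost)
{
    double best = builtin_cost;
    for (int i = 0; i < n_registered; i++)
        if (strcmp(registry[i].relname, relname) == 0) {
            double c = registry[i].cost_fn(builtin_cost);
            if (c < best)
                best = c;
        }
    return best;
}

/* A made-up cost model for a GPU scan path. */
static double gpu_scan_cost(double base) { return base * 0.25; }
```

Stephen's objection maps onto the last loop: for undeclared relations the alternative is never costed at all, which is exactly the behaviour he argues should instead be automatic whenever the hardware is available.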
Simon,

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 8 May 2014 01:49, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> From your description, my understanding is that you would like to
> stream data from 2 standard tables to the GPU, then perform a join on
> the GPU itself.
>
> I have been told that is not likely to be useful because of the data
> transfer overheads.

That was my original understanding and, I believe, the case at one point, however...

> Or did I misunderstand, and that this is intended to get around the
> current lack of join pushdown into FDWs?

I believe the issue with the transfer speeds to the GPU has been either eliminated or at least reduced to the point where it's practical now. This is all based on prior discussions with KaiGai- I've not done any testing myself. In any case, this is exactly what they're looking to do, as I understand it, and to do the same with aggregates that work well on GPUs.

> Can you be specific about the actual architecture you wish for, so we
> can understand how to generalise that into an API?

It's something that *could* be done with FDWs, once they have the ability to have join push-down and aggregate push-down, but I (and, as I understand it, Tom) feel that isn't really the right answer for this, because the actual *data* is completely under PG in this scenario. It's just in-memory processing that's being done on the GPU and in the GPU's memory.

KaiGai has speculated about other possibilities (eg: having the GPU's memory also used as some kind of multi-query cache, which would reduce the transfer costs, but at a level of complexity regarding that cache that I'm not sure it'd be sensible to try and do; in any case, it could be done later and might make sense independently, if we could make it work for, say, a memcached environment too; I'm thinking it would be transaction-specific, but even that would be pretty tricky unless we held locks across every row...).

Thanks,

Stephen
Simon,

* Simon Riggs (simon@2ndQuadrant.com) wrote:
> I'm proposing something that is like an index, not like a plan node.
>
> The reason that proposal is being made is that we need to consider
> data structure, data location and processing details.
>
> * In the case of Mat Views, if there is no Mat View, then we can't use
> it - we can't replace that with just any mat view instead

I agree with you about MatViews. There are clear trade-offs there, similar to those with indexes.

> * GPUs and other special processing units have finite data transfer
> rates, so other people have proposed that they retain data on the
> GPU/SPU - so we want to do a lookaside only for situations where the
> data is already prepared to handle a lookaside.

I've heard this and I'm utterly unconvinced that it could be made to work at all- and it certainly moves the bar of usefulness quite far away, making the whole thing much less practical. If we can't cost for this transfer rate and make use of GPUs for medium-to-large queries which are only transient, then perhaps shoving all GPU work out across an FDW is actually the right solution, making that like some kind of MatView as you're proposing- but I don't see how you're going to manage updates and invalidation of that data in a sane way for a multi-user PG system.

> * The other cases I cited of in-memory data structures are all
> pre-arranged items with structures suited to processing particular
> types of query

If it's transient in-memory work, I'd like to see our generalized optimizer consider them all, instead of pushing on to the user the job of deciding when the optimizer should consider certain methods.

> Given that I count 4-5 beneficial use cases for this index-like
> lookaside, it seems worth investing time in.

I'm all for making use of MatViews and GPUs, but there's more than one way to get there, and look-asides feel like pushing the decision, unnecessarily, on to the user.

Thanks,

Stephen
2014-05-07 18:06 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> Let me list up the things to be clarified / developed randomly.
>
> * Join replacement by FDW; We still don't have consensus about join replacement
>   by FDW. Probably, it will be designed for remote-join implementation primarily;
>   however, the things to do are similar. We may need to revisit Hanada-san's
>   proposition from the past.

I can't recall the details right now, but the reason I gave up was about introducing a ForeignJoinPath node, IIRC. I'll revisit the discussion and my proposal.

--
Shigeru HANADA
>>> * ForeignScan node that is not associated with a particular foreign-table.
>>>   Once we try to apply a ForeignScan node instead of Sort or Aggregate, the
>>>   existing FDW implementation needs to be improved. These nodes scan a
>>>   materialized relation (generated on the fly); however, existing FDW code
>>>   assumes a ForeignScan node is always associated with a particular
>>>   foreign-table. We need to eliminate this restriction.
>>
>> I don't think we need to do that, given the above.
>>
> It becomes a problem if a ForeignScan is chosen as the alternative path of a Join.
>
> The target-list of a Join node is determined according to the query
> form on the fly, so we cannot expect a particular TupleDesc in advance.
> Once we try to apply a ForeignScan instead of a Join node, it has to
> have a TupleDesc that depends on the set of joined relations.
>
> I think it is a more straightforward approach to allow a ForeignScan
> that is not associated with a particular (cataloged) relation.

> From your description, my understanding is that you would like to stream
> data from 2 standard tables to the GPU, then perform a join on the GPU itself.
>
> I have been told that is not likely to be useful because of the data transfer
> overheads.
>
Here are two solutions. One is what I'm currently working on: in case the numbers of rows in the left and right tables are not well balanced, we can keep a hash table of the smaller side in the GPU DRAM, then transfer the data stream chunk-by-chunk from the other side. Kernel execution and data transfer can run asynchronously, so this allows us to hide the data transfer cost as long as we have a large enough number of chunks, like processor pipelining.

The other solution is an "integrated" GPU that removes the necessity of data transfer, like Intel's Haswell, AMD's Kaveri or NVIDIA's Tegra K1; all the major vendors are moving in the same direction.
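The chunk-by-chunk scheme described above can be sketched in plain C: build a hash table once from the smaller side (as it would stay resident in GPU DRAM), then stream the larger side through in fixed-size chunks. Real pipelining would overlap the transfer of chunk i+1 with the probe of chunk i; this sketch just processes the chunks in sequence, and all sizes and names are invented for illustration:

```c
#include <assert.h>
#include <string.h>

#define HASH_SLOTS 64
#define CHUNK      8

/* Chained hash table over int join keys (stand-in for the resident,
 * GPU-side build table). */
typedef struct { int key; int next; } Entry;

typedef struct {
    int   head[HASH_SLOTS];
    Entry entries[256];
    int   n;
} HashTable;

static void ht_init(HashTable *ht)
{
    ht->n = 0;
    for (int i = 0; i < HASH_SLOTS; i++)
        ht->head[i] = -1;
}

static void ht_insert(HashTable *ht, int key)
{
    int slot = (unsigned) key % HASH_SLOTS;
    ht->entries[ht->n].key = key;
    ht->entries[ht->n].next = ht->head[slot];
    ht->head[slot] = ht->n++;
}

/* Count matches for one probe key. */
static int ht_probe(const HashTable *ht, int key)
{
    int matches = 0;
    for (int e = ht->head[(unsigned) key % HASH_SLOTS]; e != -1;
         e = ht->entries[e].next)
        if (ht->entries[e].key == key)
            matches++;
    return matches;
}

/* Build once from the small side; probe the big side chunk-by-chunk.
 * Each chunk stands in for one asynchronous transfer to the device. */
static long chunked_hash_join(const int *small, int nsmall,
                              const int *big, int nbig)
{
    HashTable ht;
    long joined = 0;
    ht_init(&ht);
    for (int i = 0; i < nsmall; i++)
        ht_insert(&ht, small[i]);
    for (int off = 0; off < nbig; off += CHUNK) {
        int end = off + CHUNK < nbig ? off + CHUNK : nbig;
        for (int i = off; i < end; i++)
            joined += ht_probe(&ht, big[i]);
    }
    return joined;
}
```

The asymmetry is the whole point: only the small side must fit on the device, and the per-chunk loop is the unit that a pipelined implementation would overlap with the next transfer.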
> Or did I misunderstand, and that this is intended to get around the current
> lack of join pushdown into FDWs?
>
The logic above is obviously executed on the extension side, so it needs the ForeignScan node to perform like a Join node: it reads two input relation streams and outputs one joined relation stream. That is quite similar to the expected FDW join-pushdown design. It consumes two (remote) relations and generates one output stream, which looks like a scan on a particular relation (but with no catalog definition here).

Probably, it shall be visible to the local backend as follows (this is a result of the previous prototype based on the custom-plan API):

  postgres=# EXPLAIN VERBOSE SELECT count(*) FROM pgbench1_branches b
               JOIN pgbench1_accounts a ON a.bid = b.bid WHERE aid < 100;
                                  QUERY PLAN
  -----------------------------------------------------------------------------
   Aggregate  (cost=101.60..101.61 rows=1 width=0)
     Output: count(*)
     ->  Custom Scan (postgres-fdw)  (cost=100.00..101.43 rows=71 width=0)
           Remote SQL: SELECT NULL FROM (public.pgbench_branches r1 JOIN
                       public.pgbench_accounts r2 ON ((r1.bid = r2.bid)))
                       WHERE ((r2.aid < 100))
  (4 rows)

The place of the "Custom Scan" node will be ForeignScan, once join pushdown is supported. At that point, what relation should be scanned by this ForeignScan? That is the reason why I proposed a ForeignScan node without a particular relation.

> Can you be specific about the actual architecture you wish for, so we can
> understand how to generalise that into an API?
>
If we push the role of the CustomPlan node into ForeignScan, I want to use this node to acquire control during query planning/execution. As I did in the custom-plan patch, first of all, I want the extension to have a chance to add an alternative path for a particular scan/join. If the extension can take over the execution, it will generate a ForeignPath (or CustomPath) node and then call add_path().
In the usual manner, the planner then decides whether the alternative path is cheaper than the other candidates.

In the case where it replaces a scan on a relation by a ForeignScan, it is almost the same as what the existing API does, except that the underlying relation is a regular one, not a foreign table.

In the case where it replaces join relations by a ForeignScan, it will be almost the same as the expected ForeignScan with join pushdown. Unlike a usual table scan, it does not have an actual relation definition in the catalog, and its result tuple-slot is determined on the fly.

One thing different from a remote join is that this ForeignScan node may have sub-plans locally, if the FDW driver (e.g. GPU execution) has the capability for Join only, but not for the relation scan portion. So, unlike its naming, I want ForeignScan to support having sub-plans if the FDW driver supports the capability.

Does it make things clear? Or does it make you more confused?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
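The "result tuple-slot determined on the fly" point can be illustrated with a toy descriptor. This is not the PostgreSQL TupleDesc API; the `ToyDesc` struct and `build_join_desc` are invented for illustration. The output descriptor of the pushed-down join is simply assembled at plan time from whichever columns of the two input relations the query's target list asks for, which is why no cataloged relation can describe it in advance:

```c
#include <assert.h>
#include <string.h>

#define MAX_COLS 16

/* Toy stand-in for a tuple descriptor: column names only. */
typedef struct {
    int ncols;
    const char *colname[MAX_COLS];
} ToyDesc;

/* Build the join's output descriptor from the target list: each requested
 * column must come from one of the two input descriptors.  Returns 1 on
 * success, 0 if a target column exists in neither input. */
static int build_join_desc(ToyDesc *out,
                           const ToyDesc *left, const ToyDesc *right,
                           const char **targets, int ntargets)
{
    out->ncols = 0;
    for (int t = 0; t < ntargets; t++) {
        int found = 0;
        for (int i = 0; i < left->ncols && !found; i++)
            if (strcmp(left->colname[i], targets[t]) == 0)
                found = 1;
        for (int i = 0; i < right->ncols && !found; i++)
            if (strcmp(right->colname[i], targets[t]) == 0)
                found = 1;
        if (!found)
            return 0;       /* unknown column in the target list */
        out->colname[out->ncols++] = targets[t];
    }
    return 1;
}
```

A different query over the same two tables yields a different descriptor, which is the reason a join-replacing ForeignScan cannot be tied to any one cataloged relation.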
On 8 May 2014 04:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> From your description, my understanding is that you would like to stream
>> data from 2 standard tables to the GPU, then perform a join on the GPU itself.
>>
>> I have been told that is not likely to be useful because of the data transfer
>> overheads.
>>
> Here are two solutions. One is currently I'm working; in case when number
> of rows in left- and right- tables are not balanced well, we can keep a hash
> table in the GPU DRAM, then we transfer the data stream chunk-by-chunk from
> the other side. Kernel execution and data transfer can be run asynchronously,
> so it allows to hide data transfer cost as long as we have enough number of
> chunks, like processor pipelining.

Makes sense to me, thanks for explaining. The hardware-enhanced hash join sounds like a great idea.

My understanding is we would need
* a custom cost-model
* a custom execution node

The main question seems to be whether doing that would be allowable, because it's certainly doable. I'm still looking for a way to avoid adding planning time for all queries though.

> Other solution is "integrated" GPU that kills necessity of data transfer,
> like Intel's Haswell, AMD's Kaveri or Nvidia's Tegra K1; all majors are
> moving to same direction.

Sounds useful, but very non-specific, as yet.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 8 May 2014 03:36, Stephen Frost <sfrost@snowman.net> wrote:
>> Given that I count 4-5 beneficial use cases for this index-like
>> lookaside, it seems worth investing time in.
>
> I'm all for making use of MatViews and GPUs, but there's more than one
> way to get there and look-asides feels like pushing the decision,
> unnecessarily, on to the user.

I'm not sure I understand where most of your comments come from, so it's clear we're not talking about the same things yet.

We have multiple use cases where an alternate data structure could be used to speed up queries. My goal is to use the alternate data structure(s)

1) if the data structure contains matching data for the current query
2) only when the user has explicitly stated it would be correct to do so, and they wish it
3) transparently to the application, rather than forcing them to recode
4) after fully considering cost-based optimization, which we can only do if it is transparent

all of which is how mat views work in other DBMSs. My additional requirement is

5) allow this to work with data structures outside the normal heap/index/block structures, since we have multiple already-working examples of such things and many users wish to leverage those in their applications

which I now understand is different from the main thrust of KaiGai's proposal, so I will restate this later on another thread.

The requirement is similar to the idea of running

  CREATE MATERIALIZED VIEW foo
    BUILD DEFERRED
    REFRESH COMPLETE ON DEMAND
    ENABLE QUERY REWRITE
    ON PREBUILT TABLE

but expands on that to encompass any external data structure.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, May 7, 2014 at 4:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Agreed. My proposal is that if the planner allows the lookaside to an > FDW then we pass the query for full execution on the FDW. That means > that the scan, aggregate and join could take place via the FDW. i.e. > "Custom Plan" == lookaside + FDW > > Or put another way, if we add Lookaside then we can just plug in the > pgstrom FDW directly and we're done. And everybody else's FDW will > work as well, so Citus etcc will not need to recode. As Stephen notes downthread, Tom has already expressed opposition to this idea on other threads, and I tend to agree with him, at least to some degree. I think the drive to use foreign data wrappers for PGStrom, CitusDB, and other things that aren't really foreign data wrappers as originally conceived is a result of the fact that we've got only one interface in this area that looks remotely like something pluggable; and so everyone's trying to fit things into the constraints of that interface whether it's actually a good fit or not. Unfortunately, I think what CitusDB really wants is pluggable storage, and what PGStrom really wants is custom paths, and I don't think either of those things is the same as what FDWs provide. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Simon Riggs (simon@2ndQuadrant.com) wrote: > On 8 May 2014 03:36, Stephen Frost <sfrost@snowman.net> wrote: > > I'm all for making use of MatViews and GPUs, but there's more than one > > way to get there and look-asides feels like pushing the decision, > > unnecessarily, on to the user. > > I'm not sure I understand where most of your comments come from, so > its clear we're not talking about the same things yet. > > We have multiple use cases where an alternate data structure could be > used to speed up queries. I don't view on-GPU memory as being an alternate *permanent* data store. Perhaps that's the disconnect that we have here, as it was my understanding that we're talking about using GPUs to make queries run faster where the data comes from regular tables. > My goal is to use the alternate data structure(s) Pluggable storage is certainly interesting, but I view that as independent of the CustomPlan-related work. > which I now understand is different from the main thrust of Kaigai's > proposal, so I will restate this later on another thread. Sounds good. Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > As Stephen notes downthread, Tom has already expressed opposition to > this idea on other threads, and I tend to agree with him, at least to > some degree. I think the drive to use foreign data wrappers for > PGStrom, CitusDB, and other things that aren't really foreign data > wrappers as originally conceived is a result of the fact that we've > got only one interface in this area that looks remotely like something > pluggable; and so everyone's trying to fit things into the constraints > of that interface whether it's actually a good fit or not. Agreed. > Unfortunately, I think what CitusDB really wants is pluggable storage, > and what PGStrom really wants is custom paths, and I don't think > either of those things is the same as what FDWs provide. I'm not entirely sure that PGStrom even really "wants" custom paths.. I believe the goal there is to be able to use GPUs to do work for us and custom paths/pluggable plan/execution are seen as the way to do that and not depend on libraries which are under GPL, LGPL or other licenses which we'd object to depending on from core. Personally, I'd love to just see CUDA or whatever support in core as a configure option and be able to detect at start-up when the right libraries and hardware are available and enable the join types which could make use of that gear. I don't like that we're doing all of this because of licenses or whatever and would still hope to figure out a way to address those issues but I haven't had time to go research it myself and evidently KaiGai and others see the issues there as insurmountable, so they're trying to work around it by creating a pluggable interface where an extension could provide those join types. Thanks, Stephen
On 8 May 2014 13:48, Stephen Frost <sfrost@snowman.net> wrote:
>> We have multiple use cases where an alternate data structure could be
>> used to speed up queries.
>
> I don't view on-GPU memory as being an alternate *permanent* data store.

As I've said, others have expressed an interest in placing specific data on specific external resources that we would like to use to speed up queries. That might be termed a "cache" of various kinds, or it might simply be an allocation of that resource to a specific purpose. If we forget GPUs, that still leaves multiple use cases that do fit the description.

> Perhaps that's the disconnect that we have here, as it was my
> understanding that we're talking about using GPUs to make queries run
> faster where the data comes from regular tables.

I'm trying to consider a group of use cases, so we get a generic API that is useful to many people, not just to one use case. I had understood the argument to be that there must be multiple potential users of an API before we allow it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 8 May 2014 04:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> In case when it replaced join relations by ForeignScan, it will be almost
> same as expected ForeignScan with join-pushed down. Unlike usual table scan,
> it does not have actual relation definition on catalog, and its result
> tuple-slot is determined on the fly.
> One thing different from the remote-join is, this ForeignScan node may have
> sub-plans locally, if FDW driver (e.g GPU execution) may have capability on
> Join only, but no relation scan portion.
> So, unlike its naming, I want ForeignScan to support to have sub-plans if
> FDW driver supports the capability.

From here, it looks exactly like pushing a join into an FDW. If we had that, we wouldn't need Custom Scan at all.

I may be mistaken and there is a critical difference. Local sub-plans don't sound like a big difference.

Have we considered having an Optimizer and Executor plugin that does this without touching core at all?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 8 May 2014 13:48, Stephen Frost <sfrost@snowman.net> wrote:
> > I don't view on-GPU memory as being an alternate *permanent* data store.
>
> As I've said, others have expressed an interest in placing specific
> data on specific external resources that we would like to use to speed
> up queries. That might be termed a "cache" of various kinds, or it
> might simply be an allocation of that resource to a specific purpose.

I don't think some generalized structure that addresses the goals of FDWs, CustomPaths, MatViews and query caching is going to be workable, and I'm definitely against having to specify at a per-relation level when I want certain join types to be considered.

> > Perhaps that's the disconnect that we have here, as it was my
> > understanding that we're talking about using GPUs to make queries run
> > faster where the data comes from regular tables.
>
> I'm trying to consider a group of use cases, so we get a generic API
> that is useful to many people, not just to one use case. I had
> understood the argument to be there must be multiple potential users
> of an API before we allow it.

The API you've outlined requires users to specify on a per-relation basis what join types are valid.

As for CustomPlans, there's certainly potential for many use cases there beyond just GPUs. What I'm unsure about is whether any others would actually need to be implemented externally, as the GPU-related work seems to need, or whether we would just implement those other join types in core.

Thanks,

Stephen
> On Wed, May 7, 2014 at 4:01 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Agreed. My proposal is that if the planner allows the lookaside to an
> > FDW then we pass the query for full execution on the FDW. That means
> > that the scan, aggregate and join could take place via the FDW. i.e.
> > "Custom Plan" == lookaside + FDW
> >
> > Or put another way, if we add Lookaside then we can just plug in the
> > pgstrom FDW directly and we're done. And everybody else's FDW will
> > work as well, so Citus etc. will not need to recode.
>
> As Stephen notes downthread, Tom has already expressed opposition to this
> idea on other threads, and I tend to agree with him, at least to some degree.
> I think the drive to use foreign data wrappers for PGStrom, CitusDB, and
> other things that aren't really foreign data wrappers as originally
> conceived is a result of the fact that we've got only one interface in this
> area that looks remotely like something pluggable; and so everyone's trying
> to fit things into the constraints of that interface whether it's actually
> a good fit or not.
> Unfortunately, I think what CitusDB really wants is pluggable storage, and
> what PGStrom really wants is custom paths, and I don't think either of those
> things is the same as what FDWs provide.
>
Yes, what PGStrom really needs is custom paths; that allows an extension to replace a part of the built-in nodes according to the extension's characteristics.

The discussion upthread clarified that FDW needs to be enhanced to support the functionality that PGStrom wants to provide; however, some of it also needs a redefinition of FDW, indeed.

Umm... I'm now missing the direction towards my goal. What approach is the best way to glue PostgreSQL and PGStrom?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> From here, it looks exactly like pushing a join into an FDW. If we had
> that, we wouldn't need Custom Scan at all.
>
> I may be mistaken and there is a critical difference. Local sub-plans
> doesn't sound like a big difference.

Erm. I'm not sure that you're really thinking through what you're suggesting. Allow me to re-state your suggestion here:

An FDW is loaded which provides hooks for join push-down (whatever those end up being).

A query is run which joins *local* table A to *local* table B. Standard heaps, standard indexes, all local to this PG instance.

The FDW which supports join push-down is then passed this join for planning, with local sub-plans for the local tables.

> Have we considered having an Optimizer and Executor plugin that does
> this without touching core at all?

Uh, isn't that what we're talking about? The issue is that there's a bunch of internal functions that such a plugin would need to either have access to or re-implement, but we'd rather not expose those internal functions to the whole world because they're, uh, internal helper routines, essentially, which could disappear in another release.

The point is that there isn't a good API for this today, and what's being proposed isn't a good API; it's just bolted on to the existing system by exposing what are rightfully internal routines.

Thanks,

Stephen
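One alternative to exporting the internal helper routines (discussed for functions like create_plan_recurse and set_plan_refs) is to hand extensions a table of function pointers at callback time, so the helpers stay static in core and the table itself carries no ABI promise across releases. The sketch below is an invented illustration of that shape, not a proposed PostgreSQL interface; every name and signature here is hypothetical:

```c
#include <assert.h>

/* Hypothetical helper table passed to an extension's callback.  The core
 * helpers remain static; only this struct crosses the module boundary. */
typedef struct PlannerHelpers {
    int (*create_plan_recurse)(int path_id);   /* invented signature */
    int (*set_plan_refs)(int plan_id);         /* invented signature */
} PlannerHelpers;

/* Core side: the real helpers stay static... */
static int core_create_plan_recurse(int path_id) { return path_id * 10; }
static int core_set_plan_refs(int plan_id)       { return plan_id + 1; }

/* ...and the core invokes the extension's hook with the helper table,
 * rather than exporting the helpers as global symbols. */
static int call_extension(int (*hook)(const PlannerHelpers *, int),
                          int path_id)
{
    PlannerHelpers helpers = {
        core_create_plan_recurse,
        core_set_plan_refs,
    };
    return hook(&helpers, path_id);
}

/* Extension side: reaches the helpers only through the table it was given. */
static int my_custom_hook(const PlannerHelpers *h, int path_id)
{
    int plan = h->create_plan_recurse(path_id);
    return h->set_plan_refs(plan);
}
```

The design trade-off is the one Stephen raises: the extension still depends on the helpers' behavior, but the dependency is confined to a struct the core controls, which can be versioned or changed without polluting the global symbol namespace.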
On 8 May 2014 14:32, Stephen Frost <sfrost@snowman.net> wrote: > The API you've outlined requires users to specify on a per-relation > basis what join types are valid. No, it doesn't. I've not said or implied that at any point. If you keep telling me what I mean, rather than asking, we won't get anywhere. I think that's as far as we'll get on email. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon, Perhaps you've changed your proposal wrt LOOKASIDEs and I've missed it somewhere in the thread, but this is what I was referring to with my concerns regarding per-relation definition of LOOKASIDEs: * Simon Riggs (simon@2ndQuadrant.com) wrote: > Roughly, I'm thinking of this... > > CREATE LOOKASIDE ON foo > TO foo_mat_view; > > and also this... > > CREATE LOOKASIDE ON foo > TO foo_as_a_foreign_table /* e.g. PGStrom */ where I took 'foo' to mean 'a relation'. Your downthread comments on 'CREATE MATERIALIZED VIEW' are in the same vein, though there I agree that we need it per-relation as there are other trade-offs to consider (storage costs of the matview, cost to maintain the matview, etc, similar to indexes). The PGStrom proposal, aiui, is to add a new join type which supports using a GPU to answer a query where all the data is in regular PG tables. I'd like that to "just work" when a GPU is available (perhaps modulo having to install some extension), for any join which is costed to be cheaper/faster when done that way. Thanks, Stephen
On 8 May 2014 14:40, Stephen Frost <sfrost@snowman.net> wrote: > Allow me to re-state your suggestion here: > > An FDW is loaded which provides hook for join push-down (whatever those > end up being). > > A query is run which joins *local* table A to *local* table B. Standard > heaps, standard indexes, all local to this PG instance. > > The FDW which supports join push-down is then passed this join for > planning, with local sub-plans for the local tables. Yes that is correct; thank you for confirming your understanding with me. That also supports custom join of local to non-local table, or custom join of two non-local tables. If we can use interfaces that already exist with efficiency, why invent a new one? >> Have we considered having an Optimizer and Executor plugin that does >> this without touching core at all? > > Uh, isn't that what we're talking about? No. I meant writing this as an extension rather than a patch on core. > The issue is that there's a > bunch of internal functions that such a plugin would need to either have > access to or re-implement, but we'd rather not expose those internal > functions to the whole world because they're, uh, internal helper > routines, essentially, which could disappear in another release. > > The point is that there isn't a good API for this today and what's being > proposed isn't a good API, it's just bolted-on to the existing system by > exposing what are rightfully internal routines. I think the main point is that people don't want to ask for our permission before they do what they want to do. We either help people use Postgres, or they go elsewhere. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 8 May 2014 14:49, Stephen Frost <sfrost@snowman.net> wrote: > Your downthread comments on 'CREATE MATERIALIZED VIEW' are in the same > vein, though there I agree that we need it per-relation as there are > other trade-offs to consider (storage costs of the matview, cost to > maintain the matview, etc, similar to indexes). > > The PGStrom proposal, aiui, is to add a new join type which supports > using a GPU to answer a query where all the data is in regular PG > tables. I'd like that to "just work" when a GPU is available (perhaps > modulo having to install some extension), for any join which is costed > to be cheaper/faster when done that way. All correct and agreed. As I explained earlier, let's cover the join requirement here and we can discuss lookasides to data structures at Pgcon. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote: > On 8 May 2014 14:40, Stephen Frost <sfrost@snowman.net> wrote: > > Allow me to re-state your suggestion here: > > > > An FDW is loaded which provides hook for join push-down (whatever those > > end up being). > > > > A query is run which joins *local* table A to *local* table B. Standard > > heaps, standard indexes, all local to this PG instance. > > > > The FDW which supports join push-down is then passed this join for > > planning, with local sub-plans for the local tables. > > Yes that is correct; thank you for confirming your understanding with me. I guess for my part, that doesn't look like an FDW any more. > That also supports custom join of local to non-local table, or custom > join of two non-local tables. Well, we already support these, technically, but the FDW doesn't actually implement the join, it's done in core. > If we can use interfaces that already exist with efficiency, why > invent a new one? Perhaps once we have a proposal for FDW join push-down this will make sense, but I'm not seeing it right now. > >> Have we considered having an Optimizer and Executor plugin that does > >> this without touching core at all? > > > > Uh, isn't that what we're talking about? > > No. I meant writing this as an extension rather than a patch on core. KaiGai's patches have consisted of some changes to core plus an extension which uses those changes. The changes to core include exposing internal functions for extensions to use, which will undoubtedly end up being a sore spot and fragile. Thanks, Stephen
On 8 May 2014 15:25, Stephen Frost <sfrost@snowman.net> wrote: > * Simon Riggs (simon@2ndQuadrant.com) wrote: >> On 8 May 2014 14:40, Stephen Frost <sfrost@snowman.net> wrote: >> > Allow me to re-state your suggestion here: >> > >> > An FDW is loaded which provides hook for join push-down (whatever those >> > end up being). >> > >> > A query is run which joins *local* table A to *local* table B. Standard >> > heaps, standard indexes, all local to this PG instance. >> > >> > The FDW which supports join push-down is then passed this join for >> > planning, with local sub-plans for the local tables. >> >> Yes that is correct; thank you for confirming your understanding with me. > > I guess for my part, that doesn't look like an FDW any more. If it works, it works. If it doesn't, we can act otherwise. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 7 May 2014 02:05, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > (1) DDL support and system catalog > > Simon suggested that DDL command should be supported to track custom- > plan providers being installed, and to avoid nonsense hook calls > if it is an obvious case that custom-plan provider can help. It also > makes sense to give a chance to load extensions once installed. > (In the previous design, I assumed modules are loaded by LOAD command > or *_preload_libraries parameters). I've tried hard to bend my mind to this and it's beginning to sink in. We've already got pg_am for indexes, and will soon have pg_seqam for sequences. It would seem normal and natural to have * pg_joinam catalog table for "join methods" with a join method API Which would include some way of defining which operators/datatypes we consider this for, so if PostGIS people come up with some fancy GIS join thing, we don't invoke it every time even when it's inapplicable. I would prefer it if PostgreSQL also had some way to control when the joinam was called, possibly with some kind of table_size_threshold on the AM tuple, which could be set to >=0 to control when this was even considered. * pg_scanam catalog table for "scan methods" with a scan method API Again, a list of operators that can be used with it, like indexes and operator classes By analogy to existing mechanisms, we would want * A USERSET mechanism to allow users to turn it off for testing or otherwise, at user, database level We would also want * A startup call that allows us to confirm it is available and working correctly, possibly with some self-test for hardware, performance confirmation/derivation of planning parameters * Some kind of trace mode that would allow people to confirm the outcome of calls * Some interface to the stats system so we could track the frequency of usage of each join/scan type. 
This would be done within Postgres, tracking the calls by name, rather than trusting the plugin to do it for us > I tried to implement the following syntax: > > CREATE CUSTOM PLAN <name> FOR (scan|join|any) HANDLER <func_name>; Not sure if we need that yet -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote: > It would seem normal and natural to have > > * pg_joinam catalog table for "join methods" with a join method API > Which would include some way of defining which operators/datatypes we > consider this for, so if PostGIS people come up with some fancy GIS > join thing, we don't invoke it every time even when its inapplicable. > I would prefer it if PostgreSQL also had some way to control when the > joinam was called, possibly with some kind of table_size_threshold on > the AM tuple, which could be set to >=0 to control when this was even > considered. It seems useful to think about how we would redefine our existing join methods using such a structure. While thinking about that, it seems like we would worry more about what the operators provide rather than the specific operators themselves (ala hashing / HashJoin) and I'm not sure we really care about the data types directly- just about the operations which we can do on them.. I can see a case for sticking data types into this if we feel that we have to constrain the path possibilities for some reason, but I'd rather try and deal with any issues around "it doesn't make sense to do X because we'll know it'll be really expensive" through the cost model instead of with a table that defines what's allowed or not allowed. There may be cases where we get the costing wrong and it's valuable to be able to tweak cost values on a per-connection basis or for individual queries. I don't mean to imply that a 'pg_joinam' table is a bad idea, just that I'd think of it being defined in terms of what capabilities it requires of operators and a way for costing to be calculated for it, plus the actual functions which it provides to implement the join itself (to include some way to get output suitable for explain, etc..). 
> * pg_scanam catalog table for "scan methods" with a scan method API > Again, a list of operators that can be used with it, like indexes and > operator classes Ditto for this- but there's lots of other things this makes me wonder about because it's essentially trying to define a pluggable storage layer, which is great, but also requires some way to deal with all of the things we use our storage system for: caching / shared buffers, locking, visibility, WAL, unique identifier / ctid (for use in indexes, etc)... > By analogy to existing mechanisms, we would want > > * A USERSET mechanism to allow users to turn it off for testing or > otherwise, at user, database level If we re-implement our existing components through this ("eat our own dogfood" as it were), I'm not sure that we'd be able to have a way to turn it on/off.. I realize we wouldn't have to, but then it seems like we'd have two very different code paths and likely a different level of support / capability afforded to "external" storage systems and then I wonder if we're not back to just FDWs again.. > We would also want > > * A startup call that allows us to confirm it is available and working > correctly, possibly with some self-test for hardware, performance > confirmation/derivation of planning parameters Yeah, we'd need this for anything that supports a GPU, regardless of how we implement it, I'd think. > * Some kind of trace mode that would allow people to confirm the > outcome of calls Seems like this would be useful independently of the rest.. > * Some interface to the stats system so we could track the frequency > of usage of each join/scan type. This would be done within Postgres, > tracking the calls by name, rather than trusting the plugin to do it > for us This is definitely something I want for core already... Thanks, Stephen
On Thu, May 8, 2014 at 3:10 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Simon Riggs (simon@2ndQuadrant.com) wrote: >> It would seem normal and natural to have >> >> * pg_joinam catalog table for "join methods" with a join method API >> Which would include some way of defining which operators/datatypes we >> consider this for, so if PostGIS people come up with some fancy GIS >> join thing, we don't invoke it every time even when its inapplicable. >> I would prefer it if PostgreSQL also had some way to control when the >> joinam was called, possibly with some kind of table_size_threshold on >> the AM tuple, which could be set to >=0 to control when this was even >> considered. > > It seems useful to think about how we would redefine our existing join > methods using such a structure. While thinking about that, it seems > like we would worry more about what the operators provide rather than > the specific operators themselves (ala hashing / HashJoin) and I'm not > sure we really care about the data types directly- just about the > operations which we can do on them.. I'm pretty skeptical about this whole line of inquiry. We've only got three kinds of joins, and each one of them has quite a bit of bespoke logic, and all of this code is pretty performance-sensitive on large join nests. If there's a way to make this work for KaiGai's use case at all, I suspect something really lightweight like a hook, which should have negligible impact on other workloads, is a better fit than something involving system catalog access. But I might be wrong. I also think that there are really two separate problems here: getting the executor to call a custom scan node when it shows up in the plan tree; and figuring out how to get it into the plan tree in the first place. I'm not sure we've properly separated those problems, and I'm not sure into which category the issues that sunk KaiGai's 9.4 patch fell. 
Most of this discussion seems like it's about the latter problem, but we need to solve both. For my money, we'd be better off getting some kind of basic custom scan node functionality committed first, even if the cases where you can actually inject them into real plans are highly restricted. Then, we could later work on adding more ways to inject them in more places. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote: > For my money, we'd be better off > getting some kind of basic custom scan node functionality committed > first, even if the cases where you can actually inject them into real > plans are highly restricted. Then, we could later work on adding more > ways to inject them in more places. We're past the prototyping stage and into productionising what we know works, AFAIK. If that point is not clear, then we need to discuss that first. At the moment the Custom join hook is called every time we attempt to cost a join, with no restriction. I would like to highly restrict this, so that we only consider a CustomJoin node when we have previously said one might be usable and the user has requested this (e.g. enable_foojoin = on) We only consider merge joins if the join uses operators with oprcanmerge=true. We only consider hash joins if the join uses operators with oprcanhash=true So it seems reasonable to have a way to define/declare what is possible and what is not. But my take is that adding a new column to pg_operator for every CustomJoin node is probably out of the question, hence my suggestion to list the operators we know it can work with. Given that everything else in Postgres is agnostic and configurable, I'm looking to do the same here. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 8 May 2014 20:10, Stephen Frost <sfrost@snowman.net> wrote: >> * A USERSET mechanism to allow users to turn it off for testing or >> otherwise, at user, database level > > If we re-implement our existing components through this ("eat our own > dogfood" as it were), I'm not sure that we'd be able to have a way to > turn it on/off.. I realize we wouldn't have to, but then it seems like > we'd have two very different code paths and likely a different level of > support / capability afforded to "external" storage systems and then I > wonder if we're not back to just FDWs again.. We have SET enable_hashjoin = on | off I would like a way to do the equivalent of SET enable_mycustomjoin = off so that when it starts behaving weirdly in production, I can turn it off so we can prove that is not the cause, or keep it turned off if it's a problem. I don't want to have to call a hook and let the hook decide whether it can be turned off or not. Postgres should be in control of the plugin, not give control to the plugin every time and hope it gives us control back. (I'm trying to take the "FDW isn't the right way" line of thinking to its logical conclusions, so we can decide). -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes: > I'm pretty skeptical about this whole line of inquiry. We've only got > three kinds of joins, and each one of them has quite a bit of bespoke > logic, and all of this code is pretty performance-sensitive on large > join nests. If there's a way to make this work for KaiGai's use case > at all, I suspect something really lightweight like a hook, which > should have negligible impact on other workloads, is a better fit than > something involving system catalog access. But I might be wrong. We do a great deal of catalog consultation already during planning, so I think a few more wouldn't be a problem, especially if the planner is smart enough to touch the catalogs just once (per query?) and cache the results. However, your point about lots of bespoke logic is dead on, and I'm afraid it's damn near a fatal objection. As just one example, if we did not have merge joins then an awful lot of what the planner does with path keys simply wouldn't exist, or at least would look a lot different than it does. Without that infrastructure, I can't imagine that a plugin approach would be able to plan mergejoins anywhere near as effectively. Maybe there's a way around this issue, but it sure won't just be a pg_am-like API. > I also think that there are really two separate problems here: getting > the executor to call a custom scan node when it shows up in the plan > tree; and figuring out how to get it into the plan tree in the first > place. I'm not sure we've properly separated those problems, and I'm > not sure into which category the issues that sunk KaiGai's 9.4 patch > fell. I thought that the executor side of his patch wasn't in bad shape. The real problems were in the planner, and indeed largely in the "backend" part of the planner where there's a lot of hard-wired logic for fixing up low-level details of the constructed plan tree. 
It seems like in principle it might be possible to make that logic cleanly extensible, but it'll likely take a major rewrite. The patch tried to skate by with just exposing a bunch of internal functions, which I don't think is a maintainable approach, either for the core or for the extensions using it. regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> writes: > On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote: >> For my money, we'd be better off >> getting some kind of basic custom scan node functionality committed >> first, even if the cases where you can actually inject them into real >> plans are highly restricted. Then, we could later work on adding more >> ways to inject them in more places. > We're past the prototyping stage and into productionising what we know > works, AFAIK. If that point is not clear, then we need to discuss that > first. OK, I'll bite: what here do we know works? Not a damn thing AFAICS; it's all speculation that certain hooks might be useful, and speculation that's not supported by a lot of evidence. If you think this isn't prototyping, I wonder what you think *is* prototyping. It seems likely to me that our existing development process is not terribly well suited to developing a good solution in this area. We need to be able to try some things and throw away what doesn't work; but the project's mindset is not conducive to throwing features away once they've appeared in a shipped release. And the other side of the coin is that trying these things is not inexpensive: you have to write some pretty serious code before you have much of a feel for whether a planner hook API is actually any good. So by the time you've built something of the complexity of, say, contrib/postgres_fdw, you don't really want to throw that away in the next major release. And that's at the bottom end of the scale of the amount of work that'd be needed to do anything with the sorts of interfaces we're discussing. So I'm not real sure how we move forward. Maybe something to brainstorm about in Ottawa. regards, tom lane
Simon Riggs <simon@2ndQuadrant.com> writes: > We only consider merge joins if the join uses operators with oprcanmerge=true. > We only consider hash joins if the join uses operators with oprcanhash=true > So it seems reasonable to have a way to define/declare what is > possible and what is not. But my take is that adding a new column to > pg_operator for every CustomJoin node is probably out of the question, > hence my suggestion to list the operators we know it can work with. For what that's worth, I'm not sure that either the oprcanmerge or oprcanhash columns really pull their weight. We could dispense with both at the cost of doing some wasted lookups in pg_amop. (Perhaps we should replace them with a single "oprisequality" column, which would amount to a hint that it's worth looking for hash or merge properties, or for other equality-ish properties in future.) So I think something comparable to an operator class is indeed a better approach than adding more columns to pg_operator. Other than the connection to pg_am, you could pretty nearly just use the operator class infrastructure as-is for a lot of operator-property things like this. regards, tom lane
On 8 May 2014 21:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: > So I'm not real sure how we move forward. Maybe something to brainstorm > about in Ottawa. I'm just about to go away for a week, so that's probably the best place to leave (me out of) the discussion until Ottawa. I've requested some evidence this hardware route is worthwhile from my contacts, so we'll see what we get. Presumably Kaigai has something to share already also. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > Umm... I'm now missing the direction towards my goal. > What approach is the best way to glue PostgreSQL and PGStrom? I haven't really paid any attention to PGStrom. Perhaps it's just that I missed it, but I would find it useful if you could direct me towards a benchmark or something like that, that demonstrates a representative scenario in which the facilities that PGStrom offers are compelling compared to traditional strategies already implemented in Postgres and other systems. If I wanted to make joins faster, personally, I would look at opportunities to optimize our existing hash joins to take better advantage of modern CPU characteristics. A lot of the research suggests that it may be useful to implement techniques that take better advantage of available memory bandwidth through techniques like prefetching and partitioning, perhaps even (counter-intuitively) at the expense of compute bandwidth. It's possible that it just needs to be explained to me, but, with respect, intuitively I have a hard time imagining that offloading joins to the GPU will help much in the general case. Every paper on joins from the last decade talks a lot about memory bandwidth and memory latency. Are you concerned with some specific case that I may have missed? In what scenario might a cost-based optimizer reasonably prefer a custom join node implemented by PgStrom, over any of the existing join node types? It's entirely possible that I simply missed relevant discussions here. -- Peter Geoghegan
On Thu, May 8, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I thought that the executor side of his patch wasn't in bad shape. The > real problems were in the planner, and indeed largely in the "backend" > part of the planner where there's a lot of hard-wired logic for fixing up > low-level details of the constructed plan tree. It seems like in > principle it might be possible to make that logic cleanly extensible, > but it'll likely take a major rewrite. The patch tried to skate by with > just exposing a bunch of internal functions, which I don't think is a > maintainable approach, either for the core or for the extensions using it. Well, I consider that somewhat good news, because I think it would be rather nice if we could get by with solving one problem at a time, and if the executor part is close to being well-solved, excellent. My ignorance is probably showing here, but I guess I don't understand why it's so hard to deal with the planner side of things. My perhaps-naive impression is that a Seq Scan node, or even an Index Scan node, is not all that complicated. If we just want to inject some more things that behave a lot like those into various baserels, I guess I don't understand why that's especially hard. Now I do understand that part of what KaiGai wants to do here is inject custom scan paths as additional paths for *joinrels*. And I can see why that would be somewhat more complicated. But I also don't see why that's got to be part of the initial commit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> > I also think that there are really two separate problems here: getting > > the executor to call a custom scan node when it shows up in the plan > > tree; and figuring out how to get it into the plan tree in the first > > place. I'm not sure we've properly separated those problems, and I'm > > not sure into which category the issues that sunk KaiGai's 9.4 patch > > fell. > > I thought that the executor side of his patch wasn't in bad shape. The > real problems were in the planner, and indeed largely in the "backend" > part of the planner where there's a lot of hard-wired logic for fixing up > low-level details of the constructed plan tree. It seems like in principle > it might be possible to make that logic cleanly extensible, but it'll likely > take a major rewrite. The patch tried to skate by with just exposing a > bunch of internal functions, which I don't think is a maintainable approach, > either for the core or for the extensions using it. > (I'm now trying to catch up on last night's discussion...) I initially intended to allow extensions to add their custom paths based on their own arbitrary decisions, because the core backend cannot have any expectations about the behavior of a custom plan. However, a custom path that replaces built-in paths must of course behave compatibly in spite of its different implementation. So, I'm inclined towards the direction in which the custom-plan provider informs the core backend of what it can do, and the planner gives extensions more practical information with which to construct the custom path node. Let me investigate how to handle join replacement by custom paths in the planner stage. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
> On Thu, May 8, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I thought that the executor side of his patch wasn't in bad shape. > > The real problems were in the planner, and indeed largely in the "backend" > > part of the planner where there's a lot of hard-wired logic for fixing > > up low-level details of the constructed plan tree. It seems like in > > principle it might be possible to make that logic cleanly extensible, > > but it'll likely take a major rewrite. The patch tried to skate by > > with just exposing a bunch of internal functions, which I don't think > > is a maintainable approach, either for the core or for the extensions > using it. > > Well, I consider that somewhat good news, because I think it would be rather > nice if we could get by with solving one problem at a time, and if the executor > part is close to being well-solved, excellent. > > My ignorance is probably showing here, but I guess I don't understand why > it's so hard to deal with the planner side of things. My perhaps-naive > impression is that a Seq Scan node, or even an Index Scan node, is not all > that complicated. If we just want to inject some more things that behave > a lot like those into various baserels, I guess I don't understand why that's > especially hard. > > Now I do understand that part of what KaiGai wants to do here is inject > custom scan paths as additional paths for *joinrels*. And I can see why > that would be somewhat more complicated. But I also don't see why that's > got to be part of the initial commit. > I'd also like to take this approach. Even though we will eventually need a graceful approach to join replacement by custom paths, it makes sense to have a minimal functionality set first. Then, we can focus on how to design the planner integration for joins. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > Umm... I'm now missing the direction towards my goal. > > What approach is the best way to glue PostgreSQL and PGStrom? > > I haven't really paid any attention to PGStrom. Perhaps it's just that I > missed it, but I would find it useful if you could direct me towards a > benchmark or something like that, that demonstrates a representative > scenario in which the facilities that PGStrom offers are compelling compared > to traditional strategies already implemented in Postgres and other > systems. > Implementation of hash join on the GPU side is still under development. The only use case available right now is an alternative scan path in place of a full table scan, for cases where a table contains a massive number of records and the qualifiers are sufficiently complicated. The EXPLAIN output below shows a sequential scan on a table that contains 80M records (all of them in memory; no disk accesses during execution). Nvidia's GT640 shows an advantage over a single-threaded Core i5 4570S, at least. 
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10003175757.67..10003175757.68 rows=1 width=0) (actual time=46648.635..46648.635 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=10000000000.00..10003109091.00 rows=26666667 width=0) (actual time=0.047..46351.567 rows=2513814 loops=1)
         Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Rows Removed by Filter: 77486186
 Planning time: 0.066 ms
 Total runtime: 46648.668 ms
(6 rows)

postgres=# set pg_strom.enabled = on;
SET
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1274424.33..1274424.34 rows=1 width=0) (actual time=1784.729..1784.729 rows=1 loops=1)
   ->  Custom (GpuScan) on t1  (cost=10000.00..1207757.67 rows=26666667 width=0) (actual time=179.748..1567.018 rows=2513699 loops=1)
         Host References:
         Device References: x, y
         Device Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Total time to load: 0.231 ms
         Avg time in send-mq: 0.027 ms
         Max time to build kernel: 1.064 ms
         Avg time of DMA send: 3.050 ms
         Total time of DMA send: 933.318 ms
         Avg time of kernel exec: 5.117 ms
         Total time of kernel exec: 1565.799 ms
         Avg time of DMA recv: 0.086 ms
         Total time of DMA recv: 26.289 ms
         Avg time in recv-mq: 0.011 ms
 Planning time: 0.094 ms
 Total runtime: 1784.793 ms
(17 rows)

> If I wanted to make joins faster, personally, I would look at opportunities
> to optimize our existing hash joins to take better advantage of modern CPU
> characteristics. A lot of the research suggests that it may be useful to
> implement techniques that take better advantage of available memory
> bandwidth through techniques like prefetching and partitioning, perhaps
> even (counter-intuitively) at the expense of compute bandwidth. It's
> possible that it just needs to be explained to me, but, with respect,
> intuitively I have a hard time imagining that offloading joins to the GPU
> will help much in the general case. Every paper on joins from the last decade
> talks a lot about memory bandwidth and memory latency. Are you concerned
> with some specific case that I may have missed? In what scenario might a
> cost-based optimizer reasonably prefer a custom join node implemented by
> PgStrom, over any of the existing join node types? It's entirely possible
> that I simply missed relevant discussions here.
>
If our purpose is to consume 100% of the GPU device's capacity, memory bandwidth is troublesome. But I'm not interested in GPU benchmarking. What I want to do is accelerate complicated query processing beyond what existing RDBMSs offer, at low cost and transparently to existing applications.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
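For scale: the per-row work the GpuScan offloads in the plan above is just this arithmetic, evaluated 80M times. A hedged sketch in plain C (the table data, function names, and the inlined squared-distance form are illustrative; the squared form is equivalent to the sqrt qualifier since both sides are non-negative):

```c
#include <stdbool.h>

/* The qualifier from the EXPLAIN above: a circle-membership test.
 * Each row pays for two subtractions, two multiplies, an add and a
 * compare -- branch-free, data-parallel arithmetic that a GPU kernel
 * handles well when repeated tens of millions of times. */
static bool
qual_matches(double x, double y)
{
    double dx = x - 20.0;
    double dy = y - 20.0;
    /* equivalent to sqrt(dx^2 + dy^2) < 10, avoiding the sqrt call */
    return dx * dx + dy * dy < 10.0 * 10.0;
}

/* Stand-in for the scan loop over t1: count rows passing the filter. */
static int
count_matching(double (*rows)[2], int nrows)
{
    int i, count = 0;

    for (i = 0; i < nrows; i++)
        if (qual_matches(rows[i][0], rows[i][1]))
            count++;
    return count;
}
```

The single-threaded loop is what the 46-second Seq Scan spends its time in; the GpuScan ships the same predicate to the device and evaluates rows in parallel.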
* Simon Riggs (simon@2ndQuadrant.com) wrote:
> On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote:
> > For my money, we'd be better off
> > getting some kind of basic custom scan node functionality committed
> > first, even if the cases where you can actually inject them into real
> > plans are highly restricted. Then, we could later work on adding more
> > ways to inject them in more places.
>
> We're past the prototyping stage and into productionising what we know
> works, AFAIK. If that point is not clear, then we need to discuss that
> first.
>
> At the moment the Custom join hook is called every time we attempt to
> cost a join, with no restriction.
>
> I would like to highly restrict this, so that we only consider a
> CustomJoin node when we have previously said one might be usable and
> the user has requested this (e.g. enable_foojoin = on)

This is part of what I disagree with- I'd rather not require users to know and understand when they want to do a HashJoin vs. a MergeJoin vs. a CustomJoinTypeX.

> We only consider merge joins if the join uses operators with oprcanmerge=true.
> We only consider hash joins if the join uses operators with oprcanhash=true

I wouldn't consider those generally "user-facing" options, and the enable_X counterparts are intended for debugging and not to be used in production environments. To the point you make in the other thread- I'm fine w/ having similar cost-based enable_X options, but I think we're doing our users a disservice if we require that they populate or update a table. Perhaps an extension needs to do that on installation, but that would need to enable everything to avoid the user having to mess around with the table.

> So it seems reasonable to have a way to define/declare what is
> possible and what is not. But my take is that adding a new column to
> pg_operator for every CustomJoin node is probably out of the question,
> hence my suggestion to list the operators we know it can work with.
It does seem like there should be some work done in this area, as Tom mentioned, to avoid needing to have more columns to track how equality can be done. I do wonder just how we'd deal with this when it comes to GPUs as, presumably, the code to implement the equality for various types would have to be written in CUDA-or-whatever. > Given that everything else in Postgres is agnostic and configurable, > I'm looking to do the same here. It's certainly a neat idea, but I do have concerns (which appear to be shared by others) about just how practical it'll be and how much rework it'd take and the question about if it'd really be used in the end.. Thanks, Stephen
> > So it seems reasonable to have a way to define/declare what is
> > possible and what is not. But my take is that adding a new column to
> > pg_operator for every CustomJoin node is probably out of the question,
> > hence my suggestion to list the operators we know it can work with.
>
> It does seem like there should be some work done in this area, as Tom mentioned,
> to avoid needing to have more columns to track how equality can be done.
> I do wonder just how we'd deal with this when it comes to GPUs as, presumably,
> the code to implement the equality for various types would have to be written
> in CUDA-or-whatever.
>
GPUs have workloads they like and dislike. It is a reasonable idea to list the operators (or something else) that have an advantage when run on a custom path. For example, numeric calculation on fixed-length variables has a great advantage on a GPU, but locale-aware text matching is not a workload suitable for GPUs. It may be a good hint for the planner when picking candidate paths to be considered.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
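One possible shape of the "list the operators" idea above is a provider-side capability table. A minimal sketch, assuming invented names (this is not the proposed catalog layout; the operator names and the struct are illustrative only):

```c
#include <string.h>
#include <stdbool.h>

/* Hypothetical provider-side whitelist: which operators/functions the
 * custom-plan provider can compile into a device kernel. Fixed-width
 * float arithmetic qualifies; locale-aware text comparison does not. */
typedef struct
{
    const char *opname;     /* operator/function as seen in the qual */
    bool        on_device;  /* can the provider run this on the GPU? */
} ProviderOpCap;

static const ProviderOpCap op_caps[] = {
    {"float8pl",  true},    /* double + double */
    {"float8mul", true},    /* double * double */
    {"dsqrt",     true},    /* sqrt(double) */
    {"texteq",    false},   /* locale-aware; stays on the host */
};

/* The planner-time check: before adding a custom path, the provider
 * walks the qual and rejects it if any operator is unsupported. */
static bool
provider_supports_op(const char *opname)
{
    size_t i;

    for (i = 0; i < sizeof(op_caps) / sizeof(op_caps[0]); i++)
        if (strcmp(op_caps[i].opname, opname) == 0)
            return op_caps[i].on_device;
    return false;           /* unknown operator: assume unsupported */
}
```

Defaulting unknown operators to "unsupported" is the conservative choice Simon's proposal implies: the custom path is only considered when every operator in the qual was declared in advance.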
* Peter Geoghegan (pg@heroku.com) wrote:
> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Umm... I'm now missing the direction towards my goal.
> > What approach is the best way to glue PostgreSQL and PGStrom?
>
> I haven't really paid any attention to PGStrom. Perhaps it's just that
> I missed it, but I would find it useful if you could direct me towards
> a benchmark or something like that, that demonstrates a representative
> scenario in which the facilities that PGStrom offers are compelling
> compared to traditional strategies already implemented in Postgres and
> other systems.

I agree that some concrete evidence would be really nice. I more-or-less took KaiGai's word on it, but having actual benchmarks would certainly be better.

> If I wanted to make joins faster, personally, I would look at
> opportunities to optimize our existing hash joins to take better
> advantage of modern CPU characteristics.

Yeah, I'm pretty confident we're leaving a fair bit on the table right there based on my previous investigation into this area. There were easily cases which showed a 3x improvement, as I recall (the trade-off being increased memory usage for a larger, sparser hash table). Sadly, there were also cases which ended up being worse, and it seemed to be very sensitive to the size of the hash table which ends up being built and the size of the on-CPU cache.

> A lot of the research
> suggests that it may be useful to implement techniques that take
> better advantage of available memory bandwidth through techniques like
> prefetching and partitioning, perhaps even (counter-intuitively) at
> the expense of compute bandwidth.

While I agree with this, one of the big things about GPUs is that they operate in a highly parallel fashion and across a different CPU/memory architecture than what we're used to (for starters, everything is much "closer").
In a traditional memory system, there's a lot of back and forth to memory, but a single memory dump over to the GPU's memory where everything is processed in a highly parallel way and then shipped back wholesale to main memory is at least conceivably faster. Of course, things will change when we are able to parallelize joins across multiple CPUs ourselves.. In a way, the PGStrom approach gets to "cheat" us today, since it can parallelize the work where core can't and that ends up not being an entirely fair comparison. Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > Well, I consider that somewhat good news, because I think it would be > rather nice if we could get by with solving one problem at a time, and > if the executor part is close to being well-solved, excellent. Sadly, I'm afraid the news really isn't all that good in the end.. > My ignorance is probably showing here, but I guess I don't understand > why it's so hard to deal with the planner side of things. My > perhaps-naive impression is that a Seq Scan node, or even an Index > Scan node, is not all that complicated. If we just want to inject > some more things that behave a lot like those into various baserels, I > guess I don't understand why that's especially hard. That's not what is being asked for here though... > Now I do understand that part of what KaiGai wants to do here is > inject custom scan paths as additional paths for *joinrels*. And I > can see why that would be somewhat more complicated. But I also don't > see why that's got to be part of the initial commit. I'd say it's more than "part" of what the goal is here- it's more or less what everything boils down to. Oh, plus being able to replace aggregates with a GPU-based operation instead, but that's no trivially done thing either really (if it is, let's get it done for FDWs already...). Thanks, Stephen
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > I initially intended to allow extensions to add their custom-path based > on their arbitrary decision, because the core backend cannot have > expectation towards the behavior of custom-plan. > However, of course, the custom-path that replaces built-in paths shall > have compatible behavior in spite of different implementation. I didn't ask this before but it's been on my mind for a while- how will this work for custom data types, ala the 'geometry' type from PostGIS? There's user-provided code that we have to execute to check equality for those, but they're not giving us CUDA code to run to perform that equality... Thanks, Stephen
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > I initially intended to allow extensions to add their custom-path
> > based on their arbitrary decision, because the core backend cannot
> > have expectation towards the behavior of custom-plan.
> > However, of course, the custom-path that replaces built-in paths shall
> > have compatible behavior in spite of different implementation.
>
> I didn't ask this before but it's been on my mind for a while- how will
> this work for custom data types, ala the 'geometry' type from PostGIS?
> There's user-provided code that we have to execute to check equality for
> those, but they're not giving us CUDA code to run to perform that equality...
>
If a custom-plan provider supports user-defined data types such as PostGIS's, it will be able to pick up those data types as well, in addition to the built-in ones. It depends entirely on the coverage of the extension. If a data type is not supported, it is simply not show time for the GPU. In my case, if PG-Strom also has compatible code for those types, runnable on OpenCL, it will say "yes, I can handle this data type".

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> GPU has workloads likes and dislikes. It is a reasonable idea to list up
> operators (or something else) that have advantage to run on custom-path.
> For example, numeric calculation on fixed-length variables has great
> advantage on GPU, but locale aware text matching is not a workload suitable
> to GPUs.

Right- but this points out exactly what I was trying to bring up. Locale-aware text matching requires running libc-provided code, which isn't going to happen on the GPU (unless we re-implement it...). Aren't we going to have the same problem with the 'numeric' type? Our existing functions won't be usable on the GPU and we'd have to re-implement them and then make darn sure that they produce the same results... We'll also have to worry about any cases where we have a libc function and a CUDA function and convince ourselves that there's no difference between the two..

Not sure exactly how we'd build this kind of knowledge into the system through a catalog (I tend to doubt that'd work, in fact), and trying to make it work from an extension in a way that it would work with *other* extensions strikes me as highly unlikely. Perhaps the extension could provide the core types and the other extensions could provide their own bits to hook into the right places, but that sure seems fragile.

Thanks, Stephen
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > > I didn't ask this before but it's been on my mind for a while- how will > > this work for custom data types, ala the 'geometry' type from PostGIS? > > There's user-provided code that we have to execute to check equality for > > those, but they're not giving us CUDA code to run to perform that equality... > > > If custom-plan provider support the user-defined data types such as PostGIS, > it will be able to pick up these data types also, in addition to built-in > ones. It fully depends on coverage of the extension. > If not a supported data type, it is not a show-time of GPUs. So the extension will need to be aware of all custom data types and then installed *after* all other extensions are installed? That doesn't strike me as workable... Thanks, Stephen
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > > I didn't ask this before but it's been on my mind for a while- how
> > > will this work for custom data types, ala the 'geometry' type from PostGIS?
> > > There's user-provided code that we have to execute to check equality
> > > for those, but they're not giving us CUDA code to run to perform that equality...
> >
> > If custom-plan provider support the user-defined data types such as
> > PostGIS, it will be able to pick up these data types also, in addition
> > to built-in ones. It fully depends on coverage of the extension.
> > If not a supported data type, it is not a show-time of GPUs.
>
> So the extension will need to be aware of all custom data types and then
> installed *after* all other extensions are installed? That doesn't strike
> me as workable...
>
I'm not certain why you think an extension will need to support all the data types. Even if it works only for a particular set of data types, it makes sense as long as it covers the data types users are actually using.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Thu, May 8, 2014 at 7:13 PM, Stephen Frost <sfrost@snowman.net> wrote: > Of course, things will change when we are able to parallelize joins > across multiple CPUs ourselves.. In a way, the PGStrom approach gets to > "cheat" us today, since it can parallelize the work where core can't and > that ends up not being an entirely fair comparison. I was thinking of SIMD, along similar lines. We might be able to cheat our way out of having to solve some of the difficult problems of parallelism that way. For example, if you can build a SIMD-friendly bitonic mergesort, and combine that with poor man's normalized keys, that could make merge joins on text faster. That's pure speculation, but it seems like an interesting possibility. -- Peter Geoghegan
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > So the extension will need to be aware of all custom data types and then
> > installed *after* all other extensions are installed? That doesn't strike
> > me as workable...
> >
> I'm not certain why do you think an extension will need to support all
> the data types.

Mostly because we have a very nice extension system which quite a few different extensions make use of, and it'd be pretty darn unfortunate if none of them could take advantage of GPUs because we decided that the right way to support GPUs was through an extension. This is an argument which might be familiar to some, as it was part of the reason that json and jsonb were added to core, imv...

> Even if it works only for a particular set of data types, it makes sense
> as long as it covers data types user actually using.

I know quite a few users of PostGIS, ip4r, and hstore...

Thanks, Stephen
On Thu, May 8, 2014 at 10:16 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> Well, I consider that somewhat good news, because I think it would be
>> rather nice if we could get by with solving one problem at a time, and
>> if the executor part is close to being well-solved, excellent.
>
> Sadly, I'm afraid the news really isn't all that good in the end..
>
>> My ignorance is probably showing here, but I guess I don't understand
>> why it's so hard to deal with the planner side of things. My
>> perhaps-naive impression is that a Seq Scan node, or even an Index
>> Scan node, is not all that complicated. If we just want to inject
>> some more things that behave a lot like those into various baserels, I
>> guess I don't understand why that's especially hard.
>
> That's not what is being asked for here though...

I am not sure what your point is here. Here's mine: if we can strip this down to the executor support plus the most minimal planner support possible, we might be able to get *something* committed. Then we can extend it in subsequent commits. You seem to be saying there's no value in getting anything committed unless it handles the scan-substituting-for-join case. I don't agree. Incremental commits are good, whether they get you all the way to where you want to be or not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote:
> I am not sure what your point is here. Here's mine: if we can strip
> this down to the executor support plus the most minimal planner
> support possible, we might be able to get *something* committed. Then
> we can extend it in subsequent commits.

I guess my point is that I see this more-or-less being solved already by FDWs, but that doesn't address the case when it's a local table, so perhaps there is something useful out of a commit that allows replacement of a SeqScan node (which presumably would also be costed differently).

> You seem to be saying there's no value in getting anything committed
> unless it handles the scan-substituting-for-join case. I don't agree.
> Incremental commits are good, whether they get you all the way to
> where you want to be or not.

To be honest, I think this is really the first proposal to replace specific Nodes, rather than provide a way for a generic Node to exist (which could also replace joins). While I do think it's an interesting idea, and if we could push filters down to this new Node it might even be worthwhile, I'm not sure that it actually moves us down the path to supporting Nodes which replace joins. Still, I'm not against it.

Thanks, Stephen
On 9 May 2014 02:40, Stephen Frost <sfrost@snowman.net> wrote: > * Simon Riggs (simon@2ndQuadrant.com) wrote: >> On 8 May 2014 20:40, Robert Haas <robertmhaas@gmail.com> wrote: >> > For my money, we'd be better off >> > getting some kind of basic custom scan node functionality committed >> > first, even if the cases where you can actually inject them into real >> > plans are highly restricted. Then, we could later work on adding more >> > ways to inject them in more places. >> >> We're past the prototyping stage and into productionising what we know >> works, AFAIK. If that point is not clear, then we need to discuss that >> first. >> >> At the moment the Custom join hook is called every time we attempt to >> cost a join, with no restriction. >> >> I would like to highly restrict this, so that we only consider a >> CustomJoin node when we have previously said one might be usable and >> the user has requested this (e.g. enable_foojoin = on) > > This is part of what I disagree with- I'd rather not require users to > know and understand when they want to do a HashJoin vs. a MergeJoin vs. > a CustomJoinTypeX. Again, I have *not* said users should know that. >> We only consider merge joins if the join uses operators with oprcanmerge=true. >> We only consider hash joins if the join uses operators with oprcanhash=true > > I wouldn't consider those generally "user-facing" options, and the > enable_X counterparts are intended for debugging and not to be used in > production environments. To the point you make in the other thread- I'm > fine w/ having similar cost-based enable_X options, but I think we're > doing our users a disservice if we require that they populate or update > a table. Perhaps an extension needs to do that on installation, but > that would need to enable everything to avoid the user having to mess > around with the table. Again, I did *not* say those should be user facing options, nor that they be set at table-level. 
What I have said is that authors of CustomJoins or CustomScans should declare in advance via system catalogs which operators their new code works with so that Postgres knows when it is appropriate to call them. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote: > What I have said is that authors of CustomJoins or CustomScans should > declare in advance via system catalogs which operators their new code > works with so that Postgres knows when it is appropriate to call them. I guess I just took that as given, since the discussion has been about GPUs and there will have to be new operators since there will be different code (CUDA-or-whatever GPU-language code). Thanks, Stephen
On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We're past the prototyping stage and into productionising what we know >> works, AFAIK. If that point is not clear, then we need to discuss that >> first. > > OK, I'll bite: what here do we know works? Not a damn thing AFAICS; > it's all speculation that certain hooks might be useful, and speculation > that's not supported by a lot of evidence. If you think this isn't > prototyping, I wonder what you think *is* prototyping. My research contacts advise me of this recent work http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf and also that they expect a prototype to be ready by October, which I have been told will be open source. So there are at least two groups looking at this as a serious option for Postgres (not including the above paper's authors). That isn't *now*, but it is at least a time scale that fits with acting on this in the next release, if we can separate out the various ideas and agree we wish to proceed. I'll submerge again... -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
> On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> >> We're past the prototyping stage and into productionising what we
> >> know works, AFAIK. If that point is not clear, then we need to
> >> discuss that first.
> >
> > OK, I'll bite: what here do we know works? Not a damn thing AFAICS;
> > it's all speculation that certain hooks might be useful, and
> > speculation that's not supported by a lot of evidence. If you think
> > this isn't prototyping, I wonder what you think *is* prototyping.
>
> My research contacts advise me of this recent work
> http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
> and also that they expect a prototype to be ready by October, which I have
> been told will be open source.
>
> So there are at least two groups looking at this as a serious option for
> Postgres (not including the above paper's authors).
>
> That isn't *now*, but it is at least a time scale that fits with acting
> on this in the next release, if we can separate out the various ideas and
> agree we wish to proceed.
>
> I'll submerge again...
>
Through the discussion last week, our minimum consensus is:

1. Deregulated enhancement of FDWs is not the way to go.
2. A custom path that can replace a built-in scan makes sense as a first step towards future enhancement. Its planner integration is simple enough to do.
3. A custom path that can replace a built-in join requires investigation into how to integrate with the existing planner structure, to avoid (3a) reinventing the whole of join handling on the extension side, and (3b) unnecessary extension calls in cases that are obviously unsupported.

So, I'd like to start on the (2) portion towards the upcoming first commit-fest of the v9.5 development cycle. We will also be able to discuss the (3) portion concurrently, probably towards the second commit-fest.

Unfortunately, I cannot participate in PGCon/Ottawa this year. Please share the face-to-face discussion here.
Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
According to the discussion upthread, I revised the custom-plan patch to focus on regular relation scans, with no join support right now, and to support a DDL command to define custom-plan providers.

Planner integration with custom logic to scan a particular relation is simple enough, unlike the various join cases. It's almost the same as what the built-in logic does now: a custom-plan provider adds a path node with its cost estimate if it can offer an alternative way to scan the referenced relation. (If it has nothing to offer, it does not need to add any paths.)

A new DDL syntax I'd like to propose is below:

CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;

<name> is a literal; put a unique identifier here.
<class> is the workload type handled by this custom-plan provider. "scan" is the only option right now, meaning a base relation scan.
<function_name> is also a literal; it names the function that acts as the custom-plan provider.

A custom-plan provider function is assumed to take an argument of "internal" type that delivers the set of planner information needed to construct a custom-plan path node. In the case of the "scan" class, a pointer to a customScanArg object is delivered on invocation of the custom-plan provider:

typedef struct {
    uint32          custom_class;
    PlannerInfo    *root;
    RelOptInfo     *baserel;
    RangeTblEntry  *rte;
} customScanArg;

When the invoked custom-plan provider function decides it can offer an alternative scan path on the relation "baserel", the things to do are: (1) construct a CustomPath (or a type derived from it) with a table of callback function pointers, (2) fill in its own cost estimate, and (3) call add_path() to register this path as an alternative.

Once the custom path is chosen by the query planner, its CreateCustomPlan callback is called to populate a CustomPlan node based on the path node. It also has a table of callback function pointers to handle the planner's various jobs in setrefs.c and so on.
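To make the three-step control flow concrete, here is a deliberately mocked, self-contained sketch. Everything here (MockBaserel, mock_add_path, the toy costing) is a stand-in invented for illustration, not the real planner structures or the v1 patch's API:

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for the planner structures; the real customScanArg
 * carries PlannerInfo, RelOptInfo and RangeTblEntry instead. */
typedef struct { int relid; double ntuples; } MockBaserel;

typedef struct
{
    unsigned int  custom_class;   /* the "scan" class in the proposal */
    MockBaserel  *baserel;
} MockCustomScanArg;

/* A CustomPath-like node: a cost estimate plus a callback table
 * (shrunk to a single callback here for brevity). */
typedef struct MockCustomPath
{
    double   total_cost;
    void   (*create_custom_plan)(struct MockCustomPath *path);
} MockCustomPath;

static MockCustomPath *registered_path = NULL;

/* Stand-in for add_path(): keep the cheapest registered path. */
static void
mock_add_path(MockCustomPath *path)
{
    if (registered_path == NULL ||
        path->total_cost < registered_path->total_cost)
        registered_path = path;
}

static void
my_create_custom_plan(MockCustomPath *path)
{
    printf("building CustomPlan node, estimated cost %.2f\n",
           path->total_cost);
}

/* The provider entrypoint: decide whether we can handle this relation,
 * build a path node with its callback table and cost, and register it. */
static void
my_scan_provider(MockCustomScanArg *arg)
{
    MockCustomPath *path;

    if (arg->baserel->ntuples < 1000.0)
        return;                 /* nothing to offer: add no path */

    path = malloc(sizeof(*path));
    path->total_cost = arg->baserel->ntuples * 0.01;   /* toy costing */
    path->create_custom_plan = my_create_custom_plan;
    mock_add_path(path);
}
```

A real provider would receive the customScanArg above, build a genuine CustomPath with the full callback table, and let add_path() compete it against the built-in scan paths on cost.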
Similarly, its CreateCustomPlanState callback is called to populate a CustomPlanState node based on the plan node. It also has a table of callback function pointers to handle the executor's various jobs during query execution.

Most of the callback design is unchanged from the earlier proposal in the v9.4 development cycle; however, here are a few changes:

* CustomPlan now inherits Scan, and CustomPlanState now inherits ScanState. Some useful routines for implementing scan logic, like ExecScan, expect the state node to have ScanState as its base type, so this is kinder to the extension side. (I'd like to avoid each extension reinventing ExecScan by copy & paste!) I'm not sure whether it should become a union with Join in the future; however, a layout compatible with Scan/ScanState is a reasonable choice for implementing alternative "scan" logic.

* Exporting static functions - I still don't have a graceful answer here. However, it is quite natural for extensions to have to follow interface updates across future PostgreSQL versions. Probably, it will become clear in later discussion which class of functions should be exported and which should be re-implemented on the extension side. Right now, I exported the minimum set needed to implement an alternative scan method - the contrib/ctidscan module.

Items to be discussed later:
* Planner integration for relation joins - probably, we may define new custom-plan classes as alternatives to hash join, merge join and nested loop. If core knows a custom plan is an alternative to hash join, we can utilize core code to check the legality of the join.
* Generic key-value style options in the custom-plan definition - Hanada-san proposed this to me off-list - like foreign data wrappers. It may make it possible to configure multiple behaviors in one binary.
* Ownership and access control of custom plans. Right now, only a superuser can create/drop a custom-plan provider definition, so it has no explicit ownership or access control.
It seems to me a reasonable assumption; however, we may have a use case that needs custom plans for unprivileged users.

Thanks,

2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>> On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> >> We're past the prototyping stage and into productionising what we
>> >> know works, AFAIK. If that point is not clear, then we need to
>> >> discuss that first.
>> >
>> > OK, I'll bite: what here do we know works? Not a damn thing AFAICS;
>> > it's all speculation that certain hooks might be useful, and
>> > speculation that's not supported by a lot of evidence. If you think
>> > this isn't prototyping, I wonder what you think *is* prototyping.
>>
>> My research contacts advise me of this recent work
>> http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
>> and also that they expect a prototype to be ready by October, which I have
>> been told will be open source.
>>
>> So there are at least two groups looking at this as a serious option for
>> Postgres (not including the above paper's authors).
>>
>> That isn't *now*, but it is at least a time scale that fits with acting
>> on this in the next release, if we can separate out the various ideas and
>> agree we wish to proceed.
>>
>> I'll submerge again...
>>
> Through the discussion last week, our minimum consensus are:
> 1. Deregulated enhancement of FDW is not a way to go
> 2. Custom-path that can replace built-in scan makes sense as a first step
>    towards the future enhancement. Its planner integration is enough simple
>    to do.
> 3. Custom-path that can replace built-in join takes investigation how to
>    integrate existing planner structure, to avoid (3a) reinvention of
>    whole of join handling in extension side, and (3b) unnecessary extension
>    calls towards the case obviously unsupported.
>
> So, I'd like to start the (2) portion towards the upcoming 1st commit-fest
> on the v9.5 development cycle.
Also, we will be able to have discussion > for the (3) portion concurrently, probably, towards 2nd commit-fest. > > Unfortunately, I cannot participate PGcon/Ottawa this year. Please share > us the face-to-face discussion here. > > Thanks, > -- > NEC OSS Promotion Center / PG-Strom Project > KaiGai Kohei <kaigai@ak.jp.nec.com> > -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachment
Kaigai-san,

I've just applied the v1 patch and tried to build and install it, but I found two issues:

1) The contrib/ctidscan module is not automatically built/installed because it's not listed in contrib/Makefile. Is this expected behavior?

2) I got the error message below when building the documentation.

$ cd doc/src/sgml
$ make
openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . -d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
openjade:catalogs.sgml:2525:45:X: reference to non-existent ID "SQL-CREATECUSTOMPLAN"
make: *** [HTML.index] Error 1
make: *** Deleting file `HTML.index'

I'll review the other parts of the patch, including the design.

2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> According to the discussion upthread, I revised the custom-plan patch
> to focus on regular relation scan but no join support right now, and to
> support DDL command to define custom-plan providers.
>
> Planner integration with custom logic to scan a particular relation is
> enough simple, unlike various join cases. It's almost similar to what
> built-in logic are doing now - custom-plan provider adds a path node
> with its cost estimation if it can offer alternative way to scan referenced
> relation. (in case of no idea, it does not need to add any paths)
>
> A new DDL syntax I'd like to propose is below:
>
> CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
>
> <name> is as literal, put a unique identifier.
> <class> is workload type to be offered by this custom-plan provider.
> "scan" is the only option right now, that means base relation scan.
> <function_name> is also as literal; it shall perform custom-plan provider.
>
> A custom-plan provider function is assumed to take an argument of
> "internal" type to deliver a set of planner information that is needed to
> construct custom-plan pathnode.
> In case of "scan" class, pointer towards an customScanArg object
> shall be delivered on invocation of custom-plan provider.
>
> typedef struct {
>     uint32          custom_class;
>     PlannerInfo    *root;
>     RelOptInfo     *baserel;
>     RangeTblEntry  *rte;
> } customScanArg;
>
> In case when the custom-plan provider function being invoked thought
> it can offer an alternative scan path on the relation of "baserel", the things
> to do are (1) construct a CustomPath (or its inherited data type) object
> with a table of callback function pointers, (2) put its own cost estimation,
> and (3) call add_path() to register this path as an alternative one.
>
> Once the custom-path was chosen by the query planner, its CreateCustomPlan
> callback is called to populate a CustomPlan node based on the pathnode.
> It also has a table of callback function pointers to handle various planner
> jobs in setrefs.c and so on.
>
> Similarly, its CreateCustomPlanState callback is called to populate a
> CustomPlanState node based on the plannode. It also has a table of
> callback function pointers to handle various executor jobs during query
> execution.
>
> Most of the callback designs are not changed from the prior proposition in
> the v9.4 development cycle, however, here are a few changes.
>
> * CustomPlan became to inherit Scan, and CustomPlanState became to
>   inherit ScanState. Because some useful routines to implement scan
>   logic, like ExecScan, expect the state-node to have ScanState as its base
>   type, it's kinder for the extension side. (I'd like to avoid each
>   extension reinventing ExecScan by copy & paste!)
>   I'm not sure whether it should be a union with Join in the future, however,
>   it is a reasonable choice to have a compatible layout with Scan/ScanState
>   to implement alternative "scan" logic.
>
> * Exporting static functions - I still don't have a graceful answer here.
>   However, it is quite natural that extensions have to follow up interface
>   updates on future version-ups of PostgreSQL.
> Probably, it shall become clear what class of functions shall be > exported and what class of functions shall be re-implemented within > extension side in the later discussion. > Right now, I exported minimum ones that are needed to implement > alternative scan method - contrib/ctidscan module. > > Items to be discussed later: > * planner integration for relations join - probably, we may define new > custom-plan classes as alternative of hash-join, merge-join and > nest-loop. If core can know this custom-plan is alternative of hash- > join, we can utilize core code to check legality of join. > * generic key-value style options in custom-plan definition - Hanada > san proposed me off-list - like foreign data wrapper. It may enable > to configure multiple behavior on a binary. > * ownership and access control of custom-plan. right now, only > superuser can create/drop custom-plan provider definition, thus, > it has no explicit ownership and access control. It seems to me > a reasonable assumption, however, we may have a usecase that > needs custom-plan by unprivileged users. > > Thanks, > > 2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>: >>> On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> >>> >> We're past the prototyping stage and into productionising what we >>> >> know works, AFAIK. If that point is not clear, then we need to >>> >> discuss that first. >>> > >>> > OK, I'll bite: what here do we know works? Not a damn thing AFAICS; >>> > it's all speculation that certain hooks might be useful, and >>> > speculation that's not supported by a lot of evidence. If you think >>> > this isn't prototyping, I wonder what you think *is* prototyping. >>> >>> My research contacts advise me of this recent work >>> http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf >>> and also that they expect a prototype to be ready by October, which I have >>> been told will be open source. 
>>> >>> So there are at least two groups looking at this as a serious option for >>> Postgres (not including the above paper's authors). >>> >>> That isn't *now*, but it is at least a time scale that fits with acting >>> on this in the next release, if we can separate out the various ideas and >>> agree we wish to proceed. >>> >>> I'll submerge again... >>> >> Through the discussion last week, our minimum consensus are: >> 1. Deregulated enhancement of FDW is not a way to go >> 2. Custom-path that can replace built-in scan makes sense as a first step >> towards the future enhancement. Its planner integration is enough simple >> to do. >> 3. Custom-path that can replace built-in join takes investigation how to >> integrate existing planner structure, to avoid (3a) reinvention of >> whole of join handling in extension side, and (3b) unnecessary extension >> calls towards the case obviously unsupported. >> >> So, I'd like to start the (2) portion towards the upcoming 1st commit-fest >> on the v9.5 development cycle. Also, we will be able to have discussion >> for the (3) portion concurrently, probably, towards 2nd commit-fest. >> >> Unfortunately, I cannot participate PGcon/Ottawa this year. Please share >> us the face-to-face discussion here. >> >> Thanks, >> -- >> NEC OSS Promotion Center / PG-Strom Project >> KaiGai Kohei <kaigai@ak.jp.nec.com> >> > -- > KaiGai Kohei <kaigai@kaigai.gr.jp> -- Shigeru HANADA
Hanada-san,

Thanks for your checks. I overlooked those points when I submitted the patch, sorry.
The attached one is revised regarding the documentation stuff and contrib/Makefile.

Thanks,

2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
> Kaigai-san,
>
> I've just applied v1 patch, and tried build and install, but I found two issues:
>
> 1) The contrib/ctidscan is not automatically built/installed because
>    it's not described in contrib/Makefile. Is this expected behavior?
> 2) I got an error message below when building document.
>
> $ cd doc/src/sgml
> $ make
> openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
> -d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
> openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
> "SQL-CREATECUSTOMPLAN"
> make: *** [HTML.index] Error 1
> make: *** Deleting file `HTML.index'
>
> I'll review another part of the patch, including the design.
>
> 2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
>> According to the discussion upthread, I revised the custom-plan patch
>> to focus on regular relation scan but no join support right now, and to
>> support DDL command to define custom-plan providers.

--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Kaigai-san,

Sorry for the lagged response. Here are my random thoughts about the patch. I couldn't fully understand the patch, because some of the APIs are not used by ctidscan.

Custom Scan patch v2 review

* Custom plan class comparison
In backend/optimizer/util/pathnode.c, custclass is compared by bit-and with 's'. Do you plan to use custclass as a bit field? If so, values for custom plan classes should not be characters. Otherwise, custclass should be compared with the == operator.

* Purpose of GetSpecialCustomVar()
The reason why the GetSpecialCustomVar API is necessary is not clear to me. Could you show a case where the API would be useful?

* Purpose of FinalizeCustomPlan()
The reason why the FinalizeCustomPlan callback is necessary is not clear to me, because ctidscan just calls finalize_primnode() and bms_add_members() with the given information. Could you show a case where the API would be useful?

* Is it ok to call set_cheapest() for all relkinds?
Now set_cheapest() is called not only for plain relations and foreign tables but also for custom plans, and for other relations such as subqueries, functions, and values. Calling call_custom_scan_provider() and set_cheapest() only in the RTE_RELATION case seems similar to the old construct; what do you think about this?

* Is it hard to get rid of CopyCustomPlan()?
Copying a ForeignScan node doesn't need a per-FDW copy function, by limiting fdw_private to hold only copyable objects. Can't we use the same way for CustomPlan? Letting authors call NodeSetTag or copyObject() sounds uncomfortable to me. This would apply to TextOutCustomPlan() and TextOutCustomPath() too.

* MultiExec support is appropriate for the first version?
The cases that need MultiExec seem a little complex for the first version of custom scan. What kind of plan do you have in mind for this feature?

* Does SupportBackwardScan() have enough information?
Other scans check the target list with TargetListSupportsBackwardScan(). Isn't it necessary to check it for CustomPlan too, in ExecSupportsBackwardScan()?
* Place to call custom plan provider
Is it necessary to call the provider when relkind != RELKIND_RELATION? If yes, isn't it necessary to call it for append relations? I know that we are concentrating on replacing scans in the initial version, so it would not be a serious problem, but it would be good to consider an extensible design.

* Custom Plan Provider is "addpath"?
Passing an addpath handler as the only attribute of CUSTOM PLAN PROVIDER seems a little odd. Would using a handler like FDW make the design too complex and/or messy?

* superclass of CustomPlanState
CustomPlanState derives from ScanState, instead of deriving from PlanState directly. I worry about the case of non-heap-scan custom plans, but it might be ok to postpone consideration of that for the first cut.

* Naming and granularity of objects related to custom plan
I'm not sure the current naming is appropriate, especially the difference between "custom plan", "provider", and "handler". In the context of the CREATE CUSTOM PLAN statement, what does the term "custom plan" mean? My impression is that a "custom plan" is an alternative plan type, e.g. ctidscan or pg_strom_scan. Then what does the term "provider" mean? My impression is that it is an extension, such as ctidscan or pg_strom. The grammar allows users to pass a function via the PROVIDER clause of CREATE CUSTOM PLAN, so the function would be the provider of the custom plan created by the statement.

* enable_customscan
A GUC parameter enable_customscan would be useful for users who want to disable the custom plan feature temporarily. In the case of pg_strom, using GPUs only in limited sessions for analytic or batch applications seems handy.

* Adding pg_custom_plan catalog
Using "cust" as the prefix for pg_custom_plan causes ambiguity, which makes it difficult to choose a catalog prefix for a feature named "Custom Foo" in the future. How about using "cusp" (CUStom Plan)? Or is it better to use pg_custom_plan_provider as the catalog relation name, as the document says that "CREATE CUSTOM PLAN defines custom plan provider"?
Then the prefix could be "cpp" (Custom Plan Provider). This seems to match the wording used for pg_foreign_data_wrapper.

* CREATE CUSTOM PLAN statement
This is just a question: we need to execute CREATE CUSTOM PLAN if we want to use a provider, and I wonder how the statement will be extended when supporting join as a custom class.

* New operators about TID comparison
IMO this portion should be a separate patch, because it adds OID definitions for existing operators such as tidgt and tidle. Is there any (explicit or implicit) rule about defining a macro for the OID of an operator?

* Prototype of get_custom_plan_oid()
custname (or cppname, if we use the rule I proposed above) seems appropriate as the parameter name of get_custom_plan_oid(), because similar functions use catalog column names in such cases.

* Coding conventions
Some lines are indented with white space. Will a future pgindent run fix this issue?

* Unnecessary struct forward declarations
The forward declarations of CustomPathMethods, Plan, and CustomPlan in include/nodes/relation.h seem unnecessary. Other headers might have the same issue.

* Unnecessary externing of replace_nestloop_params()
replace_nestloop_params() is extern-ed, but it's never called outside createplan.c.

* Externing fix_scan_expr()
If it's necessary for all custom plan providers to call fix_scan_expr (via the fix_scan_list macro), couldn't it be done in set_plan_refs() before calling SetCustomPlanRef()?

* What does T_CustomPlanMarkPos mean?
It's not clear to me when CustomPlanMarkPos works. Is it for a custom plan provider which supports marking a position and rewinding to that position, and ctidscan just lacks the capability to do that, so it is not used anywhere?

* Unnecessary changes in allpaths.c
Some comments about Subquery and CTE are changed (perhaps) accidentally.
* Typos
  * planenr -> planner, implements -> implement, in create_custom_plan.sgml
  * CustomScan in nodeCustom.h should be CustomPlan?
  * delivered -> derived, in src/backend/optimizer/util/pathnode.c

* Document "Writing Custom Plan Provider" is not provided
Custom plan provider authors would (and I DO!) hope for documentation about writing a custom plan provider.

Regards,

2014-06-17 23:12 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> Hanada-san,
>
> Thanks for your checks. I oversight the points when I submit the patch, sorry.
> The attached one is revised one on documentation stuff and contrib/Makefile.
>
> Thanks,

--
Shigeru HANADA
Hanada-san,

Thanks for your dedicated reviewing. It's a very long message, so let me summarize the things I shall do in the next patch:

* fix bug: custom-plan class comparison
* fix up naming convention and syntax: CREATE CUSTOM PLAN PROVIDER, rather than CREATE CUSTOM PLAN; the catalog prefix shall be "cpp_"
* fix up: definition of get_custom_plan_oid()
* fix up: unexpected white spaces, to be tabs
* fix up: remove unnecessary forward declarations
* fix up: revert replace_nestloop_params() to static
* make SetCustomPlanRef an optional callback
* fix up: typos in various points
* add documentation to explain the custom-plan interface

Also, I want a committer's opinion about the issues below:
* whether set_cheapest() should be called for all relkinds
* how the argument of the add_path handler shall be derived

Individual comments are put below:

> Kaigai-san,
>
> Sorry for lagged response.
>
> Here are my random thoughts about the patch. I couldn't understand the
> patch fully, because some of APIs are not used by ctidscan.
>
> Custom Scan patch v2 review
>
> * Custom plan class comparison
> In backend/optimizer/util/pathnode.c, custclass is compared by bit-and
> with 's'. Do you plan to use custclass as bit field? If so, values for
> custom plan class should not be a character. Otherwise, custclass should
> be compared by == operator.

Sorry, it is a bug that comes from the previous design. I had an idea to allow a custom-plan provider to support multiple kinds of exec nodes, however, I concluded it does not make much sense (we can define multiple CPPs for each).

> * Purpose of GetSpecialCustomVar()
> The reason why the callback is necessary is not clear to
> me. Could you show a case that the API would be useful?

It is a feature needed to replace a built-in join by a custom scan, though its purpose may be unclear for scan workloads. Let me explain why join replacement needs it.
A join node has two input slots (inner and outer); Var nodes in its expression trees reference either slot according to their varno (INNER_VAR or OUTER_VAR). In case a CPP replaced a join, it has to generate an equivalent result, but it may not be the best choice to use two input streams. (Recall that when we construct a remote join on postgres_fdw, all the materialization is done on the remote side, thus we have just one input stream to generate the local join-equivalent view.) On the other hand, the EXPLAIN command has to understand which column is the source of the Var nodes in the target list of the custom node, even if it is rewritten to use just one slot. For example, which label shall be shown in case the 3rd item of the target list originally comes from the 2nd item of the inner slot, but all the materialized result is stored in the outer slot? Only the CPP can track the relationship between the original Var and the newer one. This interface provides a way to resolve what a Var node actually references.

> * Purpose of FinalizeCustomPlan()
> The reason why FinalizeCustomPlan callback is necessary is not clear to
> me, because ctidscan just calls finalize_primnode() and
> bms_add_members() with given information. Could you show a case that the
> API would be useful?

The main purpose of this callback is to give an extension a chance to apply finalize_primnode() if the custom node holds an expression tree in its private fields. In case a CPP picked up a part of the clauses to run in its own way, they are attached to neither plan->targetlist nor plan->qual; only the CPP knows where they are attached. So, these orphan expression nodes have to be treated by the CPP.

> * Is it ok to call set_cheapest() for all relkind?
> Now set_cheapest() is called not for only relation and foreign table but
> also custom plan, and other relations such as subquery, function, and value.
> Calling call_custom_scan_provider() and set_cheapest() in the case of
> RTE_RELATION seems similar to the old construct, how do you think about
> this?
I don't think we can actually have useful custom scan logic for these special relation forms; however, I also don't have a special reason why custom plans should not support these special relations. I'd like to see a committer's opinion here.

> * Is it hard to get rid of CopyCustomPlan()?
> Copying ForeignScan node doesn't need per-FDW copy function by limiting
> fdw_private to have only copy-able objects. Can't we use the same way for
> CustomPlan? Letting authors call NodeSetTag or
> copyObject() sounds uncomfortable to me.
>
> This would be able to apply to TextOutCustomPlan() and TextOutCustomPath()
> too.

An FDW-like design was the original one, but the latest design was suggested by Tom in the v9.4 development cycle, because some data types are not compliant with copyObject(), like Bitmapset.

> * MultiExec support is appropriate for the first version?
> The cases need MultiExec seems little complex for the first version of custom
> scan. What kind of plan do you image for this feature?

It is definitely necessary to exchange multiple rows in a custom format with the upper-level node if both nodes are managed by the same CPP. I plan to use this interface for bulk loading that makes data loading to GPUs much faster.

> * Does SupportBackwardScan() have enough information?
> Other scans check target list with TargetListSupportsBackwardScan().
> Isn't it necessary to check it for CustomPlan too in
> ExecSupportsBackwardScan()?

It receives the CustomPlan node itself, which includes the Plan node. If the CPP thinks it is necessary, it can run equivalent checks here.

> * Place to call custom plan provider
> Is it necessary to call provider when relkind != RELKIND_RELATION? If yes,
> isn't it necessary to call for append relation?
>
> I know that we concentrate to replacing scan in the initial version, so
> it would not be a serious problem, but it would be good to consider extensible
> design.
Regarding child relation scans, set_append_rel_pathlist() calls set_rel_pathlist(), which is the entry point of custom-scan paths. If you mean an alternative path for the Append node itself, correct: that is not a feature supported in the first commit.

> * Custom Plan Provider is "addpath"?
> Passing addpath handler as only one attribute of CUSTOM PLAN PROVIDER seems
> little odd.
> Using handler like FDW makes the design too complex and/or messy?
>
This design allows passing a set of information appropriate to the workload; joins as well as scans, for example. If we need to extend customXXXXArg in the future, all we have to extend is the data structure definition, not the function prototype itself. Anyway, I'd like to leave the decision on this to the committer review stage.

> * superclass of CustomPlanState
> CustomPlanState derives ScanState, instead of deriving PlanState directly.
> I worry the case of non-heap-scan custom plan, but it might be ok to postpone
> consideration about that at the first cut.
>
We have some useful routines for implementing custom-scan logic, but they take a ScanState argument, like ExecScan(). We could copy and paste them into extension code, but that is not good manners. ScanState adds only three pointer variables on top of PlanState; if the CPP does not care about regular heap scans, it can simply leave them unused, and they are quite helpful if the CPP implements some original logic on top of an existing heap scan.

> * Naming and granularity of objects related to custom plan
> I'm not sure the current naming is appropriate, especially difference between
> "custom plan" and "provider" and "handler". In the context of CREATE CUSTOM
> PLAN statement, what the term "custom plan" means? My impression is that
> "custom plan" is an alternative plan type, e.g.
> ctidscan or pg_strom_scan. Then what the term "provider" means? My
> impression for that is extension, such as ctidscan and pg_strom.
> The grammar allows users to pass function via PROVIDER clause of CREATE
> CUSTOM SCAN, so the function would be the provider of the custom plan
> created by the statement.
>
Hmm... What you want to say is: a CREATE X statement is expected to create an X. On the other hand, a "custom plan" is actually created by the custom-plan provider, not by this DDL statement; the DDL statement defines the custom-plan "provider". I think the suggestion is reasonable. How about the statement below instead?

CREATE CUSTOM PLAN PROVIDER cpp_name FOR cpp_kind HANDLER cpp_function;
cpp_kind := SCAN (other kinds shall be supported later)

> * enable_customscan
> GUC parameter enable_customscan would be useful for users who want to
> disable custom plan feature temporarily. In the case of pg_strom, using
> GPU for limited sessions for analytic or batch applications seems handy.
>
It should be done by each extension individually. Please imagine a user who installs custom-GPU-scan, custom-matview-redirect and custom-cache-only-scan. The purposes of these CPPs are quite different, so I don't think a single enable_customscan makes sense.

> * Adding pg_custom_plan catalog
> Using "cust" as prefix for pg_custom_plan causes ambiguousness which makes
> it difficult to choose catalog prefix for a feature named "Custom Foo" in
> future. How about using "cusp" (CUStom Plan)?
>
> Or is it better to use pg_custom_plan_provider as catalog relation name,
> as the document says that "CREATE CUSTOM PLAN defines custom plan provider".
> Then prefix could be "cpp" (Custom Plan Provider).
> This seems to match the wording used for pg_foreign_data_wrapper.
>
My preference is "cpp", as a shorthand for custom plan provider.

> * CREATE CUSTOM PLAN statement
> This is just a question: We need to emit CREATE CUSTOM PLAN if we want
> to use I wonder how it is extended when supporting join as custom class.
In case of join, I'll extend the syntax as follows:

CREATE CUSTOM PLAN cppname
  FOR [HASH JOIN|MERGE JOIN|NEST LOOP]
  PROVIDER provider_func;

Like customScanArg, we will define an argument type for each join method, and provider_func shall be called with that argument. I think this is a sufficiently flexible and extensible approach.

> * New operators about TID comparison
> IMO this portion should be a separated patch, because it adds OID definition
> of existing operators such as tidgt and tidle. Is there any (explicit or
> implicit) rule about defining macro for oid of an operator?
>
I don't know of a general rule for defining static OID definitions. Probably, these are added on demand.

> * Prototype of get_custom_plan_oid()
> custname (or cppname if use the rule I proposed above) seems appropriate
> as the parameter name of get_custom_plan_oid() because similar functions
> use catalog column names in such case.
>
I'll rename it as follows:

extern Oid get_custom_plan_provider_oid(const char *cpp_name, bool missing_ok);

> * Coding conventions
> Some lines are indented with white space. Future pgindent run will fix
> this issue?
>
It's my oversight, to be fixed.

> * Unnecessary struct forward declaration
> Forward declarations of CustomPathMethods, Plan, and CustomPlan in
> includes/nodes/relation.h seem unnecessary. Other headers might have same
> issue.
>
I'll check it. I did some trial and error during development, so dead declarations might have been left behind.

> * Unnecessary externing of replace_nestloop_params()
> replace_nestloop_params() is extern-ed but it's never called outside
> createplan.c.
>
Indeed, it's not needed until we support custom join logic.

> * Externing fix_scan_expr()
> If it's necessary for all custom plan providers to call fix_scan_expr (via
> fix_scan_list macro), isn't it able to do it in set_plan_refs() before
> calling SetCustomPlanRef()?
One alternative idea: if the scanrelid of the custom plan is valid (scanrelid > 0) and the custom node has no private expression tree to be fixed up, the CPP can omit the SetCustomPlanRef callback. In this case, the core backend applies fix_scan_list on the targetlist and qual, then adjusts the scanrelid. That is what I did in the previous revision, but Tom was concerned that it assumes too much about the custom node. (It is useful only for custom "scan" nodes.)

> * What does T_CustomPlanMarkPos mean?
> It's not clear to me when CustomPlanMarkPos works. Is it for a custom plan
> provider which supports marking position and rewind to the position, and
> ctidscan just lacks capability to do that, so it is not used anywhere?
>
The previous design had a flag in the body of the CustomPlan structure to indicate whether mark/restore is supported. However, that causes a problem for ExecSupportsMarkRestore(), which takes only a node tag to determine whether the supplied node supports mark/restore. I once tried to change ExecSupportsMarkRestore() to accept the node body, but Tom suggested using a separate node tag instead.

> * Unnecessary changes in allpaths.c
> some comment about Subquery and CTE are changed (perhaps) accidentally.
>
No, it is intentional, because the set_cheapest() calls were consolidated.

> * Typos
> * planenr -> planner, implements -> implement in create_custom_plan.sgml
> * CustomScan in nodeCustom.h should be CustomPlan?
> * delivered -> derived, in src/backend/optimizer/util/pathnode.c
>
OK, I'll fix them.

> * Document "Writing Custom Plan Provider" is not provided Custom Plan
> Provider author would (and I DO!) hope documents about writing a custom
> plan provider.
>
Documentation like fdwhandler.sgml, right? OK, I'll write it up.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> 2014-06-17 23:12 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> > Hanada-san,
> >
> > Thanks for your checks. I overlooked those points when I submitted the
> > patch, sorry.
> > The attached one is revised one on documentation stuff and
> > contrib/Makefile.
> >
> > Thanks,
> >
> > 2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
> >> Kaigai-san,
> >>
> >> I've just applied v1 patch, and tried build and install, but I found
> >> two issues:
> >>
> >> 1) The contrib/ctidscan is not automatically built/installed because
> >> it's not described in contrib/Makefile. Is this expected behavior?
> >> 2) I got an error message below when building document.
> >>
> >> $ cd doc/src/sgml
> >> $ make
> >> openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
> >> -d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
> >> openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
> >> "SQL-CREATECUSTOMPLAN"
> >> make: *** [HTML.index] Error 1
> >> make: *** Deleting file `HTML.index'
> >>
> >> I'll review another part of the patch, including the design.
> >>
> >>
> >> 2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> >>> According to the discussion upthread, I revised the custom-plan
> >>> patch to focus on regular relation scan but no join support right
> >>> now, and to support DDL command to define custom-plan providers.
> >>>
> >>> Planner integration with custom logic to scan a particular relation
> >>> is enough simple, unlike various join cases. It's almost similar to
> >>> what built-in logic are doing now - custom-plan provider adds a path
> >>> node with its cost estimation if it can offer alternative way to
> >>> scan referenced relation. (in case of no idea, it does not need to
> >>> add any paths)
> >>>
> >>> A new DDL syntax I'd like to propose is below:
> >>>
> >>> CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>;
> >>>
> >>> <name> is as literal, put a unique identifier.
> >>> <class> is workload type to be offered by this custom-plan provider.
> >>> "scan" is the only option right now, that means base relation scan.
> >>> <function_name> is also as literal; it shall perform custom-plan
> >>> provider.
> >>>
> >>> A custom-plan provider function is assumed to take an argument of
> >>> "internal" type to deliver a set of planner information that is
> >>> needed to construct custom-plan pathnode.
> >>> In case of "scan" class, pointer towards an customScanArg object
> >>> shall be delivered on invocation of custom-plan provider.
> >>>
> >>> typedef struct {
> >>>     uint32          custom_class;
> >>>     PlannerInfo    *root;
> >>>     RelOptInfo     *baserel;
> >>>     RangeTblEntry  *rte;
> >>> } customScanArg;
> >>>
> >>> In case when the custom-plan provider function being invoked thought
> >>> it can offer an alternative scan path on the relation of "baserel",
> >>> things to do is (1) construct a CustomPath (or its inherited data
> >>> type) object with a table of callback function pointers (2) put its
> >>> own cost estimation, and (3) call add_path() to register this path as
> >>> an alternative one.
> >>>
> >>> Once the custom-path was chosen by query planner, its
> >>> CreateCustomPlan callback is called to populate CustomPlan node based
> >>> on the pathnode.
> >>> It also has a table of callback function pointers to handle various
> >>> planner's job in setrefs.c and so on.
> >>>
> >>> Similarly, its CreateCustomPlanState callback is called to populate
> >>> CustomPlanState node based on the plannode. It also has a table of
> >>> callback function pointers to handle various executor's job during
> >>> query execution.
> >>>
> >>> Most of callback designs are not changed from the prior proposition
> >>> in v9.4 development cycle, however, here is a few changes.
> >>>
> >>> * CustomPlan became to inherit Scan, and CustomPlanState became to
> >>>   inherit ScanState. Because some useful routines to implement scan-
> >>>   logic, like ExecScan, expects state-node has ScanState as its base
> >>>   type, it's more kindness for extension side. (I'd like to avoid each
> >>>   extension reinvent ExecScan by copy & paste!)
> >>>   I'm not sure whether it should be a union of Join in the future,
> >>>   however, it is a reasonable choice to have compatible layout with
> >>>   Scan/ScanState to implement alternative "scan" logic.
> >>>
> >>> * Exporting static functions - I still don't have a graceful answer
> >>>   here. However, it is quite natural that extensions to follow up
> >>>   interface updates on the future version up of PostgreSQL.
> >>>   Probably, it shall become clear what class of functions shall be
> >>>   exported and what class of functions shall be re-implemented within
> >>>   extension side in the later discussion.
> >>>   Right now, I exported minimum ones that are needed to implement
> >>>   alternative scan method - contrib/ctidscan module.
> >>>
> >>> Items to be discussed later:
> >>> * planner integration for relations join - probably, we may define new
> >>>   custom-plan classes as alternative of hash-join, merge-join and
> >>>   nest-loop. If core can know this custom-plan is alternative of hash-
> >>>   join, we can utilize core code to check legality of join.
> >>> * generic key-value style options in custom-plan definition - Hanada-
> >>>   san proposed me off-list - like foreign data wrapper. It may enable
> >>>   to configure multiple behavior on a binary.
> >>> * ownership and access control of custom-plan. right now, only
> >>>   superuser can create/drop custom-plan provider definition, thus,
> >>>   it has no explicit ownership and access control. It seems to me
> >>>   a reasonable assumption, however, we may have a usecase that
> >>>   needs custom-plan by unprivileged users.
> >>>
> >>> Thanks,
> >>>
> >>> 2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> >>>>> On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>>>>
> >>>>> >> We're past the prototyping stage and into productionising what
> >>>>> >> we know works, AFAIK. If that point is not clear, then we need
> >>>>> >> to discuss that first.
> >>>>> >
> >>>>> > OK, I'll bite: what here do we know works? Not a damn thing
> >>>>> > AFAICS; it's all speculation that certain hooks might be useful,
> >>>>> > and speculation that's not supported by a lot of evidence. If
> >>>>> > you think this isn't prototyping, I wonder what you think *is*
> >>>>> > prototyping.
> >>>>>
> >>>>> My research contacts advise me of this recent work
> >>>>> http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf
> >>>>> and also that they expect a prototype to be ready by October,
> >>>>> which I have been told will be open source.
> >>>>>
> >>>>> So there are at least two groups looking at this as a serious
> >>>>> option for Postgres (not including the above paper's authors).
> >>>>>
> >>>>> That isn't *now*, but it is at least a time scale that fits with
> >>>>> acting on this in the next release, if we can separate out the
> >>>>> various ideas and agree we wish to proceed.
> >>>>>
> >>>>> I'll submerge again...
> >>>>>
> >>>> Through the discussion last week, our minimum consensus are:
> >>>> 1. Deregulated enhancement of FDW is not a way to go
> >>>> 2. Custom-path that can replace built-in scan makes sense as a first
> >>>>    step towards the future enhancement. Its planner integration is
> >>>>    enough simple to do.
> >>>> 3. Custom-path that can replace built-in join takes investigation how
> >>>>    to integrate existing planner structure, to avoid (3a) reinvention
> >>>>    of whole of join handling in extension side, and (3b) unnecessary
> >>>>    extension calls towards the case obviously unsupported.
> >>>>
> >>>> So, I'd like to start the (2) portion towards the upcoming 1st
> >>>> commit-fest on the v9.5 development cycle. Also, we will be able to
> >>>> have discussion for the (3) portion concurrently, probably, towards
> >>>> 2nd commit-fest.
> >>>>
> >>>> Unfortunately, I cannot participate PGcon/Ottawa this year. Please
> >>>> share us the face-to-face discussion here.
> >>>>
> >>>> Thanks,
> >>>> --
> >>>> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> >>>> <kaigai@ak.jp.nec.com>
> >>>>
> >>> --
> >>> KaiGai Kohei <kaigai@kaigai.gr.jp>
> >>
> >>
> >> --
> >> Shigeru HANADA
> >
> > --
> > KaiGai Kohei <kaigai@kaigai.gr.jp>
>
> --
> Shigeru HANADA
Hanada-san,

The attached patch is the revised one. Updates from the previous version are below:

* The system catalog name was changed to pg_custom_plan_provider, which
  reflects the role of the object being defined.
* Also, the prefix of its variable names was changed to "cpp", meaning
  custom-plan-provider.
* The syntax now reflects more closely what the command does. The new syntax
  to define a custom plan provider is:
    CREATE CUSTOM PLAN PROVIDER <cpp_name> FOR <cpp_class> HANDLER <cpp_function>;
* Added custom-plan.sgml to introduce the interface functions defined for
  path/plan/exec methods.
* The FinalizeCustomPlan() callback was simplified to support scan (and join
  in the future) as the starting point. As long as the requirement is
  scan/join, there is no need to control the paramids bitmap in an arbitrary
  way.
* Unnecessary forward declarations in relation.h and plannode.h were removed,
  but a few structures still need forward declarations.
* Fixed the typos being pointed out.

I'd like to see a committer's suggestion regarding the design issues below:

* Should set_cheapest() be called for all relkinds?
  -> According to the discussion in the v9.4 cycle, I consolidated the
  set_cheapest() calls in allpaths.c into set_rel_pathlist(). Hanada-san
  wondered whether it is necessary to have custom plans on non-base
  relations, like sub-queries or values scans. I don't have a reason why
  custom plans should not run on these unusual relations.
* How should the argument of the add_path handler be delivered?
  -> The custom-plan handler function takes an argument of the internal data
  type; that is, a pointer to customScanArg if the custom-plan class is
  "scan". (It shall be customHashJoinArg if "hash join", for example.)
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Kaigai Kouhei(海外 浩平)
> Sent: Friday, July 04, 2014 1:23 PM
> To: 'Shigeru Hanada'; Kohei KaiGai
> Cc: Simon Riggs; Tom Lane; Stephen Frost; Robert Haas; Andres Freund;
> PgHacker; Jim Mlodgenski; Peter Eisentraut
> Subject: Re: [HACKERS] [v9.5] Custom Plan API
>
> Hanada-san,
>
> Thanks for your dedicated review.
>
> It's a very long message, so let me summarize the things I shall do in
> the next patch:
>
> * fix bug: custom-plan class comparison
> * fix up naming convention and syntax:
>   CREATE CUSTOM PLAN PROVIDER, rather than
>   CREATE CUSTOM PLAN. Prefix shall be "cpp_".
> * fix up: definition of get_custom_plan_oid()
> * fix up: unexpected white spaces, to be tabs
> * fix up: remove unnecessary forward declarations
> * fix up: revert replace_nestloop_params() to static
> * make SetCustomPlanRef an optional callback
> * fix up: typos in various points
> * add documentation to explain the custom-plan interface
>
> Also, I want a committer's opinion about the issues below:
> * Should set_cheapest() be called for all relkinds?
> * How should the argument of the add_path handler be delivered?
>
> Individual comments are put below:
>
> > Kaigai-san,
> >
> > Sorry for lagged response.
> >
> > Here are my random thoughts about the patch. I couldn't understand
> > the patch fully, because some of APIs are not used by ctidscan.
> >
> > Custom Scan patch v2 review
> >
> > * Custom plan class comparison
> > In backend/optimizer/util/pathnode.c, custclass is compared by bit-and
> > with 's'. Do you plan to use custclass as bit field? If so, values
> > for custom plan class should not be a character. Otherwise, custclass
> > should be compared by == operator.
> >
> Sorry, it is a bug that comes from the previous design.
> I had an idea that allows a custom plan provider to support multiple kind
> of exec nodes, however, I concluded it does not make sense so much. (we
> can define multiple CPP for each)
>
> > * Purpose of GetSpecialCustomVar()
> > The reason why FinalizeCustomPlan callback is necessary is not clear
> > to me.
> > Could you show a case that the API would be useful?
> >
> It is needed feature to replace a built-in join by custom scan, however,
> it might be unclear on the scan workloads.
>
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> <kaigai@ak.jp.nec.com>
> > >>> > > >>> * Exporting static functions - I still don't have a graceful > > >>> answer > > here. > > >>> However, it is quite natural that extensions to follow up > > >>> interface > > updates > > >>> on the future version up of PostgreSQL. > > >>> Probably, it shall become clear what class of functions shall be > > >>> exported and what class of functions shall be re-implemented within > > >>> extension side in the later discussion. > > >>> Right now, I exported minimum ones that are needed to implement > > >>> alternative scan method - contrib/ctidscan module. > > >>> > > >>> Items to be discussed later: > > >>> * planner integration for relations join - probably, we may define > new > > >>> custom-plan classes as alternative of hash-join, merge-join and > > >>> nest-loop. If core can know this custom-plan is alternative of hash- > > >>> join, we can utilize core code to check legality of join. > > >>> * generic key-value style options in custom-plan definition - Hanada > > >>> san proposed me off-list - like foreign data wrapper. It may enable > > >>> to configure multiple behavior on a binary. > > >>> * ownership and access control of custom-plan. right now, only > > >>> superuser can create/drop custom-plan provider definition, thus, > > >>> it has no explicit ownership and access control. It seems to me > > >>> a reasonable assumption, however, we may have a usecase that > > >>> needs custom-plan by unprivileged users. > > >>> > > >>> Thanks, > > >>> > > >>> 2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>: > > >>>>> On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >>>>> > > >>>>> >> We're past the prototyping stage and into productionising > > >>>>> >> what we know works, AFAIK. If that point is not clear, then > > >>>>> >> we need to discuss that first. > > >>>>> > > > >>>>> > OK, I'll bite: what here do we know works? 
Not a damn thing > > >>>>> > AFAICS; it's all speculation that certain hooks might be > > >>>>> > useful, and speculation that's not supported by a lot of > > >>>>> > evidence. If you think this isn't prototyping, I wonder what > > >>>>> > you think *is* > > prototyping. > > >>>>> > > >>>>> My research contacts advise me of this recent work > > >>>>> http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf > > >>>>> and also that they expect a prototype to be ready by October, > > >>>>> which I have been told will be open source. > > >>>>> > > >>>>> So there are at least two groups looking at this as a serious > > >>>>> option for Postgres (not including the above paper's authors). > > >>>>> > > >>>>> That isn't *now*, but it is at least a time scale that fits with > > >>>>> acting on this in the next release, if we can separate out the > > >>>>> various ideas and agree we wish to proceed. > > >>>>> > > >>>>> I'll submerge again... > > >>>>> > > >>>> Through the discussion last week, our minimum consensus are: > > >>>> 1. Deregulated enhancement of FDW is not a way to go 2. > > >>>> Custom-path that can replace built-in scan makes sense as a first > step > > >>>> towards the future enhancement. Its planner integration is > > >>>> enough > > simple > > >>>> to do. > > >>>> 3. Custom-path that can replace built-in join takes investigation > > >>>> how > > to > > >>>> integrate existing planner structure, to avoid (3a) > > >>>> reinvention > > of > > >>>> whole of join handling in extension side, and (3b) unnecessary > > extension > > >>>> calls towards the case obviously unsupported. > > >>>> > > >>>> So, I'd like to start the (2) portion towards the upcoming 1st > > >>>> commit-fest on the v9.5 development cycle. Also, we will be able > > >>>> to have discussion for the (3) portion concurrently, probably, > > >>>> towards > > 2nd commit-fest. > > >>>> > > >>>> Unfortunately, I cannot participate PGcon/Ottawa this year. 
> > >>>> Please share us the face-to-face discussion here. > > >>>> > > >>>> Thanks, > > >>>> -- > > >>>> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > > >>>> <kaigai@ak.jp.nec.com> > > >>>> > > >>> -- > > >>> KaiGai Kohei <kaigai@kaigai.gr.jp> > > >> > > >> > > >> > > >> -- > > >> Shigeru HANADA > > > > > > > > > > > > -- > > > KaiGai Kohei <kaigai@kaigai.gr.jp> > > > > > > > > -- > > Shigeru HANADA
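The provider flow described in the quoted proposal, i.e. (1) construct a CustomPath, (2) put its own cost estimation on it, and (3) call add_path() to register it as an alternative, can be modeled in a self-contained sketch. All types and functions below (SimplePath, simple_add_path, and so on) are simplified stand-ins for illustration, not PostgreSQL's actual planner API.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the planner's types; the real handler receives a
 * customScanArg packing PlannerInfo, RelOptInfo and RangeTblEntry. */
typedef struct SimplePath {
    double cost;              /* estimated total cost of this path */
    struct SimplePath *next;  /* list link used by simple_add_path */
} SimplePath;

typedef struct {
    SimplePath *pathlist;     /* competing paths for one relation */
} SimpleRel;

/* step (3): register a path so the planner can compare costs */
static void simple_add_path(SimpleRel *rel, SimplePath *path)
{
    path->next = rel->pathlist;
    rel->pathlist = path;
}

/* A toy provider handler: it offers an alternative scan path only
 * when its own estimate beats the built-in cost; otherwise it adds
 * nothing and the plan is unchanged. */
static void toy_provider_handler(SimpleRel *rel, double builtin_cost)
{
    double my_cost = builtin_cost * 0.5;        /* step (2): own estimate */
    if (my_cost < builtin_cost) {
        SimplePath *p = malloc(sizeof(SimplePath)); /* step (1): build path */
        p->cost = my_cost;
        p->next = NULL;
        simple_add_path(rel, p);                /* step (3): register */
    }
}

/* pick the cheapest registered path, mimicking set_cheapest() */
static double simple_cheapest(const SimpleRel *rel)
{
    double best = -1.0;
    for (const SimplePath *p = rel->pathlist; p; p = p->next)
        if (best < 0 || p->cost < best)
            best = p->cost;
    return best;
}
```

A provider with no better idea simply adds nothing, and cheapest-path selection proceeds exactly as if the provider were absent.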
Kaigai-san,

The v3 patch had a conflict in src/backend/tcop/utility.c with the newly added IMPORT FOREIGN SCHEMA statement, but it was trivial.

2014-07-08 20:55 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> * System catalog name was changed to pg_custom_plan_provider;
>   that reflects role of the object being defined.

ISTM that doc/src/sgml/custom-plan.sgml should also be renamed to custom-plan-provider.sgml.

> * Also, prefix of its variable names are changed to "cpp"; that
>   means custom-plan-provider.

A "custclass" remains in a comment in src/include/catalog/pg_custom_plan_provider.h.

> * Syntax also reflects what the command does more. New syntax to
>   define custom plan provider is:
>     CREATE CUSTOM PLAN PROVIDER <cpp_name>
>     FOR <cpp_class> HANDLER <cpp_function>;
> * Add custom-plan.sgml to introduce interface functions defined
>   for path/plan/exec methods.
> * FinalizeCustomPlan() callback was simplified to support scan
>   (and join in the future) at the starting point. As long as
>   scan/join requirement, no need to control paramids bitmap in
>   arbitrary way.
> * Unnecessary forward declaration in relation.h and plannode.h
>   were removed, but a few structures still needs to have
>   forward declarations.
> * Fix typos being pointed out.

Check. I found some typos, and a wording "datatype" that is not used in any other place. Please refer to the attached patch.

--
Shigeru HANADA
Hanada-san,

Thanks for your checking. The attached v4 patch is one rebased on the latest master branch; indeed, the merge conflict was trivial.

Updates from the v3 are below:
- custom-plan.sgml was renamed to custom-plan-provider.sgml
- fixed up the comments in pg_custom_plan_provider.h that mentioned the old field name
- applied your patch to fix up typos (thanks so much!)
- put "costs off" on the EXPLAIN command in the regression test of the ctidscan extension

Do you have any further comments on the design and implementation from your viewpoint?

Thanks,

2014-07-14 19:07 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>:
> Kaigai-san,
>
> The v3 patch had conflict in src/backend/tcop/utility.c for newly
> added IMPORT FOREIGN SCHEMA statement, but it was trivial.
>
> 2014-07-08 20:55 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>> * System catalog name was changed to pg_custom_plan_provider;
>>   that reflects role of the object being defined.
>
> ISTM that doc/src/sgml/custom-plan.sgml should be also renamed to
> custom-plan-provider.sgml.
>
>> * Also, prefix of its variable names are changed to "cpp"; that
>>   means custom-plan-provider.
>
> A "custclass" remains in a comment in
> src/include/catalog/pg_custom_plan_provider.h.
>
>> * Syntax also reflects what the command does more. New syntax to
>>   define custom plan provider is:
>>     CREATE CUSTOM PLAN PROVIDER <cpp_name>
>>     FOR <cpp_class> HANDLER <cpp_function>;
>> * Add custom-plan.sgml to introduce interface functions defined
>>   for path/plan/exec methods.
>> * FinalizeCustomPlan() callback was simplified to support scan
>>   (and join in the future) at the starting point. As long as
>>   scan/join requirement, no need to control paramids bitmap in
>>   arbitrary way.
>> * Unnecessary forward declaration in relation.h and plannode.h
>>   were removed, but a few structures still needs to have
>>   forward declarations.
>> * Fix typos being pointed out.
>
> Check.
I found some typos and a wording "datatype" which is not used > in any other place. Please refer the attached patch. > > -- > Shigeru HANADA -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Kaigai-san,

2014-07-14 22:18 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> Hanada-san,
>
> Thanks for your checking. The attached v4 patch is rebased one on the
> latest master branch. Indeed, merge conflict was trivial.
>
> Updates from the v3 are below:
> - custom-plan.sgml was renamed to custom-plan-provider.sgml
> - fix up the comments in pg_custom_plan_provider.h that mentioned
>   about old field name.
> - applied your patch to fix up typos. (thanks so much!)
> - put "costs off" on the EXPLAIN command in the regression test of
>   ctidscan extension.

Checked, but the patch fails the sanity-check test; you need to modify the expected file of the test.

> Nothing to comment on the design and implementation from your
> viewpoint any more?

As far as I can tell, the design seems reasonable. After a fix for the small issue above, I'll move the patch status to "Ready for committer".

--
Shigeru HANADA
> 2014-07-14 22:18 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>: > > Hanada-san, > > > > Thanks for your checking. The attached v4 patch is rebased one on the > > latest master branch. Indeed, merge conflict was trivial. > > > > Updates from the v3 are below: > > - custom-plan.sgml was renamed to custom-plan-provider.sgml > > - fix up the comments in pg_custom_plan_provider.h that mentioned > > about old field name. > > - applied your patch to fix up typos. (thanks so much!) > > - put "costs off" on the EXPLAIN command in the regression test of > > ctidscan extension. > > Checked, but the patch fails sanity-check test, you need to modify expected > file of the test. > Sorry, expected result of sanity-check test was not updated on renaming to pg_custom_plan_provider. The attached patch fixed up this point. > > Nothing to comment on the design and implementation from your > > viewpoint any more? > > As much as I can tell, the design seems reasonable. After fix for the small > issue above, I'll move the patch status to "Ready for committer". > -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Kaigai-san, 2014-07-15 21:37 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>: > Sorry, expected result of sanity-check test was not updated on > renaming to pg_custom_plan_provider. > The attached patch fixed up this point. I confirmed that all regression tests passed, so I marked the patch as "Ready for committer". -- Shigeru HANADA
On 2014-07-16 10:43:08 +0900, Shigeru Hanada wrote:
> Kaigai-san,
>
> 2014-07-15 21:37 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> > Sorry, expected result of sanity-check test was not updated on
> > renaming to pg_custom_plan_provider.
> > The attached patch fixed up this point.
>
> I confirmed that all regression tests passed, so I marked the patch as
> "Ready for committer".

I personally don't see how this patch is 'ready for committer'. I realize that that state is sometimes used to denote that review needs to be "escalated", but it still seems premature.

Unless I'm missing something, there hasn't been any API-level review of this? Also, aren't there several open items?

Perhaps there needs to be a stage between 'needs review' and 'ready for committer'?

Greetings,

Andres Freund

--
Andres Freund    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I haven't followed this at all, but I just skimmed over it and noticed the CustomPlanMarkPos thingy; apologies if this has been discussed before. It seems a bit odd to me; why isn't it sufficient to have a boolean flag in regular CustomPlan to indicate that it supports mark/restore? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > I haven't followed this at all, but I just skimmed over it and noticed > the CustomPlanMarkPos thingy; apologies if this has been discussed > before. It seems a bit odd to me; why isn't it sufficient to have a > boolean flag in regular CustomPlan to indicate that it supports > mark/restore? Yeah, I thought that was pretty bogus too, but it's well down the list of issues that were there last time I looked at this ... regards, tom lane
> I personally don't see how this patch is 'ready for committer'. I realize
> that that state is sometimes used to denote that review needs to be
> "escalated", but it still seems premature.
>
> Unless I miss something there hasn't been any API level review of this?
> Also, aren't there several open items?
>
Even though some interface specifications were revised according to Tom's comments during the last development cycle, the current set of interfaces has not been reviewed by committers yet, and I really want that review. Here are the two open items on which we are waiting for committers' comments.

* Is set_cheapest() called for all relkinds? This patch moved set_cheapest() to the end of set_rel_pathlist(), to consolidate the entrypoint of the custom-plan-provider handler function. It also implies that a CPP can provide alternative paths towards non-regular relations (like sub-queries, functions, ...). Hanada-san wonders whether we really have a case for running alternative sub-query code. Even though I don't have use cases for alternative sub-query execution logic, we also don't have a reason to restrict it.

* How should the argument of the add_path handler be delivered? The handler function (which adds a custom path for the requested relation scan if it can provide one) is declared with an argument of INTERNAL data type. The extension needs to cast the supplied pointer to the customScanArg data type (or potentially customHashJoinArg and so on...) according to the custom-plan class. I think this is a more extensible design than strict argument definitions, but Hanada-san wonders whether it is the best design.

> Perhaps there needs to be a stage between 'needs review' and 'ready for
> committer'?
>
It needs a clarification of what 'ready for committer' means. I think the interface specification is a kind of task to be discussed with committers, because the preferences and viewpoints of regular reviewers are not always the same as theirs.
Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: Andres Freund [mailto:andres@2ndquadrant.com] > Sent: Friday, July 18, 2014 3:12 AM > To: Shigeru Hanada > Cc: Kaigai Kouhei(海外 浩平); Kohei KaiGai; Simon Riggs; Tom Lane; Stephen > Frost; Robert Haas; PgHacker; Jim Mlodgenski; Peter Eisentraut > Subject: Re: [HACKERS] [v9.5] Custom Plan API > > On 2014-07-16 10:43:08 +0900, Shigeru Hanada wrote: > > Kaigai-san, > > > > 2014-07-15 21:37 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>: > > > Sorry, expected result of sanity-check test was not updated on > > > renaming to pg_custom_plan_provider. > > > The attached patch fixed up this point. > > > > I confirmed that all regression tests passed, so I marked the patch as > > "Ready for committer". > > I personally don't see how this patch is 'ready for committer'. I realize > that that state is sometimes used to denote that review needs to be > "escalated", but it still seemspremature. > > Unless I miss something there hasn't been any API level review of this? > Also, aren't there several open items? > > Perhaps there needs to be a stage between 'needs review' and 'ready for > committer'? > > Greetings, > > Andres Freund > > -- > Andres Freund http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > I haven't followed this at all, but I just skimmed over it and noticed
> > the CustomPlanMarkPos thingy; apologies if this has been discussed
> > before. It seems a bit odd to me; why isn't it sufficient to have a
> > boolean flag in regular CustomPlan to indicate that it supports
> > mark/restore?
>
> Yeah, I thought that was pretty bogus too, but it's well down the list of
> issues that were there last time I looked at this ...
>
IIRC, CustomPlanMarkPos was suggested to keep the interface of ExecSupportsMarkRestore(), which takes only a plannode tag to determine whether the node supports Mark/Restore. As my original proposition did, it seems to me that a flag field in the CustomPlan structure is more straightforward, if we don't hesitate to change ExecSupportsMarkRestore().

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
2014-07-18 10:28 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> > I haven't followed this at all, but I just skimmed over it and noticed
>> > the CustomPlanMarkPos thingy; apologies if this has been discussed
>> > before. It seems a bit odd to me; why isn't it sufficient to have a
>> > boolean flag in regular CustomPlan to indicate that it supports
>> > mark/restore?
>>
>> Yeah, I thought that was pretty bogus too, but it's well down the list of
>> issues that were there last time I looked at this ...
>>
> IIRC, CustomPlanMarkPos was suggested to keep the interface of
> ExecSupportsMarkRestore() that takes plannode tag to determine
> whether it support Mark/Restore.
> As my original proposition did, it seems to me a flag field in
> CustomPlan structure is straightforward, if we don't hesitate to
> change ExecSupportsMarkRestore().
>
The attached patch revises the above point. It eliminates CustomPlanMarkPos and adds a flags field to the CustomXXX structures to inform the backend whether the custom-plan provider supports mark/restore positioning and backward scan. This change requires ExecSupportsMarkRestore() to reference the contents of the Path node, not only its node tag, so its declaration was also changed to take a pointer to a Path node. The only caller of this function right now is final_cost_mergejoin(), which previously passed just the pathtype field of the Path node, so this change does not lead to any significant degradation.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
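The flag-based design suggested here can be illustrated with a stand-alone model. The type names and flag bits below are illustrative only, not the patch's actual definitions; the point is that once the support check receives the node body rather than just a tag, a custom node can answer through a flags bitmask instead of needing a separate node tag like CustomPlanMarkPos.

```c
#include <assert.h>
#include <stdbool.h>

/* toy node tags standing in for PostgreSQL's NodeTag values */
typedef enum { T_SeqScanToy, T_MaterialToy, T_CustomPlanToy } ToyTag;

/* illustrative capability bits carried by the custom node */
#define CUSTOM_SUPPORT_MARK_RESTORE  0x0001
#define CUSTOM_SUPPORT_BACKWARD_SCAN 0x0002

typedef struct {
    ToyTag       tag;
    unsigned int flags;   /* meaningful only when tag == T_CustomPlanToy */
} ToyPath;

/* Before the change, the check could see only the tag; taking the
 * node body lets a custom node answer via its flags. */
static bool toy_supports_mark_restore(const ToyPath *path)
{
    switch (path->tag) {
    case T_MaterialToy:
        return true;   /* a built-in node type that supports it */
    case T_CustomPlanToy:
        return (path->flags & CUSTOM_SUPPORT_MARK_RESTORE) != 0;
    default:
        return false;
    }
}
```

Built-in node types still answer purely by tag, so the behavior of existing callers is unchanged; only custom nodes consult the extra field.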
The attached patch is the rebased custom-plan API, with no functional changes from the latest version (which added a flag field to the custom-plan node to show whether it supports mark/restore or backward scan).

Towards the upcoming commit-fest, let me summarize a brief overview of this patch.

The purpose of the custom-plan interface implemented by this patch is to allow extensions to provide an alternative way to scan (and potentially join, and so on) a relation, in addition to the built-in logic. If one or more extensions are installed as custom-plan providers, each can offer the planner an alternative way to scan a relation, using a CustomPath node with its own cost estimation. In the usual manner, the planner will choose a particular path based on cost; if the custom one is not chosen, it is simply discarded and nothing changes. Once a custom plan is chosen, the custom-plan provider behind the custom-plan node is invoked during query execution, and is responsible for scanning the relation in its own way. One expected usage of this interface is the GPU acceleration I am also working on.

The custom-plan provider is invoked via the function installed as the provider, with an argument that packs all the information necessary to construct a custom-path node. In case of a relation scan, a customScanArg containing PlannerInfo, RelOptInfo and RangeTblEntry is supplied. The function is registered using a new command:

    CREATE CUSTOM PLAN PROVIDER <name> FOR SCAN HANDLER <handler_function>;

According to the earlier discussion, the CustomXXX nodes are designed to hold extension-private fields in the manner of an object-oriented language. A CustomXXX node has a few common, minimum-required fields, but no private pointer. Instead, the extension declares its own Path/Plan/PlanState structure that embeds the corresponding CustomXXX node at the head of the structure declaration, and can keep private fields in the latter half of the structure.
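The "embed the base node at the head" layout described above can be sketched with stand-in types (CtidScanPathToy is hypothetical, loosely modeled on the ctidscan example, and ToyCustomPath is not the patch's real CustomPath): because the base node is the first member, a pointer to the derived structure may be passed wherever the base type is expected, and cast back inside the provider's own callbacks.

```c
#include <assert.h>

/* stand-in for the common, minimum-required base node */
typedef struct {
    int    node_tag;
    double cost;
} ToyCustomPath;

/* extension-private "subclass": base node first, private fields after */
typedef struct {
    ToyCustomPath base;       /* must be the first member */
    int           min_ctid;   /* private state of this toy provider */
    int           max_ctid;
} CtidScanPathToy;

/* core code sees only the base type... */
static double toy_path_cost(const ToyCustomPath *path)
{
    return path->cost;
}

/* ...while the provider's own callback downcasts to reach its fields;
 * this is well defined in C because the base node is the first member */
static int toy_ctid_range(const ToyCustomPath *path)
{
    const CtidScanPathToy *cp = (const CtidScanPathToy *) path;
    return cp->max_ctid - cp->min_ctid;
}
```

This is the same convention PostgreSQL's node system already relies on, which is why no private pointer is needed in the common fields.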
The contrib/ctidscan module is a good example of how an extension can utilize this interface. Once a CustomPlan/CustomPlanState node is constructed, the rest proceeds as for other executor nodes: the provider is invoked at executor startup, shutdown and during execution, and the relevant callback in its table of function pointers is called.

Thanks,

2014-07-23 10:47 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> 2014-07-18 10:28 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> > I haven't followed this at all, but I just skimmed over it and noticed
>>> > the CustomPlanMarkPos thingy; apologies if this has been discussed
>>> > before. It seems a bit odd to me; why isn't it sufficient to have a
>>> > boolean flag in regular CustomPlan to indicate that it supports
>>> > mark/restore?
>>>
>>> Yeah, I thought that was pretty bogus too, but it's well down the list of
>>> issues that were there last time I looked at this ...
>>>
>> IIRC, CustomPlanMarkPos was suggested to keep the interface of
>> ExecSupportsMarkRestore() that takes plannode tag to determine
>> whether it support Mark/Restore.
>> As my original proposition did, it seems to me a flag field in
>> CustomPlan structure is straightforward, if we don't hesitate to
>> change ExecSupportsMarkRestore().
>>
> The attached patch revised the above point.
> It eliminates CustomPlanMarkPos, and adds flags field on CustomXXX
> structure to inform the backend whether the custom plan provider can
> support mark-restore position and backward scan.
> This change requires ExecSupportsMarkRestore() to reference
> contents of Path node, not only node-tag, so its declaration was also
> changed to take a pointer to Path node.
> The only caller of this function is final_cost_mergejoin() right now.
> It just gives pathtype field of Path node on its invocation, so this change
> does not lead significant degradation.
> > Thanks, > -- > KaiGai Kohei <kaigai@kaigai.gr.jp> -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On Thu, Jul 17, 2014 at 3:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> I haven't followed this at all, but I just skimmed over it and noticed >> the CustomPlanMarkPos thingy; apologies if this has been discussed >> before. It seems a bit odd to me; why isn't it sufficient to have a >> boolean flag in regular CustomPlan to indicate that it supports >> mark/restore? > > Yeah, I thought that was pretty bogus too, but it's well down the > list of issues that were there last time I looked at this ... I think the threshold question for this incarnation of the patch is whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as a way of installing new plan providers into the database. If we are, then I can go ahead and enumerate a long list of things that will need to be fixed to make that code acceptable (such as adding pg_dump support). But if we're not, there's no point in spending any time on that part of the patch. I can see a couple of good reasons to think that this approach might be reasonable: - In some ways, a custom plan provider (really, at this point, a custom scan provider) is very similar to a foreign data wrapper. To the guts of PostgreSQL, an FDW is a sort of black box that knows how to scan some data not managed by PostgreSQL. A custom plan provider is similar, except that it scans data that *is* managed by PostgreSQL. - There's also some passing resemblance between a custom plan provider and an access method. Access methods provide a set of APIs for fast access to data via an index, while custom plan providers provide an API for fast access to data via magic that the core system doesn't (and need not) understand. While access methods don't have associated SQL syntax, they do include catalog structure, so perhaps this should too, by analogy. All that having been said, I'm having a hard time mustering up enthusiasm for this way of doing things. 
As currently constituted, the pg_custom_plan_provider catalog contains only a name, a "class" that is always 's' for scan, and a handler function OID. Quite frankly, that's a whole lot of nothing. If we got rid of the pg_catalog structure and just had something like RegisterCustomPlanProvider(char *name, void (*)(customScanArg *)), which could be invoked from _PG_init(), hundreds and hundreds of lines of code could go away and we wouldn't lose any actual functionality; you'd just list your custom plan providers in shared_preload_libraries or local_preload_libraries instead of listing them in a system catalog. In fact, you might even have more functionality, because you could load providers into particular sessions rather than system-wide, which isn't possible with this design.

I think the underlying issue here really has to do with when custom plan providers get invoked - what triggers that? For foreign data wrappers, we have some relations that are plain tables (relkind = 'r') and no foreign data wrapper code is invoked. We have others that are flagged as foreign tables (relkind = 'f') and for those we look up the matching FDW (via ftserver) and run the code. Similarly, for an index AM, we notice that the relation is an index (relkind = 'i') and then consult relam to figure out which index AM we should invoke. But as KaiGai is conceiving this feature, it's quite different. Rather than applying only to particular relations, and being mutually exclusive with other options that might apply to those relations, it applies to *everything* in the database in addition to whatever other options may be present. The included ctidscan implementation is just as good an example as PG-Strom: you inspect the query and see, based on the operators present, whether there's any hope of accelerating things.
In other words, there's no user configuration - and also, not irrelevantly, no persistent on-disk state the way you have for an index, or even an FDW, which has on disk state to the extent that there have to be catalog entries tying a particular FDW to a particular table. A lot of the previous discussion of this topic revolves around the question of whether we can unify the use case that this patch is targeting with other things - e.g. Citus's desire to store its data files within the data directory while retaining control over data access, thus not a perfect fit for FDWs; the desire to push joins down to foreign servers; more generally, the desire to replace a join with a custom plan that may or may not use access paths for the underlying relations as subpaths. I confess I'm not seeing a whole lot of commonality with anything other than the custom-join-path idea, which probably shares many of what I believe to be the relevant characteristics of a custom scan as conceived by KaiGai: namely, that all of the decisions about whether to inject a custom path in particular circumstances are left up to the provider itself based on inspection of the specific query, rather than being a result of user configuration. So, I'm tentatively in favor of stripping the DDL support out of this patch and otherwise trying to boil it down to something that's really minimal, but I'd like to hear what others think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
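Robert's RegisterCustomPlanProvider() alternative amounts to a per-process registry populated from each module's _PG_init(). A minimal sketch of such a registry follows; all names here are hypothetical, mirroring only the signature quoted above, and the fixed-size array stands in for whatever list structure a real implementation would use.

```c
#include <assert.h>
#include <string.h>

typedef struct { int custom_class; } customScanArgToy;   /* stand-in */

typedef void (*ProviderHandler)(customScanArgToy *);

#define MAX_PROVIDERS 16
static struct {
    const char      *name;
    ProviderHandler  handler;
} providers[MAX_PROVIDERS];
static int nproviders = 0;

/* what a module would call from its _PG_init() */
static void register_custom_plan_provider(const char *name,
                                          ProviderHandler fn)
{
    assert(nproviders < MAX_PROVIDERS);
    providers[nproviders].name = name;
    providers[nproviders].handler = fn;
    nproviders++;
}

/* what the planner would do instead of scanning a system catalog:
 * invoke every registered provider for the relation being planned */
static void invoke_providers(customScanArgToy *arg)
{
    for (int i = 0; i < nproviders; i++)
        providers[i].handler(arg);
}

static int calls = 0;
static void toy_handler(customScanArgToy *arg) { (void) arg; calls++; }
```

Because the registry lives in process memory, a provider loaded via session-level preload settings is visible only in that session, which is the extra flexibility noted above.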
Robert Haas <robertmhaas@gmail.com> writes: > I think the threshold question for this incarnation of the patch is > whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as > a way of installing new plan providers into the database. I tend to agree with your conclusion that that's a whole lot of infrastructure with very little return. I don't see anything here we shouldn't do via function hooks instead, and/or a "register" callback from a dynamically loaded library. Also, we tend to think (for good reason) that once something is embedded at the SQL level it's frozen; we are much more willing to redesign C-level APIs. There is no possible way that it's a good idea for this stuff to get frozen in its first iteration. regards, tom lane
2014-08-23 0:39 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
> On Thu, Jul 17, 2014 at 3:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> I haven't followed this at all, but I just skimmed over it and noticed
>>> the CustomPlanMarkPos thingy; apologies if this has been discussed
>>> before. It seems a bit odd to me; why isn't it sufficient to have a
>>> boolean flag in regular CustomPlan to indicate that it supports
>>> mark/restore?
>>
>> Yeah, I thought that was pretty bogus too, but it's well down the
>> list of issues that were there last time I looked at this ...
>
> I think the threshold question for this incarnation of the patch is
> whether we're happy with new DDL (viz, CREATE CUSTOM PLAN PROVIDER) as
> a way of installing new plan providers into the database. If we are,
> then I can go ahead and enumerate a long list of things that will need
> to be fixed to make that code acceptable (such as adding pg_dump
> support). But if we're not, there's no point in spending any time on
> that part of the patch.
>
Even though I'm the patch author, I'd like to agree with this approach. In fact, the previous custom-plan interface I proposed in the v9.4 development cycle did not include DDL support to register the custom-plan providers, and it worked fine. One thing that was pointed out to me, and the reason why I implemented DDL support, is that intermediation by a C-language function also loads the extension module implicitly. That is an advantage, but I'm not sure whether it needs to be supported from the beginning.

> I can see a couple of good reasons to think that this approach might
> be reasonable:
>
> - In some ways, a custom plan provider (really, at this point, a
> custom scan provider) is very similar to a foreign data wrapper. To
> the guts of PostgreSQL, an FDW is a sort of black box that knows how
> to scan some data not managed by PostgreSQL.
A custom plan provider > is similar, except that it scans data that *is* managed by PostgreSQL. > > - There's also some passing resemblance between a custom plan provider > and an access method. Access methods provide a set of APIs for fast > access to data via an index, while custom plan providers provide an > API for fast access to data via magic that the core system doesn't > (and need not) understand. While access methods don't have associated > SQL syntax, they do include catalog structure, so perhaps this should > too, by analogy. > > All that having been said, I'm having a hard time mustering up > enthusiasm for this way of doing things. As currently constituted, > the pg_custom_plan_provider catalog contains only a name, a "class" > that is always 's' for scan, and a handler function OID. Quite > frankly, that's a whole lot of nothing. If we got rid of the > pg_catalog structure and just had something like > RegisterCustomPlanProvider(char *name, void (*)(customScanArg *), > which could be invoked from _PG_init(), hundreds and hundreds of lines > of code could go away and we wouldn't lose any actual functionality; > you'd just list your custom plan providers in shared_preload_libraries > or local_preload_libraries instead of listing them in a system > catalog. In fact, you might even have more functionality, because you > could load providers into particular sessions rather than system-wide, > which isn't possible with this design. > Indeed. It's an advantage of the approach without system catalog. > I think the underlying issue here really has to do with when custom > plan providers get invoked - what triggers that? For foreign data > wrappers, we have some relations that are plain tables (relkind = 'r') > and no foreign data wrapper code is invoked. We have others that are > flagged as foreign tables (relkind = 'f') and for those we look up the > matching FDW (via ftserver) and run the code. 
Similarly, for an index > AM, we notice that the relation is an index (relkind = 'i') and then > consult relam to figure out which index AM we should invoke. But as > KaiGai is conceiving this feature, it's quite different. Rather than > applying only to particular relations, and being mutually exclusive > with other options that might apply to those relations, it applies to > *everything* in the database in addition to whatever other options may > be present. The included ctidscan implementation is just as good an > example as PG-Strom: you inspect the query and see, based on the > operators present, whether there's any hope of accelerating things. > In other words, there's no user configuration - and also, not > irrelevantly, no persistent on-disk state the way you have for an > index, or even an FDW, which has on disk state to the extent that > there have to be catalog entries tying a particular FDW to a > particular table. > Yes, that's my point. In the case of an FDW or index AM, the query planner can form some expectation of how the relevant executor node will handle the given relation scan, based on the persistent state. However, a custom plan is a black box to the query planner; it can have no expectation of how the relation scan will be handled, apart from the cost value estimated by the custom-plan provider. Thus, this interface is designed to invoke the custom-plan providers registered for relation scans and ask each of them whether it can offer an alternative way to scan. We could probably add a shortcut to skip the invocation when a custom-plan provider obviously cannot provide any alternative plan. For example, we might add a flag to RegisterCustomPlanProvider() saying that this custom-plan provider works only on relkind='r', so there is no need to invoke it for other relation types. > A lot of the previous discussion of this topic revolves around the > question of whether we can unify the use case that this patch is > targeting with other things - e.g.
Citus's desire to store its data > files within the data directory while retaining control over data > access, thus not a perfect fit for FDWs; the desire to push joins down > to foreign servers; more generally, the desire to replace a join with > a custom plan that may or may not use access paths for the underlying > relations as subpaths. I confess I'm not seeing a whole lot of > commonality with anything other than the custom-join-path idea, which > probably shares many of what I believe to be the relevant > characteristics of a custom scan as conceived by KaiGai: namely, that > all of the decisions about whether to inject a custom path in > particular circumstances are left up to the provider itself based on > inspection of the specific query, rather than being a result of user > configuration. > > So, I'm tentatively in favor of stripping the DDL support out of this > patch and otherwise trying to boil it down to something that's really > minimal, but I'd like to hear what others think. > I'd like to follow this direction, and start stripping the DDL support. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On Fri, Aug 22, 2014 at 9:48 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > One thing I was pointed out, it is the reason why I implemented > DDL support, is that intermediation of c-language function also > loads the extension module implicitly. It is an advantage, but > not sure whether it shall be supported from the beginning. That is definitely an advantage of the DDL-based approach, but I think it's too much extra machinery for not enough real advantage. Sounds like we all agree, so ... > I'd like to follow this direction, and start stripping the DDL support. ...please make it so. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> > I'd like to follow this direction, and start stripping the DDL support. > > ...please make it so. > The attached patch eliminates the DDL support. Instead of the new CREATE CUSTOM PLAN PROVIDER statement, it adds an internal function, register_custom_scan_provider, which takes a custom-plan provider name and a callback function that adds an alternative scan path (in the form of a CustomPath) while the query planner is finding the cheapest path to scan the target relation. The documentation has also been revised according to the latest design. Everything else keeps the previous design. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Wed, Aug 27, 2014 at 6:51 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> > I'd like to follow this direction, and start stripping the DDL support. >> >> ...please make it so. >> > The attached patch eliminates DDL support. > > Instead of the new CREATE CUSTOM PLAN PROVIDER statement, > it adds an internal function; register_custom_scan_provider > that takes custom plan provider name and callback function > to add alternative scan path (should have a form of CustomPath) > during the query planner is finding out the cheapest path to > scan the target relation. > Also, documentation stuff is revised according to the latest > design. > Any other stuff keeps the previous design. Comments: 1. There seems to be no reason for custom plan nodes to have MultiExec support; I think this is an area where extensibility is extremely unlikely to work out. The MultiExec mechanism is really only viable between closely-cooperating nodes, like Hash and HashJoin, or BitmapIndexScan, BitmapAnd, BitmapOr, and BitmapHeapScan; and arguably those things could have been written as a single, more complex node. Are we really going to want to support a custom plan that can substitute for a Hash or BitmapAnd node? I really doubt that's very useful. 2. This patch is still sort of on the fence about whether we're implementing custom plans (of any type) or custom scans (thus, of some particular relation). I previously recommended that we confine ourselves initially to the task of adding custom *scans* and leave the question of other kinds of custom plan nodes to a future patch. After studying the latest patch, I'm inclined to suggest a slightly revised strategy. This patch is really adding THREE kinds of custom objects: CustomPlanState, CustomPlan, and CustomPath. CustomPlanState inherits from ScanState, so it is not really a generic CustomPlan, but specifically a CustomScan; likewise, CustomPlan inherits from Scan, and is therefore a CustomScan, not a CustomPlan.
But CustomPath is different: it's just a Path. Even if we only have the hooks to inject CustomPaths that are effectively scans at this point, I think that part of the infrastructure could be somewhat generic. Perhaps eventually we have CustomPath which can generate either CustomScan or CustomJoin which in turn could generate CustomScanState and CustomJoinState. For now, I propose that we rename CustomPlan and CustomPlanState to CustomScan and CustomScanState, because that's what they are; but that we leave CustomPath as-is. For ease of review, I also suggest splitting this into a series of three patches: (1) add support for CustomPath; (2) add support for CustomScan and CustomScanState; (3) ctidscan. 3. Is it really a good idea to invoke custom scan providers for RTEs of every type? It's pretty hard to imagine that a custom scan provider can do anything useful with, say, RTE_VALUES. Maybe an accelerated scan of RTE_CTE or RTE_SUBQUERY is practical somehow, but even that feels like an awfully big stretch. At least until clear use cases emerge, I'd be inclined to restrict this to RTE_RELATION scans where rte->relkind != RELKIND_FOREIGN_TABLE; that is, put the logic in set_plain_rel_pathlist() rather than set_rel_pathlist(). (We might even want to consider whether the hook in set_plain_rel_pathlist() ought to be allowed to inject a non-custom plan; e.g. substitute a scan of relation B for a scan of relation A. For example, imagine that B contains all rows from A that satisfy some predicate. This could even be useful for foreign tables; e.g. substitute a scan of a local copy of a foreign table for a reference to that table. But I put all of these ideas in parentheses because they're only good ideas to the extent that they don't sidetrack us too much.) 4. Department of minor nitpicks. You've got a random 'xs' in the comments for ExecSupportsBackwardScan. 
And, in contrib/ctidscan, ctidscan_path_methods, ctidscan_plan_methods, and ctidscan_exec_methods can have static initializers; there's no need to initialize them at run time in _PG_init(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2014-08-29 13:33 GMT-04:00 Robert Haas <robertmhaas@gmail.com>: > On Wed, Aug 27, 2014 at 6:51 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >>> > I'd like to follow this direction, and start stripping the DDL support. >>> >>> ...please make it so. >>> >> The attached patch eliminates DDL support. >> >> Instead of the new CREATE CUSTOM PLAN PROVIDER statement, >> it adds an internal function; register_custom_scan_provider >> that takes custom plan provider name and callback function >> to add alternative scan path (should have a form of CustomPath) >> during the query planner is finding out the cheapest path to >> scan the target relation. >> Also, documentation stuff is revised according to the latest >> design. >> Any other stuff keeps the previous design. > > Comments: > > 1. There seems to be no reason for custom plan nodes to have MultiExec > support; I think this as an area where extensibility is extremely > unlikely to work out. The MultiExec mechanism is really only viable > between closely-cooperating nodes, like Hash and HashJoin, or > BitmapIndexScan, BitmapAnd, BitmapOr, and BitmapHeapScan; and arguably > those things could have been written as a single, more complex node. > Are we really going to want to support a custom plan that can > substitute for a Hash or BitmapAnd node? I really doubt that's very > useful. > This is intended to allow a particular custom-scan provider to exchange its internal data when multiple custom-scan nodes are stacked, so it can be considered a facility for implementing closely-cooperating nodes, both of which are managed by the same custom-scan provider. An example is a GPU-accelerated version of hash join that takes an underlying custom-scan node which returns a hash table in a GPU-preferable data structure, and so should not go through the row-by-row interface. I believe it is valuable for some use cases, even though I couldn't find a use case for it in the ctidscan example. > 2.
This patch is still sort of on the fence about whether we're > implementing custom plans (of any type) or custom scans (thus, of some > particular relation). I previously recommended that we confine > ourselves initially to the task of adding custom *scans* and leave the > question of other kinds of custom plan nodes to a future patch. After > studying the latest patch, I'm inclined to suggest a slightly revised > strategy. This patch is really adding THREE kinds of custom objects: > CustomPlanState, CustomPlan, and CustomPath. CustomPlanState inherits > from ScanState, so it is not really a generic CustomPlan, but > specifically a CustomScan; likewise, CustomPlan inherits from Scan, > and is therefore a CustomScan, not a CustomPlan. But CustomPath is > different: it's just a Path. Even if we only have the hooks to inject > CustomPaths that are effectively scans at this point, I think that > part of the infrastructure could be somewhat generic. Perhaps > eventually we have CustomPath which can generate either CustomScan or > CustomJoin which in turn could generate CustomScanState and > CustomJoinState. > The suggestion seems reasonable to me. The reason CustomPlanState inherits ScanState and CustomPlan inherits Scan is simply convenience for implementing extensions. Some useful internal APIs, like ExecScan(), take a ScanState argument, so it was a better strategy to choose Scan/ScanState instead of the bare Plan/PlanState. Anyway, I'd like to follow the perspective that treats CustomScan as one derivative of CustomPath. It is more flexible. > For now, I propose that we rename CustomPlan and CustomPlanState to > CustomScan and CustomScanState, because that's what they are; but that > we leave CustomPath as-is. For ease of review, I also suggest > splitting this into a series of three patches: (1) add support for > CustomPath; (2) add support for CustomScan and CustomScanState; (3) > ctidscan. > OK, I'll do that. > 3.
Is it really a good idea to invoke custom scan providers for RTEs > of every type? It's pretty hard to imagine that a custom scan > provider can do anything useful with, say, RTE_VALUES. Maybe an > accelerated scan of RTE_CTE or RTE_SUBQUERY is practical somehow, but > even that feels like an awfully big stretch. At least until clear use > cases emerge, I'd be inclined to restrict this to RTE_RELATION scans > where rte->relkind != RELKIND_FOREIGN_TABLE; that is, put the logic in > set_plain_rel_pathlist() rather than set_rel_pathlist(). > I agree. Indeed, it's not easy to imagine a use case for custom logic on non-plain relations. > (We might even want to consider whether the hook in > set_plain_rel_pathlist() ought to be allowed to inject a non-custom > plan; e.g. substitute a scan of relation B for a scan of relation A. > For example, imagine that B contains all rows from A that satisfy some > predicate. This could even be useful for foreign tables; e.g. > substitute a scan of a local copy of a foreign table for a reference > to that table. But I put all of these ideas in parentheses because > they're only good ideas to the extent that they don't sidetrack us too > much.) > Hmm... It seems to me we would need additional infrastructure to support a substitute scan, because add_path() is called for the particular RelOptInfo associated with relation A. As long as the custom-scan provider "internally" redirects a request to scan A to a substitute scan of B (taking care of all the other details, such as relation locks), I don't think we need to put any other hooks outside of set_plain_rel_pathlist(). > 4. Department of minor nitpicks. You've got a random 'xs' in the > comments for ExecSupportsBackwardScan. > Sorry, I didn't type 'ctrl' well when I saved the source code on emacs...
> And, in contrib/ctidscan, > ctidscan_path_methods, ctidscan_plan_methods, and > ctidscan_exec_methods can have static initializers; there's no need to > initialize them at run time in _PG_init(). > This came from a discussion long ago, during the patch review of postgres_fdw. I suggested using a static table of FdwRoutine, but someone pointed out that some compilers raise an error or warning when function pointers appear in a static initialization. I usually use only GCC, so I'm not sure whether that argument is right; in any case, postgres_fdw_handler() allocates its FdwRoutine with palloc() and then fills in each function pointer. Anyway, I'll start revising the patch according to comments 2, 3, and the first half of 4. I'd also like to hear comments regarding 1 and the latter half of 4. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On Sun, Aug 31, 2014 at 12:54 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > 2014-08-29 13:33 GMT-04:00 Robert Haas <robertmhaas@gmail.com>: >> On Wed, Aug 27, 2014 at 6:51 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >>>> > I'd like to follow this direction, and start stripping the DDL support. >>>> >>>> ...please make it so. >>>> >>> The attached patch eliminates DDL support. >>> >>> Instead of the new CREATE CUSTOM PLAN PROVIDER statement, >>> it adds an internal function; register_custom_scan_provider >>> that takes custom plan provider name and callback function >>> to add alternative scan path (should have a form of CustomPath) >>> during the query planner is finding out the cheapest path to >>> scan the target relation. >>> Also, documentation stuff is revised according to the latest >>> design. >>> Any other stuff keeps the previous design. >> >> Comments: >> >> 1. There seems to be no reason for custom plan nodes to have MultiExec >> support; I think this as an area where extensibility is extremely >> unlikely to work out. The MultiExec mechanism is really only viable >> between closely-cooperating nodes, like Hash and HashJoin, or >> BitmapIndexScan, BitmapAnd, BitmapOr, and BitmapHeapScan; and arguably >> those things could have been written as a single, more complex node. >> Are we really going to want to support a custom plan that can >> substitute for a Hash or BitmapAnd node? I really doubt that's very >> useful. >> > This intends to allows a particular custom-scan provider to exchange > its internal data when multiple custom-scan node is stacked. > So, it can be considered a facility to implement closely-cooperating nodes; > both of them are managed by same custom-scan provider. > An example is gpu-accelerated version of hash-join that takes underlying > custom-scan node that will returns a hash table with gpu preferable data > structure, but should not be a part of row-by-row interface. 
> I believe it is valuable for some use cases, even though I couldn't find > a use-case in ctidscan example. Color me skeptical. Please remove that part for now, and we can revisit it when, and if, a plausible use case emerges. >> 3. Is it really a good idea to invoke custom scan providers for RTEs >> of every type? It's pretty hard to imagine that a custom scan >> provider can do anything useful with, say, RTE_VALUES. Maybe an >> accelerated scan of RTE_CTE or RTE_SUBQUERY is practical somehow, but >> even that feels like an awfully big stretch. At least until clear use >> cases emerge, I'd be inclined to restrict this to RTE_RELATION scans >> where rte->relkind != RELKIND_FOREIGN_TABLE; that is, put the logic in >> set_plain_rel_pathlist() rather than set_rel_pathlist(). >> > I'd like to agree. Indeed, it's not easy to assume a use case of > custom-logic for non-plain relations. > >> (We might even want to consider whether the hook in >> set_plain_rel_pathlist() ought to be allowed to inject a non-custom >> plan; e.g. substitute a scan of relation B for a scan of relation A. >> For example, imagine that B contains all rows from A that satisfy some >> predicate. This could even be useful for foreign tables; e.g. >> substitute a scan of a local copy of a foreign table for a reference >> to that table. But I put all of these ideas in parentheses because >> they're only good ideas to the extent that they don't sidetrack us too >> much.) >> > Hmm... It seems to me we need another infrastructure to take > a substitute scan, because add_path() is called towards a certain > RelOpInfo that is associated with the relation A. > As long as custom-scan provider "internally" redirect a request for > scan of A by substitute scan B (with taking care of all other stuff > like relation locks), I don't think we need to put some other hooks > outside from the set_plain_rel_pathlist(). OK, I see. So this would have to be implemented as some new kind of path anyway. 
It might be worth allowing custom paths for scanning a foreign table as well as a plain table, though - so any RTE_RELATION but not other types of RTE. > It came from the discussion I had long time before during patch > reviewing of postgres_fdw. I suggested to use static table of > FdwRoutine but I got a point that says some compiler raise > error/warning to put function pointers on static initialization. > I usually use GCC only, so I'm not sure whether this argue is > right or not, even though the postgres_fdw_handler() allocates > FdwRoutine using palloc() then put function pointers for each. That's odd, because aset.c has used static initializers since forever, and I'm sure someone would have complained by now if there were a problem with that usage. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Sun, Aug 31, 2014 at 12:54 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote: > > 2014-08-29 13:33 GMT-04:00 Robert Haas <robertmhaas@gmail.com>: > >> Comments: > >> > >> 1. There seems to be no reason for custom plan nodes to have > >> MultiExec support; I think this as an area where extensibility is > >> extremely unlikely to work out. The MultiExec mechanism is really > >> only viable between closely-cooperating nodes, like Hash and > >> HashJoin, or BitmapIndexScan, BitmapAnd, BitmapOr, and > >> BitmapHeapScan; and arguably those things could have been written as > a single, more complex node. > >> Are we really going to want to support a custom plan that can > >> substitute for a Hash or BitmapAnd node? I really doubt that's very > >> useful. > >> > > This intends to allows a particular custom-scan provider to exchange > > its internal data when multiple custom-scan node is stacked. > > So, it can be considered a facility to implement closely-cooperating > > nodes; both of them are managed by same custom-scan provider. > > An example is gpu-accelerated version of hash-join that takes > > underlying custom-scan node that will returns a hash table with gpu > > preferable data structure, but should not be a part of row-by-row > interface. > > I believe it is valuable for some use cases, even though I couldn't > > find a use-case in ctidscan example. > > Color me skeptical. Please remove that part for now, and we can revisit > it when, and if, a plausible use case emerges. > I have now removed the multi-exec portion from the patch set. The existence of this interface affects query execution cost significantly, so I want to revisit it as soon as possible. See also the EXPLAIN output at the tail of this message. > > It came from the discussion I had long time before during patch > > reviewing of postgres_fdw. I suggested to use static table of > > FdwRoutine but I got a point that says some compiler raise > > error/warning to put function pointers on static initialization.
> > I usually use GCC only, so I'm not sure whether this argue is right or > > not, even though the postgres_fdw_handler() allocates FdwRoutine using > > palloc() then put function pointers for each. > > That's odd, because aset.c has used static initializers since forever, and > I'm sure someone would have complained by now if there were a problem with > that usage. >
I recalled the discussion from back then. The GCC-specific construct was not static initialization itself; it was static initialization with field names (designated initializers), like:

static CustomPathMethods ctidscan_path_methods = {
    .CustomName = "ctidscan",
    .CreateCustomPlan = CreateCtidScanPlan,
    .TextOutCustomPath = TextOutCtidScanPath,
};

Regarding the attached three patches:

[1] custom-path and hook
It adds the register_custom_path_provider() interface for registering a custom-path entrypoint. Callbacks are invoked from set_plain_rel_pathlist to offer an alternative scan path on regular relations. I may need to explain the terms in use: I call the path node "custom-path"; it is the step prior to populating a plan node (like custom-scan and, potentially, custom-join and so on). The node object created by CreateCustomPlan() is called "custom-plan" because it is an abstraction over all the potential custom-xxx nodes; custom-scan is the first of them.

[2] custom-scan node
It adds custom-scan node support. The custom-scan node is expected to generate the contents of a particular relation or sub-plan according to its custom logic. A custom-scan provider needs to implement the callbacks of CustomScanMethods and CustomExecMethods. Once a custom-scan node has been populated from a custom-path node, the backend calls back into these methods during the planning and execution stages.

[3] contrib/ctidscan
It adds logic to scan a base relation when the WHERE clause contains an inequality expression on the ctid system column; that allows skipping blocks which obviously do not need to be read.

During the refactoring, I noticed a few interfaces are omissible. The backend can tell which relation is the target of a custom-scan node appearing in the plan tree when its scanrelid > 0. So I concluded that ExplainCustomPlanTargetRel() and ExplainCustomPreScanNode() are omissible, and removed them from the patch. Please check the attached ones.

--------
Also, regarding the use case for the multi-exec interface: below is EXPLAIN output from PG-Strom. It shows that the custom GpuHashJoin has two sub-plans, GpuScan and MultiHash; GpuHashJoin is stacked on the GpuScan. This is a case where these nodes utilize the multi-exec interface for more efficient data exchange between the nodes. GpuScan already keeps a data structure that is suitable for sending to / receiving from GPU devices, constructed on a DMA-available memory segment. If we had to form a tuple, pass it through the row-by-row interface, and then deform it, that would be a major performance degradation in this use case.

postgres=# explain select * from t10 natural join t8 natural join t9 where x < 10;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Custom (GpuHashJoin)  (cost=10979.56..90064.15 rows=333 width=49)
   pseudo scan tlist: 1:(t10.bid), 3:(t10.aid), 4:<t10.x>, 2:<t8.data>, 5:[t8.aid], 6:[t9.bid]
   hash clause 1: ((t8.aid = t10.aid) AND (t9.bid = t10.bid))
   ->  Custom (GpuScan) on t10  (cost=10000.00..88831.26 rows=3333327 width=16)
         Host References: aid, bid, x
         Device References: x
         Device Filter: (x < 10::double precision)
   ->  Custom (MultiHash)  (cost=464.56..464.56 rows=1000 width=41)
         hash keys: aid, bid
         ->  Hash Join  (cost=60.06..464.56 rows=1000 width=41)
               Hash Cond: (t9.data = t8.data)
               ->  Index Scan using t9_pkey on t9  (cost=0.29..357.29 rows=10000 width=37)
               ->  Hash  (cost=47.27..47.27 rows=1000 width=37)
                     ->  Index Scan using t8_pkey on t8  (cost=0.28..47.27 rows=1000 width=37)
 Planning time: 0.810 ms
(15 rows)

-- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Thu, Sep 4, 2014 at 7:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > Regarding to the attached three patches: > [1] custom-path and hook > It adds register_custom_path_provider() interface for registration of > custom-path entrypoint. Callbacks are invoked on set_plain_rel_pathlist > to offer alternative scan path on regular relations. > I may need to explain the terms in use. I calls the path-node custom-path > that is the previous step of population of plan-node (like custom-scan > and potentially custom-join and so on). The node object created by > CreateCustomPlan() is called custom-plan because it is abstraction for > all the potential custom-xxx node; custom-scan is the first of all. I don't think it's a good thing that add_custom_path_type is declared as void (*)(void *) rather than having a real type. I suggest we add the path-creation callback function to CustomPlanMethods instead, like this: void (*CreateCustomScanPath)(PlannerInfo *root, RelOptInfo *baserel, RangeTblEntry *rte); Then, register_custom_path_provider() can just take CustomPathMethods * as an argument; and create_customscan_paths can just walk the list of CustomPlanMethods objects and call CreateCustomScanPath for each one where that is non-NULL. This conflates the path generation mechanism with the type of path getting generated a little bit, but I don't see any real downside to that. I don't see a reason why you'd ever want two different providers to offer the same type of custompath. Don't the changes to src/backend/optimizer/plan/createplan.c belong in patch #2? > [2] custom-scan node > It adds custom-scan node support. The custom-scan node is expected to > generate contents of a particular relation or sub-plan according to its > custom-logic. > Custom-scan provider needs to implement callbacks of CustomScanMethods > and CustomExecMethods. Once a custom-scan node is populated from > custom-path node, the backend calls back these methods in the planning > and execution stage. 
It looks to me like this patch is full of holdovers from its earlier life as a more-generic CustomPlan node. In particular, it contains numerous defenses against the case where scanrelid != 0. These are confusingly written as scanrelid > 0, but I think really they're just bogus altogether: if this is specifically a CustomScan, not a CustomPlan, then the relid should always be filled in. Please consider what can be simplified here. The comment in _copyCustomScan looks bogus to me. I think we should *require* a static method table. In create_custom_plan, you do: if (IsA(custom_plan, CustomScan)) { lots of stuff; } else elog(ERROR, ...). I think it would be clearer to write: if (!IsA(custom_plan, CustomScan)) elog(ERROR, ...); lots of stuff; > Also, regarding to the use-case of multi-exec interface. > Below is an EXPLAIN output of PG-Strom. It shows the custom GpuHashJoin has > two sub-plans; GpuScan and MultiHash. > GpuHashJoin is stacked on the GpuScan. It is a case when these nodes utilize > multi-exec interface for more efficient data exchange between the nodes. > GpuScan already keeps a data structure that is suitable to send to/recv from > GPU devices and constructed on the memory segment being DMA available. > If we have to form a tuple, pass it via row-by-row interface, then deform it, > it will become a major performance degradation in this use case.
> > postgres=# explain select * from t10 natural join t8 natural join t9 where x < 10;
> >                                            QUERY PLAN
> > -----------------------------------------------------------------------------------------------
> >  Custom (GpuHashJoin)  (cost=10979.56..90064.15 rows=333 width=49)
> >    pseudo scan tlist: 1:(t10.bid), 3:(t10.aid), 4:<t10.x>, 2:<t8.data>, 5:[t8.aid], 6:[t9.bid]
> >    hash clause 1: ((t8.aid = t10.aid) AND (t9.bid = t10.bid))
> >    ->  Custom (GpuScan) on t10  (cost=10000.00..88831.26 rows=3333327 width=16)
> >          Host References: aid, bid, x
> >          Device References: x
> >          Device Filter: (x < 10::double precision)
> >    ->  Custom (MultiHash)  (cost=464.56..464.56 rows=1000 width=41)
> >          hash keys: aid, bid
> >          ->  Hash Join  (cost=60.06..464.56 rows=1000 width=41)
> >                Hash Cond: (t9.data = t8.data)
> >                ->  Index Scan using t9_pkey on t9  (cost=0.29..357.29 rows=10000 width=37)
> >                ->  Hash  (cost=47.27..47.27 rows=1000 width=37)
> >                      ->  Index Scan using t8_pkey on t8  (cost=0.28..47.27 rows=1000 width=37)
> >  Planning time: 0.810 ms
> > (15 rows)

Why can't the Custom(GpuHashJoin) node build the hash table internally instead of using a separate node?

Also, for this patch we are only considering custom scan. Custom join is another patch. We don't need to provide infrastructure for that patch in this one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Thu, Sep 4, 2014 at 7:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Regarding to the attached three patches:
> > [1] custom-path and hook
> > It adds register_custom_path_provider() interface for registration
> > of custom-path entrypoint. Callbacks are invoked on
> > set_plain_rel_pathlist to offer alternative scan path on regular relations.
> > I may need to explain the terms in use. I calls the path-node
> > custom-path that is the previous step of population of plan-node
> > (like custom-scan and potentially custom-join and so on). The node
> > object created by CreateCustomPlan() is called custom-plan because it
> > is abstraction for all the potential custom-xxx node; custom-scan is
> > the first of all.
>
> I don't think it's a good thing that add_custom_path_type is declared
> as void (*)(void *) rather than having a real type. I suggest we add
> the path-creation callback function to CustomPlanMethods instead, like
> this:
>
> void (*CreateCustomScanPath)(PlannerInfo *root, RelOptInfo *baserel,
> RangeTblEntry *rte);
>
> Then, register_custom_path_provider() can just take CustomPathMethods
> * as an argument; and create_customscan_paths can just walk the list
> of CustomPlanMethods objects and call CreateCustomScanPath for each
> one where that is non-NULL. This conflates the path generation
> mechanism with the type of path getting generated a little bit, but I
> don't see any real downside to that. I don't see a reason why you'd
> ever want two different providers to offer the same type of custompath.
>
It seems to me the design you suggested is smarter than the original one. The first patch has been revised according to that design.

> Don't the changes to src/backend/optimizer/plan/createplan.c belong in
> patch #2?
>
The borderline between #1 and #2 is a little fuzzy. So, I moved most of that portion into #1; however, the invocation of InitCustomScan (a callback in CustomPlanMethod) in create_custom_plan() is still in #2.

> > [2] custom-scan node
> > It adds custom-scan node support. The custom-scan node is expected
> > to generate contents of a particular relation or sub-plan according
> > to its custom-logic.
> > Custom-scan provider needs to implement callbacks of
> > CustomScanMethods and CustomExecMethods. Once a custom-scan node is
> > populated from custom-path node, the backend calls back these
> > methods in the planning and execution stage.
>
> It looks to me like this patch is full of holdovers from its earlier
> life as a more-generic CustomPlan node. In particular, it contains
> numerous defenses against the case where scanrelid != 0. These are
> confusingly written as scanrelid > 0, but I think really they're just
> bogus altogether: if this is specifically a CustomScan, not a
> CustomPlan, then the relid should always be filled in. Please consider
> what can be simplified here.
>
OK, I revised it. Now custom-scan assumes it has a particular valid relation to be scanned, so there is no code path with scanrelid == 0 at this moment.

Let us revisit this scenario when custom-scan replaces relation joins. In that case, custom-scan will not be associated with a particular base relation, so it will need to admit a custom-scan node with scanrelid == 0.

> The comment in _copyCustomScan looks bogus to me. I think we should
> *require* a static method table.
>
OK, it was fixed to copy the pointer to the function table, not the table itself.

> In create_custom_plan, you if (IsA(custom_plan, CustomScan)) { lots of
> stuff; } else elog(ERROR, ...). I think it would be clearer to write
> if (!IsA(custom_plan, CustomScan)) elog(ERROR, ...); lots of stuff;
>
Fixed.

> > Also, regarding to the use-case of multi-exec interface.
> > Below is an EXPLAIN output of PG-Strom. It shows the custom GpuHashJoin
> > has two sub-plans; GpuScan and MultiHash.
> > GpuHashJoin is stacked on the GpuScan. It is a case when these nodes
> > utilize multi-exec interface for more efficient data exchange between
> > the nodes.
> > GpuScan already keeps a data structure that is suitable to send
> > to/recv from GPU devices and constructed on the memory segment being
> > DMA available.
> > If we have to form a tuple, pass it via row-by-row interface, then
> > deform it, it will become a major performance degradation in this use case.
> >
> > postgres=# explain select * from t10 natural join t8 natural join t9 where x < 10;
> >                                            QUERY PLAN
> > -----------------------------------------------------------------------------------------------
> >  Custom (GpuHashJoin)  (cost=10979.56..90064.15 rows=333 width=49)
> >    pseudo scan tlist: 1:(t10.bid), 3:(t10.aid), 4:<t10.x>, 2:<t8.data>, 5:[t8.aid], 6:[t9.bid]
> >    hash clause 1: ((t8.aid = t10.aid) AND (t9.bid = t10.bid))
> >    ->  Custom (GpuScan) on t10  (cost=10000.00..88831.26 rows=3333327 width=16)
> >          Host References: aid, bid, x
> >          Device References: x
> >          Device Filter: (x < 10::double precision)
> >    ->  Custom (MultiHash)  (cost=464.56..464.56 rows=1000 width=41)
> >          hash keys: aid, bid
> >          ->  Hash Join  (cost=60.06..464.56 rows=1000 width=41)
> >                Hash Cond: (t9.data = t8.data)
> >                ->  Index Scan using t9_pkey on t9  (cost=0.29..357.29 rows=10000 width=37)
> >                ->  Hash  (cost=47.27..47.27 rows=1000 width=37)
> >                      ->  Index Scan using t8_pkey on t8  (cost=0.28..47.27 rows=1000 width=37)
> >  Planning time: 0.810 ms
> > (15 rows)
>
> Why can't the Custom(GpuHashJoin) node build the hash table internally
> instead of using a separate node?
>
It's possible; however, it would prevent inspecting the sub-plans with EXPLAIN if we managed the inner plans internally. So, I'd like to have a separate node connected to the inner plan.

> Also, for this patch we are only considering custom scan. Custom join
> is another patch. We don't need to provide infrastructure for that
> patch in this one.
>
OK, let me revisit it at the next stage, together with the functionality above.
Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> Don't the changes to src/backend/optimizer/plan/createplan.c belong in
>> patch #2?
>>
> The borderline between #1 and #2 is little bit bogus. So, I moved most of
> portion into #1, however, invocation of InitCustomScan (that is a callback
> in CustomPlanMethod) in create_custom_plan() is still in #2.

Eh, create_custom_scan() certainly looks like it is in #1 from here, or at least part of it is. It calculates tlist and clauses and then does nothing with them. That clearly can't be the right division.

I think it would make sense to have create_custom_scan() compute tlist and clauses first, and then pass those to CreateCustomPlan(). Then you don't need a separate InitCustomScan() - which is misnamed anyway, since it has nothing to do with ExecInitCustomScan().

> OK, I revised. Now custom-scan assumes it has a particular valid relation
> to be scanned, so no code path with scanrelid == 0 at this moment.
>
> Let us revisit this scenario when custom-scan replaces relation-joins.
> In this case, custom-scan will not be associated with a particular base-
> relation, thus it needs to admit a custom-scan node with scanrelid == 0.

Yeah, I guess the question there is whether we'll want to let CustomScan have scanrelid == 0 or require that CustomJoin be used there instead.

>> Why can't the Custom(GpuHashJoin) node build the hash table internally
>> instead of using a separate node?
>>
> It's possible, however, it prevents to check sub-plans using EXPLAIN if we
> manage inner-plans internally. So, I'd like to have a separate node being
> connected to the inner-plan.

Isn't that just a matter of letting the EXPLAIN code print more stuff? Why can't it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
> wrote:
> >> Don't the changes to src/backend/optimizer/plan/createplan.c belong
> >> in patch #2?
> >>
> > The borderline between #1 and #2 is little bit bogus. So, I moved most
> > of portion into #1, however, invocation of InitCustomScan (that is a
> > callback in CustomPlanMethod) in create_custom_plan() is still in #2.
>
> Eh, create_custom_scan() certainly looks like it is in #1 from here, or
> at least part of it is. It calculates tlist and clauses and then does
> nothing with them. That clearly can't be the right division.
>
> I think it would make sense to have create_custom_scan() compute tlist and
> clauses first, and then pass those to CreateCustomPlan(). Then you don't
> need a separate InitCustomScan() - which is misnamed anyway, since it has
> nothing to do with ExecInitCustomScan().
>
The only reason why I put separate hooks here is that create_custom_scan() needs to know the exact size of the CustomScan node (including private fields); nevertheless, it is helpful for extensions to have a callback kicked to initialize the node right after the common initialization stuff.

If we had a static field informing the exact size of the data type inherited from CustomScan in CustomPathMethods, it might be possible to eliminate CreateCustomPlan(). One downside is that an extension would need to register multiple CustomPath tables, one for each custom-plan node to be populated later. So, my preference is the current design rather than the static approach.

Regarding the naming, how about GetCustomScan() instead of InitCustomScan()? It follows the manner of create_foreignscan_plan().

> > OK, I revised. Now custom-scan assumes it has a particular valid
> > relation to be scanned, so no code path with scanrelid == 0 at this moment.
> >
> > Let us revisit this scenario when custom-scan replaces relation-joins.
> > In this case, custom-scan will not be associated with a particular
> > base-relation, thus it needs to admit a custom-scan node with scanrelid == 0.
>
> Yeah, I guess the question there is whether we'll want to let CustomScan
> have scanrelid == 0 or require that CustomJoin be used there instead.
>
Right now, I cannot imagine a use case that requires an individual CustomJoin node, because CustomScan with scanrelid==0 (which actually behaves like a custom plan rather than a custom scan) is sufficient.

If a CustomScan gets chosen instead of the built-in join logic, it will look like a relation scan on a virtual relation that consists of the two underlying relations. The callbacks of the CustomScan have the responsibility to join the underlying relations; that is invisible to the core executor.

It seems to me that a CustomScan with scanrelid==0 is sufficient to implement alternative logic for relation joins; we don't need an individual node from the standpoint of the executor.

> >> Why can't the Custom(GpuHashJoin) node build the hash table
> >> internally instead of using a separate node?
> >>
> > It's possible, however, it prevents to check sub-plans using EXPLAIN
> > if we manage inner-plans internally. So, I'd like to have a separate
> > node being connected to the inner-plan.
>
> Isn't that just a matter of letting the EXPLAIN code print more stuff?
> Why can't it?
>
My GpuHashJoin takes multiple relations and loads them into a hash table. On the other hand, a Plan node can have at most two underlying relations (inner/outer). The outer side is occupied by the larger relation, so it needs to make the multiple relations visible via the inner branch. If CustomScan could have a list of multiple underlying plan nodes, like an Append node, it could represent the structure above in a straightforward way, but I'm uncertain which is the better design.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Thu, Sep 11, 2014 at 8:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> >> wrote: >> >> Don't the changes to src/backend/optimizer/plan/createplan.c belong >> >> in patch #2? >> >> >> > The borderline between #1 and #2 is little bit bogus. So, I moved most >> > of portion into #1, however, invocation of InitCustomScan (that is a >> > callback in CustomPlanMethod) in create_custom_plan() is still in #2. >> >> Eh, create_custom_scan() certainly looks like it is in #1 from here, or >> at least part of it is. It calculates tlist and clauses and then does >> nothing with them. That clearly can't be the right division. >> >> I think it would make sense to have create_custom_scan() compute tlist and >> clauses first, and then pass those to CreateCustomPlan(). Then you don't >> need a separate InitCustomScan() - which is misnamed anyway, since it has >> nothing to do with ExecInitCustomScan(). >> > The only reason why I put separate hooks here is, create_custom_scan() needs > to know exact size of the CustomScan node (including private fields), however, > it is helpful for extensions to kick its callback to initialize the node > next to the common initialization stuff. Why does it need to know that? I don't see that it's doing anything that requires knowing the size of that node, and if it is, I think it shouldn't be. That should get delegated to the callback provided by the custom plan provider. > Regarding to the naming, how about GetCustomScan() instead of InitCustomScan()? > It follows the manner in create_foreignscan_plan(). I guess that's a bit better, but come to think of it, I'd really like to avoid baking in the assumption that the custom path provider has to return any particular type of plan node. A good start would be to give it a name that doesn't imply that - e.g. PlanCustomPath(). >> > OK, I revised. 
Now custom-scan assumes it has a particular valid
>> > relation to be scanned, so no code path with scanrelid == 0 at this moment.
>> >
>> > Let us revisit this scenario when custom-scan replaces relation-joins.
>> > In this case, custom-scan will not be associated with a particular
>> > base-relation, thus it needs to admit a custom-scan node with scanrelid == 0.
>>
>> Yeah, I guess the question there is whether we'll want to let CustomScan
>> have scanrelid == 0 or require that CustomJoin be used there instead.
>>
> Right now, I cannot imagine a use case that requires individual CustomJoin
> node because CustomScan with scanrelid==0 (that performs like custom-plan
> rather than custom-scan in actually) is sufficient.
>
> If a CustomScan gets chosen instead of built-in join logics, it shall look
> like a relation scan on the virtual one that consists of two underlying
> relations. Callbacks of the CustomScan have a responsibility to join
> underlying relations; that is invisible from the core executor.
>
> It seems to me CustomScan with scanrelid==0 is sufficient to implement
> an alternative logic on relation joins; we don't need an individual node
> from the standpoint of the executor.

That's valid logic, but it's not the only way to do it. If we have CustomScan and CustomJoin, either of them will require some adaptation to handle this case. We can either allow a custom scan that isn't scanning any particular relation (i.e. scanrelid == 0), or we can allow a custom join that has no children. I don't know which way will come out cleaner, and I think it's good to leave that decision to one side for now.

>> >> Why can't the Custom(GpuHashJoin) node build the hash table
>> >> internally instead of using a separate node?
>> >>
>> > It's possible, however, it prevents to check sub-plans using EXPLAIN
>> > if we manage inner-plans internally. So, I'd like to have a separate
>> > node being connected to the inner-plan.
>> >> Isn't that just a matter of letting the EXPLAIN code print more stuff? >> Why can't it? >> > My GpuHashJoin takes multiple relations to load them a hash-table. > On the other hand, Plan node can have two underlying relations at most > (inner/outer). Outer-side is occupied by the larger relation, so it > needs to make multiple relations visible using inner-branch. > If CustomScan can has a list of multiple underlying plan-nodes, like > Append node, it can represent the structure above in straightforward > way, but I'm uncertain which is the better design. Right. I think the key point is that it is *possible* to make this work without a multiexec interface, and it seems like we're agreed that it is. Now perhaps we will decide that there is enough benefit in having multiexec support that we want to do it anyway, but it's clearly not a hard requirement, because it can be done without that in the way you describe here. Let's leave to the future the decision as to how to proceed here; getting the basic thing done is hard enough. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> On Thu, Sep 11, 2014 at 8:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >> On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
> >> wrote:
> >> >> Don't the changes to src/backend/optimizer/plan/createplan.c
> >> >> belong in patch #2?
> >> >>
> >> > The borderline between #1 and #2 is little bit bogus. So, I moved
> >> > most of portion into #1, however, invocation of InitCustomScan
> >> > (that is a callback in CustomPlanMethod) in create_custom_plan() is
> >> > still in #2.
> >>
> >> Eh, create_custom_scan() certainly looks like it is in #1 from here,
> >> or at least part of it is. It calculates tlist and clauses and then
> >> does nothing with them. That clearly can't be the right division.
> >>
> >> I think it would make sense to have create_custom_scan() compute
> >> tlist and clauses first, and then pass those to CreateCustomPlan().
> >> Then you don't need a separate InitCustomScan() - which is misnamed
> >> anyway, since it has nothing to do with ExecInitCustomScan().
> >>
> > The only reason why I put separate hooks here is, create_custom_scan()
> > needs to know exact size of the CustomScan node (including private
> > fields), however, it is helpful for extensions to kick its callback to
> > initialize the node next to the common initialization stuff.
>
> Why does it need to know that? I don't see that it's doing anything that
> requires knowing the size of that node, and if it is, I think it shouldn't
> be. That should get delegated to the callback provided by the custom plan
> provider.
>
Sorry, my explanation might have been confusing. create_custom_scan() does not need to know the exact size of the CustomScan (or its inheritance) because of the two separate hooks: one at node allocation time, the other at the tail of the series of initialization.

If we had only one hook here, we would need a mechanism to inform create_custom_scan() of the exact size of the CustomScan node, including the private fields managed by the provider, instead of the first hook at node allocation time. In that case, node allocation would be done by create_custom_scan(), which then has to know the exact size of the node to be allocated.

How should I implement the feature here? Is the combination of a static node size and a callback at the tail simpler than the existing design that takes two individual hooks in create_custom_scan()?

> > Regarding the naming, how about GetCustomScan() instead of InitCustomScan()?
> > It follows the manner in create_foreignscan_plan().
>
> I guess that's a bit better, but come to think of it, I'd really like to
> avoid baking in the assumption that the custom path provider has to return
> any particular type of plan node. A good start would be to give it a name
> that doesn't imply that - e.g. PlanCustomPath().
>
OK, I'll use this naming.

> >> > OK, I revised. Now custom-scan assumes it has a particular valid
> >> > relation to be scanned, so no code path with scanrelid == 0 at this moment.
> >> >
> >> > Let us revisit this scenario when custom-scan replaces relation-joins.
> >> > In this case, custom-scan will not be associated with a particular
> >> > base-relation, thus it needs to admit a custom-scan node with
> >> > scanrelid == 0.
> >>
> >> Yeah, I guess the question there is whether we'll want to let CustomScan
> >> have scanrelid == 0 or require that CustomJoin be used there instead.
> >>
> > Right now, I cannot imagine a use case that requires individual
> > CustomJoin node because CustomScan with scanrelid==0 (that performs
> > like custom-plan rather than custom-scan in actually) is sufficient.
> >
> > If a CustomScan gets chosen instead of built-in join logics, it shall
> > look like a relation scan on the virtual one that consists of two
> > underlying relations. Callbacks of the CustomScan have a responsibility
> > to join underlying relations; that is invisible from the core executor.
> >
> > It seems to me CustomScan with scanrelid==0 is sufficient to implement
> > an alternative logic on relation joins; we don't need an individual node
> > from the standpoint of the executor.
>
> That's valid logic, but it's not the only way to do it. If we have CustomScan
> and CustomJoin, either of them will require some adaptation to handle this
> case. We can either allow a custom scan that isn't scanning any particular
> relation (i.e. scanrelid == 0), or we can allow a custom join that has no
> children. I don't know which way will come out cleaner, and I think it's
> good to leave that decision to one side for now.
>
Yep, I agree with you. It may not be productive to try to settle this design topic right now. Let's assume CustomScan scans a particular relation (scanrelid != 0) in the first revision.

> >> >> Why can't the Custom(GpuHashJoin) node build the hash table
> >> >> internally instead of using a separate node?
> >> >>
> >> > It's possible, however, it prevents to check sub-plans using
> >> > EXPLAIN if we manage inner-plans internally. So, I'd like to have a
> >> > separate node being connected to the inner-plan.
> >>
> >> Isn't that just a matter of letting the EXPLAIN code print more stuff?
> >> Why can't it?
> >>
> > My GpuHashJoin takes multiple relations to load them into a hash table.
> > On the other hand, a Plan node can have two underlying relations at most
> > (inner/outer). The outer side is occupied by the larger relation, so it
> > needs to make multiple relations visible using the inner branch.
> > If CustomScan could have a list of multiple underlying plan nodes, like
> > an Append node, it could represent the structure above in a straightforward
> > way, but I'm uncertain which is the better design.
>
> Right. I think the key point is that it is *possible* to make this work
> without a multiexec interface, and it seems like we're agreed that it is.
> Now perhaps we will decide that there is enough benefit in having multiexec
> support that we want to do it anyway, but it's clearly not a hard requirement,
> because it can be done without that in the way you describe here. Let's
> leave to the future the decision as to how to proceed here; getting the
> basic thing done is hard enough.
>
OK, let's postpone the discussion of custom-join support. Either approach (1. multi-exec support, or 2. multiple subplans like Append) is sufficient for this purpose, and the multi-exec interface is a way to implement it, not a goal.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Mon, Sep 15, 2014 at 8:38 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> > The only reason why I put separate hooks here is, create_custom_scan()
>> > needs to know exact size of the CustomScan node (including private
>> > fields), however, it is helpful for extensions to kick its callback to
>> > initialize the node next to the common initialization stuff.
>>
>> Why does it need to know that? I don't see that it's doing anything that
>> requires knowing the size of that node, and if it is, I think it shouldn't
>> be. That should get delegated to the callback provided by the custom plan
>> provider.
>>
> Sorry, my explanation might be confusable. The create_custom_scan() does not
> need to know the exact size of the CustomScan (or its inheritance) because of
> the two separated hooks; one is node allocation time, the other is the tail
> of the series of initialization.
> If we have only one hook here, we need to have a mechanism to inform
> create_custom_scan() an exact size of the CustomScan node; including private
> fields managed by the provider, instead of the first hook on node allocation
> time. In this case, node allocation shall be processed by create_custom_scan()
> and it has to know exact size of the node to be allocated.
>
> How do I implement the feature here? Is the combination of static node size
> and callback on the tail more simple than the existing design that takes two
> individual hooks on create_custom_scan()?

I still don't get it. Right now, the logic in create_custom_scan(), which I think should really be create_custom_plan() or create_plan_from_custom_path(), basically looks like this:

1. call hook function CreateCustomPlan
2. compute values for tlist and clauses
3. pass those values to hook function InitCustomScan()
4. call copy_path_costsize

What I think we should do is:

1. compute values for tlist and clauses
2. pass those values to hook function PlanCustomPath(), which will return a Plan
3. call copy_path_costsize

create_custom_scan() does not need to allocate the node. You don't need the node to be allocated before computing tlist and clauses.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Mon, Sep 15, 2014 at 8:38 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >> > The only reason why I put separate hooks here is, > >> > create_custom_scan() needs to know exact size of the CustomScan > >> > node (including private fields), however, it is helpful for > >> > extensions to kick its callback to initialize the node next to the > common initialization stuff. > >> > >> Why does it need to know that? I don't see that it's doing anything > >> that requires knowing the size of that node, and if it is, I think it > >> shouldn't be. That should get delegated to the callback provided by > >> the custom plan provider. > >> > > Sorry, my explanation might be confusable. The create_custom_scan() > > does not need to know the exact size of the CustomScan (or its > > inheritance) because of the two separated hooks; one is node > > allocation time, the other is the tail of the series of initialization. > > If we have only one hook here, we need to have a mechanism to informs > > create_custom_scan() an exact size of the CustomScan node; including > > private fields managed by the provider, instead of the first hook on > > node allocation time. In this case, node allocation shall be processed > > by create_custom_scan() and it has to know exact size of the node to be > allocated. > > > > How do I implement the feature here? Is the combination of static node > > size and callback on the tail more simple than the existing design > > that takes two individual hooks on create_custom_scan()? > > I still don't get it. Right now, the logic in create_custom_scan(), which > I think should really be create_custom_plan() or > create_plan_from_custom_path(), basically looks like this: > > 1. call hook function CreateCustomPlan > 2. compute values for tlist and clauses > 3. pass those values to hook function InitCustomScan() 4. call > copy_path_costsize > > What I think we should do is: > > 1. compute values for tlist and clauses > 2. 
pass those values to hook function PlanCustomPath(), which will return a Plan
> 3. call copy_path_costsize
>
> create_custom_scan() does not need to allocate the node. You don't need
> the node to be allocated before computing tlist and clauses.
>
Thanks, I get the point now. I'll revise the patch according to the suggestion above.

It seems to me we can also apply a similar approach to ExecInitCustomScan(). What the current implementation does is:
1. call CreateCustomScanState() to allocate a CustomScanState node
2. common initialization of the fields on CustomScanState, but not the private fields
3. call BeginCustomScan() to initialize the remaining parts and begin execution

If BeginCustomScan() is redefined to accept values for the common initialization portions and to return a CustomScanState node, we may be able to eliminate the CreateCustomScanState() hook.

Unlike the create_custom_scan() case, it takes a larger number of values for the common initialization portions: the expression trees of the tlist and quals, the scan and result tuple slots, the projection info, and the relation handle. That may clutter the interface specification. In addition, BeginCustomScan() would have to belong to CustomScanMethods, not CustomExecMethods; I'm not sure that is a natural location. (A whisper: they may not need to be separate tables. CustomScan always populates CustomScanState, unlike the relationship between CustomPath and CustomScan.)

What is your opinion on applying the above manner to ExecInitCustomScan() as well?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
> > >> Why does it need to know that? I don't see that it's doing > > >> anything that requires knowing the size of that node, and if it is, > > >> I think it shouldn't be. That should get delegated to the callback > > >> provided by the custom plan provider. > > >> > > > Sorry, my explanation might be confusable. The create_custom_scan() > > > does not need to know the exact size of the CustomScan (or its > > > inheritance) because of the two separated hooks; one is node > > > allocation time, the other is the tail of the series of initialization. > > > If we have only one hook here, we need to have a mechanism to > > > informs > > > create_custom_scan() an exact size of the CustomScan node; including > > > private fields managed by the provider, instead of the first hook on > > > node allocation time. In this case, node allocation shall be > > > processed by create_custom_scan() and it has to know exact size of > > > the node to be > > allocated. > > > > > > How do I implement the feature here? Is the combination of static > > > node size and callback on the tail more simple than the existing > > > design that takes two individual hooks on create_custom_scan()? > > > > I still don't get it. Right now, the logic in create_custom_scan(), > > which I think should really be create_custom_plan() or > > create_plan_from_custom_path(), basically looks like this: > > > > 1. call hook function CreateCustomPlan 2. compute values for tlist and > > clauses 3. pass those values to hook function InitCustomScan() 4. call > > copy_path_costsize > > > > What I think we should do is: > > > > 1. compute values for tlist and clauses 2. pass those values to hook > > function PlanCustomPath(), which will return a Plan 3. call > > copy_path_costsize > > > > create_custom_scan() does not need to allocate the node. You don't > > need the node to be allocated before computing tlist and clauses. > > > Thanks, I could get the point. > I'll revise the patch according to the suggestion above. 
At this moment, I revised the above portion of the patches.
create_custom_plan() was modified to call the "PlanCustomPath" callback
next to the initialization of tlist and clauses.
It's probably the same as what you suggested.

It seems to me we can also apply a similar manner to ExecInitCustomScan().
The current implementation does:
1. call CreateCustomScanState() to allocate a CustomScanState node
2. common initialization of the fields on CustomScanState, but not the
   private fields
3. call BeginCustomScan() to initialize the remaining stuff and begin
   execution

If BeginCustomScan() were re-defined to accept values for the common
initialization portions and to return a CustomScanState node, we might be
able to eliminate the CreateCustomScanState() hook.

Unlike the create_custom_scan() case, it takes a larger number of values
for the common initialization portions: the expression trees of tlist and
quals, the scan and result tuple-slots, the projection info and the
relation handle. It may mess up the interface specification.
In addition, BeginCustomScan() would have to belong to CustomScanMethods,
not CustomExecMethods. I'm uncertain whether that is a straightforward
location. (A whisper: they may not need to be separate tables. CustomScan
always populates CustomScanState, unlike the relationship between
CustomPath and CustomScan.)

What is your opinion on applying the above manner to ExecInitCustomScan()
as well? I kept the existing implementation around ExecInitCustomScan()
for now.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
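[Editor's illustration] The two-hook executor-side initialization KaiGai describes above (allocate via CreateCustomScanState, let the core fill the common fields, then let BeginCustomScan finish) can be mocked like this; all type layouts here are simplified stand-ins for the real executor structures:

```c
#include <assert.h>

/* Mock of the two-hook initialization: the provider first allocates its
 * (possibly enlarged) state node, the core then fills in the common
 * fields, and a second callback finishes setup. */
typedef struct CustomScanState CustomScanState;
struct CustomScanState {
    int common_ready;     /* set by the core's common initialization */
    int begun;            /* set by the provider's BeginCustomScan */
};

typedef struct CustomScanMethods {
    CustomScanState *(*CreateCustomScanState)(void);  /* hook 1: allocate */
    void (*BeginCustomScan)(CustomScanState *css);    /* hook 2: finish  */
} CustomScanMethods;

static CustomScanState *
ExecInitCustomScan(const CustomScanMethods *methods)
{
    CustomScanState *css = methods->CreateCustomScanState();  /* step 1 */
    css->common_ready = 1;                 /* step 2: common field init */
    methods->BeginCustomScan(css);                            /* step 3 */
    return css;
}

/* toy provider: allocation returns a static node, begin records that the
 * common fields were already populated when it ran */
static CustomScanState my_css;
static CustomScanState *my_create(void) { return &my_css; }
static void my_begin(CustomScanState *css) { css->begun = css->common_ready; }
```

Collapsing this into one hook would mean threading all the common-init values (tlist, quals, slots, projection info) through BeginCustomScan's argument list, which is the interface mess KaiGai worries about.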
On Wed, Sep 17, 2014 at 7:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> At this moment, I revised the above portion of the patches.
> create_custom_plan() was modified to call "PlanCustomPath" callback
> next to the initialization of tlist and clauses.
>
> It's probably same as what you suggested.

create_custom_plan() is mis-named. It's actually only applicable to the
custom-scan case, because it's triggered by create_plan_recurse() getting
a path node with a T_CustomScan pathtype. Now, we could change that;
although in general create_plan_recurse() dispatches on pathtype, we could
make CustomPath an exception; the top of that function could say
if (IsA(best_path, CustomPath)) { /* do custom stuff */ }. But the problem
with that idea is that, when the custom path is specifically a custom
scan, rather than a join or some other thing, you want to do all of the
same processing that's in create_scan_plan().

So I think what should happen is that create_plan_recurse() should handle
T_CustomScan the same way it handles T_SeqScan, T_IndexScan, et al: by
calling create_scan_plan(). The switch inside that function can then call
a function create_customscan_plan() if it sees T_CustomScan. And that
function will be simpler than the create_custom_plan() that you have now,
and it will be named correctly, too.

In ExplainNode(), I think sname should be set to "Custom Scan", not
"Custom". And further down, the custom_name should be printed as "Custom
Plan Provider", not just "Custom".

setrefs.c has remaining handling for the scanrelid = 0 case; please
remove that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Wed, Sep 17, 2014 at 7:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > At this moment, I revised the above portion of the patches.
> > create_custom_plan() was modified to call "PlanCustomPath" callback
> > next to the initialization of tlist and clauses.
> >
> > It's probably same as what you suggested.
>
> create_custom_plan() is mis-named. It's actually only applicable to the
> custom-scan case, because it's triggered by create_plan_recurse() getting
> a path node with a T_CustomScan pathtype. Now, we could change that;
> although in general create_plan_recurse() dispatches on pathtype, we could
> make CustomPath an exception; the top of that function could say if
> (IsA(best_path, CustomPath)) { /* do custom stuff */ }. But the problem
> with that idea is that, when the custom path is specifically a custom
> scan, rather than a join or some other thing, you want to do all of the
> same processing that's in create_scan_plan().
>
> So I think what should happen is that create_plan_recurse() should handle
> T_CustomScan the same way it handles T_SeqScan, T_IndexScan, et al: by
> calling create_scan_plan(). The switch inside that function can then call
> a function create_customscan_plan() if it sees T_CustomScan. And that
> function will be simpler than the create_custom_plan() that you have now,
> and it will be named correctly, too.
>
Fixed, according to what you suggested. It seems to me that
create_customscan_plan() became more simplified than before.
Probably, it will minimize the portion of special-case handling if a
CustomScan with scanrelid == 0 replaces a built-in join plan in a future
version.

> In ExplainNode(), I think sname should be set to "Custom Scan", not
> "Custom". And further down, the custom_name should be printed as "Custom
> Plan Provider", not just "Custom".
>
Fixed. I added an additional regression test to check the EXPLAIN output
when it is not the text format.

> setrefs.c has remaining handling for the scanrelid = 0 case; please
> remove that.
>
Sorry, I removed it, and checked the patch again to ensure there are no
similar portions.

Thanks for your review.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
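[Editor's illustration] The dispatch structure agreed on above — create_plan_recurse() treating T_CustomScan like any other scan tag, with only create_scan_plan()'s switch branching to create_customscan_plan() — can be sketched with mock tags. The real functions build Plan trees; these just return markers so the routing is checkable:

```c
#include <assert.h>

/* Mock of Robert's suggested dispatch: no special-casing of custom
 * paths at the create_plan_recurse() level. */
typedef enum { T_SeqScan, T_IndexScan, T_CustomScan, T_NestLoop } NodeTag;

static int create_seqscan_plan(void)    { return 1; }
static int create_customscan_plan(void) { return 2; }

static int
create_scan_plan(NodeTag pathtype)
{
    /* the common scan processing (tlist, quals) would happen here,
     * shared by built-in and custom scans alike */
    switch (pathtype)
    {
        case T_SeqScan:    return create_seqscan_plan();
        case T_CustomScan: return create_customscan_plan();
        default:           return -1;
    }
}

static int
create_plan_recurse(NodeTag pathtype)
{
    switch (pathtype)
    {
        case T_SeqScan:
        case T_IndexScan:
        case T_CustomScan:      /* routed exactly like the built-in scans */
            return create_scan_plan(pathtype);
        default:
            return -1;
    }
}
```

The payoff is that a custom scan gets the shared tlist/qual processing for free, and the custom-specific function stays small.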
Attachment
On 29 September 2014 09:48, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> On Wed, Sep 17, 2014 at 7:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> > At this moment, I revised the above portion of the patches.
>> > create_custom_plan() was modified to call "PlanCustomPath" callback
>> > next to the initialization of tlist and clauses.
>> >
>> > It's probably same as what you suggested.
>>
>> So I think what should happen is that create_plan_recurse() should handle
>> T_CustomScan the same way it handles T_SeqScan, T_IndexScan, et al: by
>> calling create_scan_plan(). The switch inside that function can then call
>> a function create_customscan_plan() if it sees T_CustomScan. And that
>> function will be simpler than the create_custom_plan() that you have now,
>> and it will be named correctly, too.
>>
> Fixed, according to what you suggested. It seems to me
> create_customscan_plan() became more simplified than before.
> Probably, it will minimize the portion of special case handling if
> CustomScan with scanrelid==0 replaces built-in join plan in the future
> version.
>
>> In ExplainNode(), I think sname should be set to "Custom Scan", not
>> "Custom". And further down, the custom_name should be printed as "Custom
>> Plan Provider" not just "Custom".
>>
> Fixed. I added an additional regression test to check EXPLAIN output
> if not a text format.
>
>> setrefs.c has remaining handling for the scanrelid = 0 case; please
>> remove that.
>>
> Sorry, I removed it, and checked the patch again to ensure here is no
> similar portions.
>
> Thanks for your reviewing.

pgsql-v9.5-custom-scan.part-2.v11.patch

+GetSpecialCustomVar(CustomPlanState *node,
+                    Var *varnode,
+                    PlanState **child_ps);

This doesn't seem to strictly match the actual function:

+GetSpecialCustomVar(PlanState *ps, Var *varnode, PlanState **child_ps)

--
Thom
2014-09-29 20:26 GMT+09:00 Thom Brown <thom@linux.com>:
> pgsql-v9.5-custom-scan.part-2.v11.patch
>
> +GetSpecialCustomVar(CustomPlanState *node,
> +                    Var *varnode,
> +                    PlanState **child_ps);
>
> This doesn't seem to strictly match the actual function:
>
> +GetSpecialCustomVar(PlanState *ps, Var *varnode, PlanState **child_ps)
>
It's more convenient if the first argument is PlanState, because
GetSpecialCustomVar() is called for every suspicious special var-node that
might be managed by a custom-plan provider. If we had to ensure that its
first argument is a CustomPlanState on the caller side, it would make the
function's invocation more complicated.
Also, the callback portion is called only when the PlanState is a
CustomPlanState, so it is natural to take CustomPlanState as the argument
of the callback interface.

Do we need to match the prototype of the wrapper function with the
callback?

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On Mon, Sep 29, 2014 at 9:04 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> Do we need to match the prototype of wrapper function with callback?

Yes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Mon, Sep 29, 2014 at 9:04 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> > Do we need to match the prototype of wrapper function with callback?
>
> Yes.
>
OK, I fixed up the part-2 patch to make the declaration of
GetSpecialCustomVar() fit the corresponding callback.
Also, a noise in the part-3 patch, introduced by git-pull, was removed.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
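[Editor's illustration] The agreed resolution — wrapper and provider callback sharing one signature — might look like the following sketch. The types and the toy callback are simplified stand-ins, not the committed code:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in types */
typedef struct PlanState { int is_custom; } PlanState;
typedef struct Var { int varno; } Var;
typedef struct Expr { int value; } Expr;

/* One signature, used by both the provider callback and the wrapper,
 * so the arguments pass straight through without repacking. */
typedef Expr *(*GetSpecialCustomVar_fn)(PlanState *ps, Var *varnode,
                                        PlanState **child_ps);

/* toy provider callback: resolves any special var to a fixed expression */
static Expr resolved = { 42 };
static Expr *
my_get_special_var(PlanState *ps, Var *varnode, PlanState **child_ps)
{
    (void) varnode;
    *child_ps = NULL;           /* no child plan involved in this toy case */
    return ps->is_custom ? &resolved : NULL;
}

static GetSpecialCustomVar_fn provider_cb = my_get_special_var;

/* The wrapper's declaration matches the callback exactly; the caller may
 * pass any PlanState, and non-custom nodes simply return NULL. */
static Expr *
GetSpecialCustomVar(PlanState *ps, Var *varnode, PlanState **child_ps)
{
    if (!ps->is_custom)
        return NULL;
    return provider_cb(ps, varnode, child_ps);
}
```

This keeps the caller-side convenience KaiGai wanted (PlanState as the first argument) while satisfying Robert's requirement that the two prototypes match.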
Attachment
On Tue, Jul 8, 2014 at 6:55 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> * Syntax also reflects what the command does more. New syntax to
>   define custom plan provider is:
>     CREATE CUSTOM PLAN PROVIDER <cpp_name>
>       FOR <cpp_class> HANDLER <cpp_function>;

-1 on 'cpp' prefix. I don't see acronyms used in the syntax
documentation, and 'cpp' will make people reflexively think 'c++'. How
about <provider_name> and <provider_function>?

merlin
2014-10-02 0:41 GMT+09:00 Merlin Moncure <mmoncure@gmail.com>:
> On Tue, Jul 8, 2014 at 6:55 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> * Syntax also reflects what the command does more. New syntax to
>>   define custom plan provider is:
>>     CREATE CUSTOM PLAN PROVIDER <cpp_name>
>>       FOR <cpp_class> HANDLER <cpp_function>;
>
> -1 on 'cpp' prefix. I don't see acronyms used in the syntax
> documentation, and 'cpp' will make people reflexively think 'c++'. How
> about <provider_name> and <provider_function>?
>
That is not living code any more. I already eliminated the SQL syntax
portion in favor of an internal interface (register_custom_path_provider)
that registers the callbacks at extension load time.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
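[Editor's illustration] The load-time registration KaiGai mentions in place of the dropped DDL can be mocked as below. The registry layout, the create_paths callback, and the add_custom_paths driver are illustrative assumptions, not the actual PostgreSQL implementation; only the register_custom_path_provider name comes from the thread:

```c
#include <assert.h>
#include <stddef.h>

/* Mock provider-methods table; the real CustomPathMethods carries the
 * planner callbacks. */
typedef struct CustomPathMethods {
    const char *name;
    int (*create_paths)(int relid);   /* stand-in for the path callback */
} CustomPathMethods;

#define MAX_PROVIDERS 8
static const CustomPathMethods *providers[MAX_PROVIDERS];
static int nproviders = 0;

/* called from an extension's _PG_init() instead of any DDL command */
static void
register_custom_path_provider(const CustomPathMethods *methods)
{
    if (nproviders < MAX_PROVIDERS)
        providers[nproviders++] = methods;
}

/* planner side: give every registered provider a chance to add paths,
 * returning how many alternative paths were offered */
static int
add_custom_paths(int relid)
{
    int added = 0;
    for (int i = 0; i < nproviders; i++)
        added += providers[i]->create_paths(relid);
    return added;
}

/* toy provider that offers one path for any real relation */
static int my_create_paths(int relid) { return relid > 0 ? 1 : 0; }
static const CustomPathMethods my_methods = { "demo_scan", my_create_paths };
```

Registration being a plain in-memory list is also why the later unregister discussion arises: there is no catalog entry to drop, only this process-local state.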
On 30 September 2014 07:27, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> On Mon, Sep 29, 2014 at 9:04 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> > Do we need to match the prototype of wrapper function with callback?
>
> Yes.
>
OK, I fixed up the patch part-2, to fit declaration of GetSpecialCustomVar()
with corresponding callback.
Also, a noise in the part-3 patch, by git-pull, was removed.
FYI, patch v12 part 2 no longer applies cleanly.
Thom
> FYI, patch v12 part 2 no longer applies cleanly.
>
Thanks. I rebased the patch set according to the latest master branch.
The attached v13 can be applied to the master.

--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> FYI, patch v12 part 2 no longer applies cleanly.
>>
> Thanks. I rebased the patch set according to the latest master branch.
> The attached v13 can be applied to the master.

I've committed parts 1 and 2 of this, without the documentation, and with
some additional cleanup. I am not sure that this feature is sufficiently
non-experimental that it deserves to be documented, but if we're thinking
of doing that then the documentation needs a lot more work. I think part
3 of the patch is mostly useful as a demonstration of how this API can be
used, and is not something we probably want to commit. So I'm not
planning, at this point, to spend any more time on this patch series, and
will mark it Committed in the CF app.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >> FYI, patch v12 part 2 no longer applies cleanly.
> >>
> > Thanks. I rebased the patch set according to the latest master branch.
> > The attached v13 can be applied to the master.
>
> I've committed parts 1 and 2 of this, without the documentation, and with
> some additional cleanup. I am not sure that this feature is sufficiently
> non-experimental that it deserves to be documented, but if we're thinking
> of doing that then the documentation needs a lot more work. I think part
> 3 of the patch is mostly useful as a demonstration of how this API can be
> used, and is not something we probably want to commit. So I'm not
> planning, at this point, to spend any more time on this patch series, and
> will mark it Committed in the CF app.
>
Thanks for your great help.

Hanada-san and I have discussed a further enhancement of this interface
that would allow a join to be replaced by a custom scan; it could probably
be utilized by an extension that substitutes a materialized view for a
join on the fly. We will submit a design proposal for this enhancement
later.

Best regards,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Sat, Nov 8, 2014 at 4:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >> FYI, patch v12 part 2 no longer applies cleanly.
> >>
> > Thanks. I rebased the patch set according to the latest master branch.
> > The attached v13 can be applied to the master.
>
> I've committed parts 1 and 2 of this, without the documentation, and
> with some additional cleanup.
Few observations/questions related to this commit:
1.
@@ -5546,6 +5568,29 @@ get_variable(Var *var, int levelsup, bool istoplevel, deparse_context *context)
colinfo = deparse_columns_fetch(var->varno, dpns);
attnum = var->varattno;
}
+ else if (IS_SPECIAL_VARNO(var->varno) &&
+ IsA(dpns->planstate, CustomScanState) &&
+ (expr = GetSpecialCustomVar((CustomScanState *) dpns->planstate,
+ var, &child_ps)) != NULL)
+ {
+ deparse_namespace save_dpns;
+
+ if (child_ps)
+ push_child_plan(dpns, child_ps, &save_dpns);
+ /*
+ * Force parentheses because our caller probably assumed a Var is a
+ * simple expression.
+ */
+ if (!IsA(expr, Var))
+ appendStringInfoChar(buf, '(');
+ get_rule_expr((Node *) expr, context, true);
+ if (!IsA(expr, Var))
+ appendStringInfoChar(buf, ')');
+
+ if (child_ps)
+ pop_child_plan(dpns, &save_dpns);
+ return NULL;
+ }
a. It seems the Assert for netlevelsup is missing in this block.
b. Below comment in function get_variable can be improved
w.r.t handling for CustomScanState. The comment indicates
that if varno is OUTER_VAR or INNER_VAR or INDEX_VAR, it handles
all of them similarly which seems to be slightly changed for
CustomScanState.
/*
 * Try to find the relevant RTE in this rtable. In a plan tree, it's
 * likely that varno is OUTER_VAR or INNER_VAR, in which case we must dig
 * down into the subplans, or INDEX_VAR, which is resolved similarly. Also
 * find the aliases previously assigned for this RTE.
 */
2.
+void
+register_custom_path_provider(CustomPathMethods *cpp_methods)
{
..
}
Shouldn't there be an unregister function corresponding to the above
register function?
> On Sat, Nov 8, 2014 at 4:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > >> FYI, patch v12 part 2 no longer applies cleanly.
> > >>
> > > Thanks. I rebased the patch set according to the latest master branch.
> > > The attached v13 can be applied to the master.
> >
> > I've committed parts 1 and 2 of this, without the documentation, and
> > with some additional cleanup.
>
> Few observations/questions related to this commit:
>
> 1.
> @@ -5546,6 +5568,29 @@ get_variable(Var *var, int levelsup, bool istoplevel,
> deparse_context *context)
>   colinfo = deparse_columns_fetch(var->varno, dpns);
>   attnum = var->varattno;
>   }
> + else if (IS_SPECIAL_VARNO(var->varno) &&
> +          IsA(dpns->planstate, CustomScanState) &&
> +          (expr = GetSpecialCustomVar((CustomScanState *) dpns->planstate,
> +                                      var, &child_ps)) != NULL)
> + {
> +     deparse_namespace save_dpns;
> +
> +     if (child_ps)
> +         push_child_plan(dpns, child_ps, &save_dpns);
> +     /*
> +      * Force parentheses because our caller probably assumed a Var is a
> +      * simple expression.
> +      */
> +     if (!IsA(expr, Var))
> +         appendStringInfoChar(buf, '(');
> +     get_rule_expr((Node *) expr, context, true);
> +     if (!IsA(expr, Var))
> +         appendStringInfoChar(buf, ')');
> +
> +     if (child_ps)
> +         pop_child_plan(dpns, &save_dpns);
> +     return NULL;
> + }
>
> a. It seems the Assert for netlevelsup is missing in this block.
>
Indeed, this if-block does not have an assertion, unlike the other
special varnos.

> b. The comment below in function get_variable can be improved w.r.t. the
> handling for CustomScanState. The comment indicates that if varno is
> OUTER_VAR or INNER_VAR or INDEX_VAR, it handles all of them similarly,
> which seems to be slightly changed for CustomScanState.
>
I added a small comment noting that only the extension knows the mapping
between these special varnos and the underlying expressions; thus, it
queries the provider for the expression tied to this special var-node.
Does it make sense?

> 2.
> +void
> +register_custom_path_provider(CustomPathMethods *cpp_methods)
> {
> ..
> }
>
> Shouldn't there be an unregister function corresponding to the above
> register function?
>
Even though it is not difficult to implement, in what situation would it
make sense to unregister, rather than using an enable_xxxx_scan GUC
parameter added by the extension itself?
I initially thought a prepared statement with a custom-scan node would be
problematic if the provider got unregistered / unloaded; however,
internal_unload_library() actually does nothing. So it is at least
harmless even if we implemented it.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
On Mon, Nov 10, 2014 at 4:18 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >
> > Few observations/questions related to this commit:
> >
> > 1.
> > @@ -5546,6 +5568,29 @@ get_variable(Var *var, int levelsup, bool istoplevel,
> > deparse_context *context)
> > colinfo = deparse_columns_fetch(var->varno, dpns);
> > attnum = var->varattno;
> > }
> > + else if (IS_SPECIAL_VARNO(var->varno) && IsA(dpns->planstate,
> > + CustomScanState) && (expr = GetSpecialCustomVar((CustomScanState *)
> > + dpns->planstate, var, &child_ps)) != NULL) { deparse_namespace
> > + save_dpns;
> > +
> > + if (child_ps)
> > + push_child_plan(dpns, child_ps, &save_dpns);
> > + /*
> > + * Force parentheses because our caller probably assumed a Var is a
> > + * simple expression.
> > + */
> > + if (!IsA(expr, Var))
> > + appendStringInfoChar(buf, '(');
> > + get_rule_expr((Node *) expr, context, true);
> > + if (!IsA(expr, Var))
> > + appendStringInfoChar(buf, ')');
> > +
> > + if (child_ps)
> > + pop_child_plan(dpns, &save_dpns);
> > + return NULL;
> > + }
> >
> > a. It seems the Assert for netlevelsup is missing in this block.
> >
> Indeed, this if-block does not have assertion unlike other special-varno.
>
Similar handling is required in function get_name_for_var_field().
Another point which I wanted to clarify is that in function
get_name_for_var_field(), all other cases except the new case added
for CustomScanState call get_name_for_var_field() recursively to get
the name of the field, whereas for CustomScanState it calls
get_rule_expr(). That doesn't look problematic in general, but it is
still better to get the name the way the other cases do, unless there
is a special need for CustomScanState.
>
> > 2.
> > +void
> > +register_custom_path_provider(CustomPathMethods *cpp_methods)
> > {
> > ..
> > }
> >
> > Shouldn't there be unregister function corresponding to above register
> > function?
> >
> Even though it is not difficult to implement, what situation will make
> sense to unregister rather than enable_xxxx_scan GUC parameter added by
> extension itself?
I thought that, in general, if the user has an API to register the custom
path methods, there should be some way to unregister them; the user might
also need to register different custom path methods after unregistering
the previous ones. I think we should see what Robert or others have to
say about this point before trying to provide such an API.
On Mon, Nov 10, 2014 at 6:55 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I thought that in general if user has the API to register the custom path
> methods, it should have some way to unregister them and also user might
> need to register some different custom path methods after unregistering
> the previous one's. I think we should see what Robert or others have to
> say about this point before trying to provide such an API.

I wouldn't bother. As KaiGai says, if you want to shut the functionality
off, the provider itself can provide a GUC. Also, we really have made no
effort to ensure that loadable modules can be safely unloaded, or hooked
functions safely unhooked. ExecutorRun_hook is a good example. Typical
of hook installation is this:

prev_ExecutorRun = ExecutorRun_hook;
ExecutorRun_hook = pgss_ExecutorRun;

Well, if multiple extensions use this hook, then there's no hope of
unloading them except in reverse order of installation. We essentially
end up creating a singly-linked list of hook users, but with the
next-pointers stored in arbitrarily-named, likely-static variables owned
by the individual extensions, so that nobody can actually traverse it.

This might be worth fixing as part of a concerted campaign to make UNLOAD
work, but unless somebody's really going to do that, I see little reason
to hold this to a higher standard than we apply elsewhere. The ability
to remove extensions from this hook won't be valuable by itself.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Nov 10, 2014 at 6:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Nov 10, 2014 at 6:55 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I thought that in general if user has the API to register the custom path
> > methods, it should have some way to unregister them and also user might
> > need to register some different custom path methods after unregistering
> > the previous one's. I think we should see what Robert or others have to
> > say about this point before trying to provide such an API.
>
> I wouldn't bother. As KaiGai says, if you want to shut the
> functionality off, the provider itself can provide a GUC. Also, we
> really have made no effort to ensure that loadable modules can be
> safely unloaded, or hooked functions safely-unhooked.
> ExecutorRun_hook is a good example. Typical of hook installation is
> this:
>
> prev_ExecutorRun = ExecutorRun_hook;
> ExecutorRun_hook = pgss_ExecutorRun;
In this case, the extension takes care of registering and unregistering
the hook: in _PG_init() it registers the hook, and in _PG_fini() it
unregisters the same. So if, for custom scans, core PG is providing an
API to register the methods, shouldn't it provide an API to unregister
them as well?
On Tue, Nov 11, 2014 at 12:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Nov 10, 2014 at 6:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I wouldn't bother. As KaiGai says, if you want to shut the
>> functionality off, the provider itself can provide a GUC. Also, we
>> really have made no effort to ensure that loadable modules can be
>> safely unloaded, or hooked functions safely-unhooked.
>> ExecutorRun_hook is a good example. Typical of hook installation is
>> this:
>>
>> prev_ExecutorRun = ExecutorRun_hook;
>> ExecutorRun_hook = pgss_ExecutorRun;
>>
> In this case, Extension takes care of register and unregister for
> hook. In _PG_init(), it register the hook and _PG_fini() it
> unregisters the same.

The point is that there's nothing you can do in _PG_fini() that will work
correctly. If it does ExecutorRun_hook = prev_ExecutorRun, that's fine if
it's the most-recently-installed hook. But if it isn't, then doing so
corrupts the list.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
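[Editor's illustration] Robert's point about out-of-order unhooking can be demonstrated with a self-contained mock of the hook-chaining convention: plain function pointers standing in for ExecutorRun_hook and two extensions A and B:

```c
#include <assert.h>
#include <stddef.h>

/* the core's hook variable; callers do: the_hook ? the_hook() : base() */
typedef int (*hook_type)(void);
static hook_type the_hook = NULL;

static int base(void) { return 0; }

/* extension A: saves the previous hook in its own static variable */
static hook_type a_prev;
static int a_fn(void) { return (a_prev ? a_prev() : base()) + 1; }
static void a_init(void)   { a_prev = the_hook; the_hook = a_fn; }
static void a_unload(void) { the_hook = a_prev; }  /* wrong unless last-in */

/* extension B, installed after A: its b_prev now points at a_fn */
static hook_type b_prev;
static int b_fn(void) { return (b_prev ? b_prev() : base()) + 10; }
static void b_init(void)   { b_prev = the_hook; the_hook = b_fn; }
```

Installing A then B builds the implicit chain b_fn -> a_fn -> base. If A then "unhooks" itself by restoring its saved a_prev (NULL), B's b_fn silently drops out of the chain too, even though B never unloaded — exactly the corruption Robert describes, and why the next-pointers hidden in per-extension statics cannot be repaired from outside.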
Robert Haas <robertmhaas@gmail.com> writes:
> I've committed parts 1 and 2 of this, without the documentation, and
> with some additional cleanup. I am not sure that this feature is
> sufficiently non-experimental that it deserves to be documented, but
> if we're thinking of doing that then the documentation needs a lot
> more work. I think part 3 of the patch is mostly useful as a
> demonstration of how this API can be used, and is not something we
> probably want to commit. So I'm not planning, at this point, to spend
> any more time on this patch series, and will mark it Committed in the
> CF app.

I've done some preliminary cleanup on this patch, but I'm still pretty
desperately unhappy about some aspects of it, in particular the way that
it gets custom scan providers directly involved in setrefs.c,
finalize_primnode, and replace_nestloop_params processing. I don't want
any of that stuff exported outside the core, as freezing those APIs would
be a very nasty restriction on future planner development. What's more,
it doesn't seem like doing that creates any value for custom-scan
providers, only a requirement for extra boilerplate code for them to
provide.

ISTM that we could avoid that by borrowing the design used for FDW plans,
namely that any expressions you would like planner post-processing
services for should be stuck into a predefined List field (fdw_exprs for
the ForeignScan case, perhaps custom_exprs for the CustomScan case?).
This would let us get rid of the SetCustomScanRef and FinalizeCustomScan
callbacks as well as simplify the API contract for PlanCustomPath.

I'm also wondering why this patch didn't follow the FDW lead in terms of
expecting private data to be linked from specialized "private" fields.
The design as it stands (with an expectation that CustomPaths,
CustomPlans, etc. would be larger than the core code knows about) is not
awful, but it seems just randomly different from the FDW precedent, and I
don't see a good argument why it should be. If we undid that we could
get rid of the CopyCustomScan callbacks, and perhaps also TextOutCustomPath
and TextOutCustomScan (though I concede there might be some argument to
keep the latter two anyway for debugging reasons).

Lastly, I'm pretty unconvinced that the GetSpecialCustomVar mechanism
added to ruleutils.c is anything but dead weight that we'll have to
maintain forever. It seems somewhat unlikely that anyone will figure out
how to use it, or indeed that it can be used for anything very
interesting. I suppose the argument for it is that you could stick
"custom vars" into the tlist of a CustomScan plan node, but you cannot,
at least not without a bunch of infrastructure that isn't there now; in
particular, how would such an expression ever get matched by setrefs.c to
higher-level plan tlists? I think we should rip that out and wait to see
a complete use-case before considering putting it back.

Comments?

regards, tom lane

PS: with no documentation it's arguable that the entire patch is just
dead weight. I'm not very happy that it went in without any.
On Thu, Nov 20, 2014 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I've done some preliminary cleanup on this patch, but I'm still pretty > desperately unhappy about some aspects of it, in particular the way that > it gets custom scan providers directly involved in setrefs.c, > finalize_primnode, and replace_nestloop_params processing. I don't > want any of that stuff exported outside the core, as freezing those > APIs would be a very nasty restriction on future planner development. > What's more, it doesn't seem like doing that creates any value for > custom-scan providers, only a requirement for extra boilerplate code > for them to provide. > > ISTM that we could avoid that by borrowing the design used for FDW > plans, namely that any expressions you would like planner post-processing > services for should be stuck into a predefined List field (fdw_exprs > for the ForeignScan case, perhaps custom_exprs for the CustomScan case?). > This would let us get rid of the SetCustomScanRef and FinalizeCustomScan > callbacks as well as simplify the API contract for PlanCustomPath. Ah, that makes sense. I'm not sure I really understand what's so bad about the current system, but I have no issue with revising it for consistency. > I'm also wondering why this patch didn't follow the FDW lead in terms of > expecting private data to be linked from specialized "private" fields. > The design as it stands (with an expectation that CustomPaths, CustomPlans > etc would be larger than the core code knows about) is not awful, but it > seems just randomly different from the FDW precedent, and I don't see a > good argument why it should be. If we undid that we could get rid of > CopyCustomScan callbacks, and perhaps also TextOutCustomPath and > TextOutCustomScan (though I concede there might be some argument to keep > the latter two anyway for debugging reasons). OK. 
> Lastly, I'm pretty unconvinced that the GetSpecialCustomVar mechanism > added to ruleutils.c is anything but dead weight that we'll have to > maintain forever. It seems somewhat unlikely that anyone will figure > out how to use it, or indeed that it can be used for anything very > interesting. I suppose the argument for it is that you could stick > "custom vars" into the tlist of a CustomScan plan node, but you cannot, > at least not without a bunch of infrastructure that isn't there now; > in particular how would such an expression ever get matched by setrefs.c > to higher-level plan tlists? I think we should rip that out and wait > to see a complete use-case before considering putting it back. I thought this was driven by a suggestion from you, but maybe KaiGai can comment. > PS: with no documentation it's arguable that the entire patch is just > dead weight. I'm not very happy that it went in without any. As I said, I wasn't sure we wanted to commit to the API enough to document it, and by the time you get done whacking the stuff above around, the documentation KaiGai wrote for it (which was also badly in need of editing by a native English speaker) would have been mostly obsolete anyway. But I'm willing to put some effort into it once you get done rearranging the furniture, if that's helpful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> Robert Haas <robertmhaas@gmail.com> writes:
> > I've committed parts 1 and 2 of this, without the documentation, and with some additional cleanup. I am not sure that this feature is sufficiently non-experimental that it deserves to be documented, but if we're thinking of doing that then the documentation needs a lot more work. I think part 3 of the patch is mostly useful as a demonstration of how this API can be used, and is not something we probably want to commit. So I'm not planning, at this point, to spend any more time on this patch series, and will mark it Committed in the CF app.
>
> I've done some preliminary cleanup on this patch, but I'm still pretty desperately unhappy about some aspects of it, in particular the way that it gets custom scan providers directly involved in setrefs.c, finalize_primnode, and replace_nestloop_params processing. I don't want any of that stuff exported outside the core, as freezing those APIs would be a very nasty restriction on future planner development. What's more, it doesn't seem like doing that creates any value for custom-scan providers, only a requirement for extra boilerplate code for them to provide.
>
> ISTM that we could avoid that by borrowing the design used for FDW plans, namely that any expressions you would like planner post-processing services for should be stuck into a predefined List field (fdw_exprs for the ForeignScan case, perhaps custom_exprs for the CustomScan case?). This would let us get rid of the SetCustomScanRef and FinalizeCustomScan callbacks as well as simplify the API contract for PlanCustomPath.

If the core backend can know which fields contain expression nodes that are processed by the custom-scan provider, the FinalizeCustomScan callback could probably be removed. However, removing SetCustomScanRef would take away a significant use case I intend.
When the tlist contains a complicated expression node (one that takes many CPU cycles) and the custom-scan provider has the capability to compute that expression externally, the SetCustomScanRef hook allows the provider to replace the complicated expression node with a simple Var node that references the externally computed value. Because only the custom-scan provider knows how this "pseudo" varnode maps to the original expression, the hook has to be called so it can assign the correct varno/varattno. We expect it to assign a special varno, like OUTER_VAR, which is then resolved with GetSpecialCustomVar. Another idea is for the core backend to provide a facility that translates between the original expression and the pseudo varnode according to mapping information supplied by the custom-scan provider.

> I'm also wondering why this patch didn't follow the FDW lead in terms of expecting private data to be linked from specialized "private" fields. The design as it stands (with an expectation that CustomPaths, CustomPlans etc would be larger than the core code knows about) is not awful, but it seems just randomly different from the FDW precedent, and I don't see a good argument why it should be. If we undid that we could get rid of CopyCustomScan callbacks, and perhaps also TextOutCustomPath and TextOutCustomScan (though I concede there might be some argument to keep the latter two anyway for debugging reasons).

Yes, my original proposition last year followed the FDW manner: it had a custom_private field to store the custom-scan provider's private data. However, I was asked to change that form because it added a couple of routines to encode / decode Bitmapset, which might lead to further encode / decode routines for other data types. I'm neutral on this design choice; whichever of them people accept is fine with me.

> Lastly, I'm pretty unconvinced that the GetSpecialCustomVar mechanism added to ruleutils.c is anything but dead weight that we'll have to maintain forever.
> It seems somewhat unlikely that anyone will figure out how to use it, or indeed that it can be used for anything very interesting. I suppose the argument for it is that you could stick "custom vars" into the tlist of a CustomScan plan node, but you cannot, at least not without a bunch of infrastructure that isn't there now; in particular how would such an expression ever get matched by setrefs.c to higher-level plan tlists? I think we should rip that out and wait to see a complete use-case before considering putting it back.

As I described above, as long as the core backend has a facility to manage the relationship between a pseudo varnode and a complicated expression node, I think we can remove this callback.

> PS: with no documentation it's arguable that the entire patch is just dead weight. I'm not very happy that it went in without any.

I agree with this. Would it be a good idea to write up a wikipage so we can brush up a documentation draft?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
> On Thu, Nov 20, 2014 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I've done some preliminary cleanup on this patch, but I'm still pretty > > desperately unhappy about some aspects of it, in particular the way > > that it gets custom scan providers directly involved in setrefs.c, > > finalize_primnode, and replace_nestloop_params processing. I don't > > want any of that stuff exported outside the core, as freezing those > > APIs would be a very nasty restriction on future planner development. > > What's more, it doesn't seem like doing that creates any value for > > custom-scan providers, only a requirement for extra boilerplate code > > for them to provide. > > > > ISTM that we could avoid that by borrowing the design used for FDW > > plans, namely that any expressions you would like planner > > post-processing services for should be stuck into a predefined List > > field (fdw_exprs for the ForeignScan case, perhaps custom_exprs for the > CustomScan case?). > > This would let us get rid of the SetCustomScanRef and > > FinalizeCustomScan callbacks as well as simplify the API contract for > PlanCustomPath. > > Ah, that makes sense. I'm not sure I really understand what's so bad about > the current system, but I have no issue with revising it for consistency. > > > I'm also wondering why this patch didn't follow the FDW lead in terms > > of expecting private data to be linked from specialized "private" fields. > > The design as it stands (with an expectation that CustomPaths, > > CustomPlans etc would be larger than the core code knows about) is not > > awful, but it seems just randomly different from the FDW precedent, > > and I don't see a good argument why it should be. If we undid that we > > could get rid of CopyCustomScan callbacks, and perhaps also > > TextOutCustomPath and TextOutCustomScan (though I concede there might > > be some argument to keep the latter two anyway for debugging reasons). > > OK. 
So, shall the existing form be revised as follows?

* CustomScan shall no longer serve as the base type of a custom data type managed by the extension; a private data field is used instead.
* That also eliminates the CopyCustomScan and TextOutCustomPath/Scan callbacks.
* Expression nodes that are processed by the extension rather than by the core backend shall be connected to a special field, like fdw_exprs in FDW.
* The translation between a pseudo varnode and the original expression node shall be made known to the core backend, instead of using SetCustomScanRef and GetSpecialCustomVar.

> > Lastly, I'm pretty unconvinced that the GetSpecialCustomVar mechanism added to ruleutils.c is anything but dead weight that we'll have to maintain forever. It seems somewhat unlikely that anyone will figure out how to use it, or indeed that it can be used for anything very interesting. I suppose the argument for it is that you could stick "custom vars" into the tlist of a CustomScan plan node, but you cannot, at least not without a bunch of infrastructure that isn't there now; in particular how would such an expression ever get matched by setrefs.c to higher-level plan tlists? I think we should rip that out and wait to see a complete use-case before considering putting it back.
>
> I thought this was driven by a suggestion from you, but maybe KaiGai can comment.
>
> > PS: with no documentation it's arguable that the entire patch is just dead weight. I'm not very happy that it went in without any.
>
> As I said, I wasn't sure we wanted to commit to the API enough to document it, and by the time you get done whacking the stuff above around, the documentation KaiGai wrote for it (which was also badly in need of editing by a native English speaker) would have been mostly obsolete anyway. But I'm willing to put some effort into it once you get done rearranging the furniture, if that's helpful.
For people's convenience, I'd like to set up a wikipage to write up a draft of the SGML documentation, for easy updating by native English speakers.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: >> I've done some preliminary cleanup on this patch, but I'm still pretty >> desperately unhappy about some aspects of it, in particular the way that >> it gets custom scan providers directly involved in setrefs.c, >> finalize_primnode, and replace_nestloop_params processing. I don't want >> any of that stuff exported outside the core, as freezing those APIs would >> be a very nasty restriction on future planner development. > If core backend can know which field contains expression nodes but > processed by custom-scan provider, FinalizedCustomScan might be able > to rid. However, rid of SetCustomScanRef makes unavailable a significant > use case I intend. > In case when tlist contains complicated expression node (thus it takes > many cpu cycles) and custom-scan provider has a capability to compute > this expression node externally, SetCustomScanRef hook allows to replace > this complicate expression node by a simple Var node that references > a value being externally computed. That's a fine goal to have, but this is not a solution that works for any except trivial cases. The problem is that that complicated expression isn't going to be in the CustomScan's tlist in the first place unless you have a one-node plan. As soon as you have a join, for example, the planner is going to delay calculation of anything more complex than a plain Var to above the join. Aggregation, GROUP BY, etc would also defeat such an optimization. This gets back to the remarks I made earlier about it not being possible to do anything very interesting in a plugin of this nature. You really need cooperation from other places in the planner if you want to do things like pushing down calculations into an external provider. And it's not at all clear how that would even work, let alone how we might make it implementable as a plugin rather than core code. 
Also, even if we could think of a way to do this from a CustomScan plugin, that would fail to cover some very significant use-cases for pushing down complex expressions, for example: * retrieving values of expensive functions from expression indexes; * pushing down expensive functions into FDWs so they can be done remotely. And I'm also worried that once we've exported and thereby frozen the APIs in this area, we'd be operating with one hand tied behind our backs in solving those use-cases. So I'm not very excited about pursuing the problem in this form. So I remain of the opinion that we should get the CustomScan stuff out of setrefs processing, and also that having EXPLAIN support for such special variables is premature. It's possible that after the dust settles we'd wind up with additions to ruleutils that look exactly like what's in this patch ... but I'd bet against that. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > As I said, I wasn't sure we wanted to commit to the API enough to > document it, and by the time you get done whacking the stuff above > around, the documentation KaiGai wrote for it (which was also badly in > need of editing by a native English speaker) would have been mostly > obsolete anyway. But I'm willing to put some effort into it once you > get done rearranging the furniture, if that's helpful. I thought of another API change we should consider. It's weird that CustomPathMethods includes CreateCustomScanPath, because that's not a method you apply to a CustomPath, it's what creates them in the first place. I'm inclined to think that we should get rid of that and register_custom_path_provider() altogether and just provide a function hook variable equivalent to create_customscan_paths, which providers can link into in the usual way. The register_custom_path_provider mechanism might have some use if we were also going to provide deregister-by-name functionality, but as you pointed out upthread, that's not likely to ever be worth doing. The hook function might better be named something like editorialize_on_relation_paths, since in principle it could screw around with the Paths already made by the core code, not just add CustomPaths. There's an analogy to get_relation_info_hook, which is meant to let plugins editorialize on the relation's index list. So maybe set_plain_rel_pathlist_hook? regards, tom lane
> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: > >> I've done some preliminary cleanup on this patch, but I'm still > >> pretty desperately unhappy about some aspects of it, in particular > >> the way that it gets custom scan providers directly involved in > >> setrefs.c, finalize_primnode, and replace_nestloop_params processing. > >> I don't want any of that stuff exported outside the core, as freezing > >> those APIs would be a very nasty restriction on future planner > development. > > > If core backend can know which field contains expression nodes but > > processed by custom-scan provider, FinalizedCustomScan might be able > > to rid. However, rid of SetCustomScanRef makes unavailable a > > significant use case I intend. > > In case when tlist contains complicated expression node (thus it takes > > many cpu cycles) and custom-scan provider has a capability to compute > > this expression node externally, SetCustomScanRef hook allows to > > replace this complicate expression node by a simple Var node that > > references a value being externally computed. > > That's a fine goal to have, but this is not a solution that works for any > except trivial cases. The problem is that that complicated expression > isn't going to be in the CustomScan's tlist in the first place unless you > have a one-node plan. As soon as you have a join, for example, the planner > is going to delay calculation of anything more complex than a plain Var > to above the join. Aggregation, GROUP BY, etc would also defeat such an > optimization. > > This gets back to the remarks I made earlier about it not being possible > to do anything very interesting in a plugin of this nature. You really > need cooperation from other places in the planner if you want to do things > like pushing down calculations into an external provider. And it's not > at all clear how that would even work, let alone how we might make it > implementable as a plugin rather than core code. 
> Also, even if we could think of a way to do this from a CustomScan plugin, that would fail to cover some very significant use-cases for pushing down complex expressions, for example:
> * retrieving values of expensive functions from expression indexes;
> * pushing down expensive functions into FDWs so they can be done remotely.
> And I'm also worried that once we've exported and thereby frozen the APIs in this area, we'd be operating with one hand tied behind our backs in solving those use-cases. So I'm not very excited about pursuing the problem in this form.

I can understand your concern: it would only be available in a one-node plan, and it might need additional interaction between the core and the extension to push down complicated expressions. So, right now, I have to admit we should drop this hook for that purpose.

On the other hand, I was thinking of using similar, but not identical, functionality to implement join replacement by custom-scan. I'd like to see your comments prior to the patch submission.

Let's assume a custom-scan provider that runs on a materialized view (or something like an in-memory query cache) instead of a join. In this case, a reasonable design is to fetch a tuple from the materialized view and then put it on the ecxt_scantuple of the ExprContext prior to evaluation of the qualifier or tlist, unlike a usual join, which takes two slots: ecxt_innertuple and ecxt_outertuple. It also means each individual varnode has to reference ecxt_scantuple, rather than ecxt_innertuple or ecxt_outertuple. The tuple in ecxt_scantuple contains attributes that come from both relations, so we need to track the relationship between a varattno of the scanned tuple and the source relation it comes from. I intended to use the SetCustomScanRef callback to adjust the varno and varattno of the varnodes that reference the custom-scan node, as set_join_references() does. This is not a replacement of a general expression by a varnode, just a re-mapping of varno/varattno.
> So I remain of the opinion that we should get the CustomScan stuff out of setrefs processing, and also that having EXPLAIN support for such special variables is premature. It's possible that after the dust settles we'd wind up with additions to ruleutils that look exactly like what's in this patch ... but I'd bet against that.

So, I can agree to removing SetCustomScanRef and GetSpecialCustomVar. However, some alternative functionality to implement the varno/varattno remapping will be needed soon. What are your thoughts?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: > Let assume a custom-scan provider that runs on a materialized-view > (or, something like a query cache in memory) instead of join. > In this case, a reasonable design is to fetch a tuple from the > materialized-view then put it on the ecxt_scantuple of ExprContext > prior to evaluation of qualifier or tlist, unlike usual join takes > two slots - ecxt_innertuple and ecxt_outertuple. > Also, it leads individual varnode has to reference exct_scantuple, > neither ecxt_innertuple nor ecxt_outertuple. OK, that's possibly a reasonable way to do it at runtime. You don't *have* to do it that way of course. It would be only marginally less efficient to reconstruct two tuples that match the shapes of the original join inputs. > I intended to use the SetCustomScanRef callback to adjust varno > and varattno of the varnode that references the custom-scan node; > as if set_join_references() doing. I think this is really fundamentally misguided. setrefs.c has no business doing anything "interesting" like making semantically important substitutions; those decisions need to be made much earlier. An example in the context of your previous proposal is that getting rid of expensive functions without any adjustment of cost estimates is just wrong; and I don't mean that you forgot to have your setrefs.c hook hack up the Plan's cost fields. The cost estimates need to change at the Path stage, or the planner might not even select the right path at all. I'm not sure where would be an appropriate place to deal with the kind of thing you're thinking about here. But I'm really not happy with the concept of exposing the guts of setrefs.c in order to enable single-purpose kluges like this. We have fairly general problems to solve in this area, and we should be working on solving them, not on freezing relevant planner APIs to support marginally-useful external plugins. regards, tom lane
> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: > > Let assume a custom-scan provider that runs on a materialized-view > > (or, something like a query cache in memory) instead of join. > > In this case, a reasonable design is to fetch a tuple from the > > materialized-view then put it on the ecxt_scantuple of ExprContext > > prior to evaluation of qualifier or tlist, unlike usual join takes two > > slots - ecxt_innertuple and ecxt_outertuple. > > Also, it leads individual varnode has to reference exct_scantuple, > > neither ecxt_innertuple nor ecxt_outertuple. > > OK, that's possibly a reasonable way to do it at runtime. You don't > *have* to do it that way of course. It would be only marginally less > efficient to reconstruct two tuples that match the shapes of the original > join inputs. > > > I intended to use the SetCustomScanRef callback to adjust varno and > > varattno of the varnode that references the custom-scan node; as if > > set_join_references() doing. > > I think this is really fundamentally misguided. setrefs.c has no business > doing anything "interesting" like making semantically important > substitutions; those decisions need to be made much earlier. An example > in the context of your previous proposal is that getting rid of expensive > functions without any adjustment of cost estimates is just wrong; and I > don't mean that you forgot to have your setrefs.c hook hack up the Plan's > cost fields. The cost estimates need to change at the Path stage, or the > planner might not even select the right path at all. > Because we right now have no functionality to register custom-scan path instead of join, I had to show another use scenario... > I'm not sure where would be an appropriate place to deal with the kind of > thing you're thinking about here. But I'm really not happy with the concept > of exposing the guts of setrefs.c in order to enable single-purpose kluges > like this. 
> We have fairly general problems to solve in this area, and we should be working on solving them, not on freezing relevant planner APIs to support marginally-useful external plugins.

From my standpoint, varnode remapping for joins over relations is a higher priority than complicated expression nodes. As long as the core backend handles that job, then yes, I think a hook in setrefs.c is not mandatory. It also means the role of resolving special varnodes for EXPLAIN moves from the extension to the core, so GetSpecialCustomVar can be removed.

Let me explain my current idea. The CustomScan node would have a field that holds varnode mapping information, constructed by the custom-scan provider at create_customscan_plan time if it wants. It is probably a list of varnodes. If the field exists, setrefs.c changes its behavior: it updates the varno/varattno of each varnode according to this mapping, as set_join_references() does based on an indexed_tlist. To reference ecxt_scantuple, INDEX_VAR would be the best choice for the varno of these varnodes, with the index into the above varnode mapping list as the varattno. The same information can be used to produce EXPLAIN output, instead of the GetSpecialCustomVar hook.

So, the steps to take may be:
(1) Add custom_private, custom_exprs, ... instead of self-defined data types based on CustomXXX.
(2) Remove the SetCustomScanRef and GetSpecialCustomVar hooks from the current custom-"scan" support.
(3) Integrate the above varnode mapping feature with the upcoming join replacement by custom-scan support.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: > Let me explain the current idea of mine. > CustomScan node will have a field that hold varnode mapping information > that is constructed by custom-scan provider on create_customscan_plan, > if they want. It is probably a list of varnode. > If exists, setrefs.c changes its behavior; that updates varno/varattno of > varnode according to this mapping, as if set_join_references() does > based on indexed_tlist. > To reference exct_scantuple, INDEX_VAR will be a best choice for varno > of these varnodes, and index of the above varnode mapping list will > be varattno. It can be utilized to make EXPLAIN output, instead of > GetSpecialCustomVar hook. > So, steps to go may be: > (1) Add custom_private, custom_exprs, ... instead of self defined data > type based on CustomXXX. > (2) Rid of SetCustomScanRef and GetSpecialCustomVar hook for the current > custom-"scan" support. > (3) Integration of above varnode mapping feature within upcoming join > replacement by custom-scan support. Well ... I still do not find this interesting, because I don't believe that CustomScan is a solution to anything interesting. It's difficult enough to solve problems like expensive-function pushdown within the core code; why would we tie one hand behind our backs by insisting that they should be solved by extensions? And as I mentioned before, we do need solutions to these problems in the core, regardless of CustomScan. I think that a useful way to go at this might be to think first about how to make use of expensive functions that have been cached in indexes, and then see how the solution to that might translate to pushing down expensive functions into FDWs and CustomScans. If you start with the CustomScan aspect of it then you immediately find yourself trying to design APIs to divide up the solution, which is premature when you don't even know what the solution is. 
The rough idea I'd had about this is that while canvassing a relation's indexes (in get_relation_info), we could create a list of precomputed expressions that are available from indexes, then run through the query tree and replace any matching subexpressions with some Var-like nodes (or maybe better PlaceHolderVar-like nodes) that indicate that "we can get this expression for free if we read the right index". If we do read the right index, such an expression reduces to a Var in the finished plan tree; if not, it reverts to the original expression. (Some thought would need to be given to the semantics when the index's table is underneath an outer join --- that may just mean that we can't necessarily replace every textually-matching subexpression, only those that are not above an outer join.) One question mark here is how to do the "replace any matching subexpressions" bit without O(lots) processing cost in big queries. But that's probably just a SMOP. The bigger issue I fear is that the planner is not currently structured to think that evaluation cost of expressions in the SELECT list has anything to do with which Path it should pick. That is tied to the handwaving I've been doing for awhile now about converting all the upper-level planning logic into generate-and-compare-Paths style; we certainly cannot ignore tlist eval costs while making those decisions. So at least for those upper-level Paths, we'd have to have a notion of what tlist we expect that plan level to compute, and charge appropriate evaluation costs. So there's a lot of work there and I don't find that CustomScan looks like a solution to any of it. CustomScan and FDWs could benefit from this work, in that we'd now have a way to deal with the concept that expensive functions (and aggregates, I hope) might be computed at the bottom scan level. But it's folly to suppose that we can make it work just by hacking some arms-length extension code without any fundamental planner changes. regards, tom lane
> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> > Let me explain the current idea of mine. CustomScan node will have a field that holds varnode mapping information that is constructed by custom-scan provider on create_customscan_plan, if they want. It is probably a list of varnodes. If it exists, setrefs.c changes its behavior; it updates varno/varattno of varnodes according to this mapping, as set_join_references() does based on indexed_tlist. To reference ecxt_scantuple, INDEX_VAR will be the best choice for varno of these varnodes, and the index into the above varnode mapping list will be varattno. It can be utilized to make EXPLAIN output, instead of the GetSpecialCustomVar hook.
> > So, steps to go may be:
> > (1) Add custom_private, custom_exprs, ... instead of self-defined data types based on CustomXXX.
> > (2) Remove the SetCustomScanRef and GetSpecialCustomVar hooks for the current custom-"scan" support.
> > (3) Integrate the above varnode mapping feature with the upcoming join replacement by custom-scan support.
>
> Well ... I still do not find this interesting, because I don't believe that CustomScan is a solution to anything interesting. It's difficult enough to solve problems like expensive-function pushdown within the core code; why would we tie one hand behind our backs by insisting that they should be solved by extensions? And as I mentioned before, we do need solutions to these problems in the core, regardless of CustomScan.

I'd like to split the "anything interesting" into two portions. As you pointed out, the feature to push down complicated expressions may need fairly large effort (spanning at least the remaining two commit-fests); however, what the feature to replace a join by a custom-scan requires is similar to the job of set_join_references(), because it never involves translation between a varnode and a general expression.
Also, from my standpoint, a simple join replacement by custom-scan has higher
priority; join acceleration in v9.5 makes sense even if the full
functionality of pushing down general expressions is not supported yet.

> I think that a useful way to go at this might be to think first about how
> to make use of expensive functions that have been cached in indexes, and
> then see how the solution to that might translate to pushing down expensive
> functions into FDWs and CustomScans. If you start with the CustomScan
> aspect of it then you immediately find yourself trying to design APIs to
> divide up the solution, which is premature when you don't even know what
> the solution is.
>
Yep, it also seems to me that the remaining two commit-fests are a rather
tight schedule in which to reach consensus on the overall design and to
implement it. I'd like to focus on the simpler portion first.

> The rough idea I'd had about this is that while canvassing a relation's
> indexes (in get_relation_info), we could create a list of precomputed
> expressions that are available from indexes, then run through the query
> tree and replace any matching subexpressions with some Var-like nodes (or
> maybe better PlaceHolderVar-like nodes) that indicate that "we can get this
> expression for free if we read the right index".
> If we do read the right index, such an expression reduces to a Var in the
> finished plan tree; if not, it reverts to the original expression.
> (Some thought would need to be given to the semantics when the index's table
> is underneath an outer join --- that may just mean that we can't necessarily
> replace every textually-matching subexpression, only those that are not
> above an outer join.) One question mark here is how to do the "replace
> any matching subexpressions" bit without O(lots) processing cost in big
> queries. But that's probably just a SMOP.
> The bigger issue I fear is that
> the planner is not currently structured to think that evaluation cost of
> expressions in the SELECT list has anything to do with which Path it should
> pick. That is tied to the handwaving I've been doing for awhile now about
> converting all the upper-level planning logic into
> generate-and-compare-Paths style; we certainly cannot ignore tlist eval
> costs while making those decisions. So at least for those upper-level Paths,
> we'd have to have a notion of what tlist we expect that plan level to
> compute, and charge appropriate evaluation costs.
>
Let me investigate the planner code more before commenting on this...

> So there's a lot of work there and I don't find that CustomScan looks like
> a solution to any of it. CustomScan and FDWs could benefit from this work,
> in that we'd now have a way to deal with the concept that expensive functions
> (and aggregates, I hope) might be computed at the bottom scan level. But
> it's folly to suppose that we can make it work just by hacking some
> arms-length extension code without any fundamental planner changes.
>
Indeed, I don't think it is a good idea to start from this harder portion.
Let's focus on just the varno/varattno remapping needed to replace a join
relation with a custom-scan, as an immediate target.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
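The varno/varattno remapping discussed in this exchange can be illustrated with a toy version (MockVar and remap_var are hypothetical stand-ins for the real Var node and the setrefs.c machinery; 65002 is INDEX_VAR's value in the 9.x-era primnodes.h): each Var that the custom node promises to produce is looked up in a varmap, and rewritten to reference ecxt_scantuple by list position.

```c
#define INDEX_VAR 65002   /* special varno; value as in 9.x primnodes.h */

/* Minimal stand-in for the parts of Var that matter here. */
typedef struct { int varno; int varattno; } MockVar;

/*
 * Remap a Var according to a varmap: the ordered list of original Vars
 * that the scan node promises to place in ecxt_scantuple.  A match at
 * list position i becomes (INDEX_VAR, i + 1), mirroring how
 * set_join_references() consults an indexed_tlist.
 */
static int remap_var(MockVar *var, const MockVar *varmap, int nmap)
{
    for (int i = 0; i < nmap; i++) {
        if (varmap[i].varno == var->varno &&
            varmap[i].varattno == var->varattno) {
            var->varno = INDEX_VAR;
            var->varattno = i + 1;   /* attnos are 1-based */
            return 1;
        }
    }
    return 0;   /* not produced by this scan: leave the Var untouched */
}
```

The reverse mapping needed for EXPLAIN is then just the varmap read backwards: position i of the list names the expression behind (INDEX_VAR, i + 1).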
On Mon, Nov 24, 2014 at 6:57 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Indeed, I don't think it is a good idea to start from this harder portion.
> Let's focus on just varno/varattno remapping to replace join relation by
> custom-scan, as an immediate target.

We still need something like this for FDWs, as well. The potential gains
there are enormous. Anything we do had better fit in nicely with that,
rather than looking like a separate hack.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Mon, Nov 24, 2014 at 6:57 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Indeed, I don't think it is a good idea to start from this harder portion.
> > Let's focus on just varno/varattno remapping to replace join relation
> > by custom-scan, as an immediate target.
>
> We still need something like this for FDWs, as well. The potential gains
> there are enormous. Anything we do had better fit in nicely with that,
> rather than looking like a separate hack.
>
Today, I had a talk with Hanada-san to clarify which parts can be common
between them and how to implement it. We concluded that both features can
share most of the infrastructure.
Let me give an introduction to join replacement by foreign-/custom-scan
below.

The overall design is to inject a foreign-/custom-scan node in place of the
built-in join logic (based on the estimated cost). From the viewpoint of the
core backend, it looks like a sub-query scan that performs the relation join
internally.

What we need to do is below:

(1) Add a hook in add_paths_to_joinrel()
It gives extensions (including FDW drivers and custom-scan providers) a
chance to add alternative paths for a particular join of relations, using
ForeignScanPath or CustomScanPath, if they can run in place of the built-in
ones.

(2) Inform the core backend of the varno/varattno mapping
One thing we need to pay attention to is that a foreign-/custom-scan node
that runs in place of a built-in join node must return a mixture of values
coming from both relations. When an FDW driver fetches a remote record (or a
record computed by an external computing resource), the most reasonable way
is to store it in the ecxt_scantuple of the ExprContext, then kick off
projection with varnodes that reference this slot.
This needs an infrastructure that tracks the relationship between the
original varnodes and the alternative varno/varattno.
We thought it should be mapped to INDEX_VAR and a virtual attribute number,
to reference ecxt_scantuple naturally, and this infrastructure is quite
helpful for both ForeignScan and CustomScan.
We'd like to add a List *fdw_varmap / *custom_varmap variable to both plan
nodes. It contains the list of the original Var nodes, each mapped to the
position given by its list index (e.g., the first varnode becomes
varno=INDEX_VAR and varattno=1).

(3) Reverse mapping on EXPLAIN
For EXPLAIN support, the above varnodes on the pseudo relation scan need to
be resolved. All we need to do is initialize dpns->inner_tlist in
set_deparse_planstate() according to the above mapping.

(4) The scanrelid == 0 case
To skip opening/closing (foreign) tables, we need a mark that tells the
backend not to initialize the scan node according to a table definition, but
according to the pseudo varnode list.
As the earlier custom-scan patch did, scanrelid == 0 is a straightforward
mark to show that the scan node is not tied to a particular real relation.
So we also need special-case handling around the foreign-/custom-scan code.

We expect the above changes are small enough to implement basic join
push-down functionality (one that does not involve external computation of
complicated expression nodes), but valuable enough to support in v9.5.

Please comment on the proposition above.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
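Step (1) of the proposal above can be sketched as follows. This is a simplified mock (the Path and JoinRel structs and the hook name are illustrative; the real planner's add_path() keeps a whole pathlist and compares startup and total costs): the core adds its built-in join paths, then gives a registered extension a chance to offer a cheaper alternative, and the cheapest path wins.

```c
#include <stddef.h>

/* Toy path and join-relation structs; real ones live in relation.h. */
typedef struct Path { const char *provider; double total_cost; } Path;
typedef struct JoinRel { Path *cheapest; } JoinRel;

/* Like the planner's add_path(): keep the cheaper alternative. */
static void add_path(JoinRel *joinrel, Path *p)
{
    if (joinrel->cheapest == NULL ||
        p->total_cost < joinrel->cheapest->total_cost)
        joinrel->cheapest = p;
}

/* Hypothetical hook type: an extension adds alternative join paths. */
typedef void (*join_pathlist_hook_type)(JoinRel *joinrel);
static join_pathlist_hook_type join_pathlist_hook = NULL;

/* A custom-scan provider offering a cheaper replacement for the join. */
static Path custom_join_path = { "custom-scan join", 50.0 };
static void my_provider(JoinRel *joinrel)
{
    add_path(joinrel, &custom_join_path);
}

static void add_paths_to_joinrel(JoinRel *joinrel)
{
    static Path hash_join_path = { "hash join", 100.0 };
    add_path(joinrel, &hash_join_path);   /* built-in join logic */
    if (join_pathlist_hook)
        join_pathlist_hook(joinrel);      /* (1) extensions get a chance */
}
```

The point of hooking add_paths_to_joinrel(), rather than replacing a finished plan, is that the custom path competes on cost with the built-in joins inside the existing machinery.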
On 7 November 2014 at 22:46, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>> FYI, patch v12 part 2 no longer applies cleanly.
>>>
>> Thanks. I rebased the patch set according to the latest master branch.
>> The attached v13 can be applied to the master.
>
> I've committed parts 1 and 2 of this, without the documentation, and
> with some additional cleanup. I am not sure that this feature is
> sufficiently non-experimental that it deserves to be documented, but
> if we're thinking of doing that then the documentation needs a lot
> more work. I think part 3 of the patch is mostly useful as a
> demonstration of how this API can be used, and is not something we
> probably want to commit. So I'm not planning, at this point, to spend
> any more time on this patch series, and will mark it Committed in the
> CF app.

I'm very concerned about the state of this feature. No docs, no examples,
and therefore, no testing. This standard of code is much less than I've been
taught is the minimum standard on this project.

There are zero docs, even in a README. Experimental feature or not, there
MUST be documentation somewhere, in some form, even if that is just on the
Wiki. Otherwise, how will it ever be used enough to allow it to be declared
fully usable?

The example contrib module was not committed, and I am advised it no longer
works. After much effort persuading academic contacts to begin using the
feature for open source research, it now appears pretty much unusable.

This is supposed to be an open project. Whoever takes responsibility here,
please ensure that those things are resolved, quickly. We're on a time
limit, because any flaws in the API need to be ironed out before it's too
late and we have to decide to either remove the API because it's flaky, or
commit to supporting it in production for 9.5.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> On 7 November 2014 at 22:46, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Oct 27, 2014 at 2:35 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
> > wrote:
> >>> FYI, patch v12 part 2 no longer applies cleanly.
> >>>
> >> Thanks. I rebased the patch set according to the latest master branch.
> >> The attached v13 can be applied to the master.
> >
> > I've committed parts 1 and 2 of this, without the documentation, and
> > with some additional cleanup. I am not sure that this feature is
> > sufficiently non-experimental that it deserves to be documented, but
> > if we're thinking of doing that then the documentation needs a lot
> > more work. I think part 3 of the patch is mostly useful as a
> > demonstration of how this API can be used, and is not something we
> > probably want to commit. So I'm not planning, at this point, to spend
> > any more time on this patch series, and will mark it Committed in the
> > CF app.
>
> I'm very concerned about the state of this feature. No docs, no examples,
> and therefore, no testing. This standard of code is much less than I've
> been taught is the minimum standard on this project.
>
> There are zero docs, even in README. Experimental feature, or not, there
> MUST be documentation somewhere, in some form, even if that is just on the
> Wiki. Otherwise how it will ever be used sufficiently to allow it to be
> declared fully usable?
>
The reason why the documentation portion was not yet committed is, sorry,
the quality of the documentation from the standpoint of a native English
speaker. I'm now writing up documentation according to the latest code base;
please wait several days, and help to improve it.

> The example contrib module was not committed and I am advised no longer
> works.
>
May I submit the contrib/ctidscan module again as an example?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 27 November 2014 at 10:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> The reason why documentation portion was not yet committed is, sorry, it
> is due to quality of documentation from the standpoint of native English
> speaker.
> Now, I'm writing up a documentation stuff according to the latest code
> base, please wait for several days and help to improve.

Happy to help with that.

Please post to the Wiki first so we can edit it communally.

>> The example contrib module was not committed and I am advised no longer
>> works.
>>
> May I submit the contrib/ctidscan module again for an example?

Yes please. We have other contrib modules that exist as tests, so this
seems reasonable to me.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Nov 25, 2014 at 3:44 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Today, I had a talk with Hanada-san to clarify which can be a common
> portion of them and how to implement it. Then, we concluded both of
> features can be shared most of the infrastructure.
> Let me put an introduction of join replacement by foreign-/custom-scan
> below.
>
> Its overall design intends to inject foreign-/custom-scan node instead of
> the built-in join logic (based on the estimated cost). From the viewpoint
> of core backend, it looks like a sub-query scan that contains relations
> join internally.
>
> What we need to do is below:
>
> (1) Add a hook add_paths_to_joinrel()
> It gives extensions (including FDW drivers and custom-scan providers)
> chance to add alternative paths towards a particular join of relations,
> using ForeignScanPath or CustomScanPath, if it can run instead of the
> built-in ones.
>
> (2) Informs the core backend varno/varattno mapping
> One thing we need to pay attention is, foreign-/custom-scan node that
> performs instead of the built-in join node must return mixture of values
> come from both relations. In case when FDW driver fetch a remote record
> (also, fetch a record computed by external computing resource), the most
> reasonable way is to store it on ecxt_scantuple of ExprContext, then kicks
> projection with varnode that references this slot.
> It needs an infrastructure that tracks relationship between original
> varnode and the alternative varno/varattno. We thought, it shall be mapped
> to INDEX_VAR and a virtual attribute number to reference ecxt_scantuple
> naturally, and this infrastructure is quite helpful for both of
> ForegnScan/CustomScan.
> We'd like to add List *fdw_varmap/*custom_varmap variable to both of plan
> nodes. It contains list of the original Var node that shall be mapped on
> the position according to the list index.
> (e.g, the first varnode is varno=INDEX_VAR and varattno=1)
>
> (3) Reverse mapping on EXPLAIN
> For EXPLAIN support, above varnode on the pseudo relation scan needed to
> be solved. All we need to do is initialization of dpns->inner_tlist on
> set_deparse_planstate() according to the above mapping.
>
> (4) case of scanrelid == 0
> To skip open/close (foreign) tables, we need to have a mark to introduce
> the backend not to initialize the scan node according to table definition,
> but according to the pseudo varnodes list.
> As earlier custom-scan patch doing, scanrelid == 0 is a straightforward
> mark to show the scan node is not combined with a particular real
> relation.
> So, it also need to add special case handling around foreign-/custom-scan
> code.
>
> We expect above changes are enough small to implement basic join push-down
> functionality (that does not involves external computing of complicated
> expression node), but valuable to support in v9.5.
>
> Please comment on the proposition above.

I don't really have any technical comments on this design right at the
moment, but I think it's an important area where PostgreSQL needs to make
some progress sooner rather than later, so I hope that we can get something
committed in time for 9.5.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> -----Original Message-----
> From: Simon Riggs [mailto:simon@2ndQuadrant.com]
> Sent: Thursday, November 27, 2014 8:48 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Robert Haas; Thom Brown; Kohei KaiGai; Tom Lane; Alvaro Herrera;
> Shigeru Hanada; Stephen Frost; Andres Freund; PgHacker; Jim Mlodgenski;
> Peter Eisentraut
> Subject: Re: [HACKERS] [v9.5] Custom Plan API
>
> On 27 November 2014 at 10:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
> > The reason why documentation portion was not yet committed is, sorry,
> > it is due to quality of documentation from the standpoint of native
> > English speaker.
> > Now, I'm writing up a documentation stuff according to the latest code
> > base, please wait for several days and help to improve.
>
> Happy to help with that.
>
> Please post to the Wiki first so we can edit it communally.
>
Simon,

I tried to describe how the custom-scan provider interacts with the core
backend, and what is expected of the individual callbacks, here:
https://wiki.postgresql.org/wiki/CustomScanInterface

I'd like to hear what kind of description should be added, from a third
person's viewpoint.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 27 November 2014 at 20:48, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 27 November 2014 at 10:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
>> The reason why documentation portion was not yet committed is, sorry, it
>> is due to quality of documentation from the standpoint of native English
>> speaker.
>> Now, I'm writing up a documentation stuff according to the latest code
>> base, please wait for several days and help to improve.
>
> Happy to help with that.
>
> Please post to the Wiki first so we can edit it communally.

I've corrected a spelling mistake, but it reads OK at the moment.

>>> The example contrib module was not committed and I am advised no longer
>>> works.
>>>
>> May I submit the contrib/ctidscan module again for an example?
>
> Yes please. We have other contrib modules that exist as tests, so this
> seems reasonable to me.

I can't improve the docs without the example code. Is that available now?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon,

> > Yes please. We have other contrib modules that exist as tests, so this
> > seems reasonable to me.
>
> I can't improve the docs without the example code. Is that available now?
>
Please wait for a few days. The ctidscan module is not adjusted for the
latest interface yet.

--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 12/6/14, 5:21 PM, Kouhei Kaigai wrote:
>>> Yes please. We have other contrib modules that exist as tests, so this
>>> seems reasonable to me.
>>
>> I can't improve the docs without the example code. Is that available now?
>
> Please wait for a few days. The ctidscan module is not adjusted for the
> latest interface yet.

I've made some minor edits, with an emphasis on not changing the original
intent. Each section was saved as a separate edit, so if anyone objects to
something, just revert the relevant change.

Once the code is available, more editing can be done.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 7 December 2014 at 08:21, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Please wait for a few days. The ctidscan module is not adjusted for the
> latest interface yet.

I am in many ways a patient man. At this point it is 12 days since my
request for a working example.

Feedback I am receiving is that the API is unusable. That could be because
it is impenetrable, or because it is unusable. I'm not sure it matters
which.

We need a working example to ensure that the API meets the needs of a wide
section of users and, if it does not, to give other users a chance to
request changes to the API so that it becomes usable. The window for such
feedback is approaching zero very quickly now and we need action.

Thanks

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon,

The sample code is here:
https://github.com/kaigai/ctidscan

The code itself and its regression tests show how it works and how it
interacts with the core backend. However, its source code comments are not
updated and the SGML documentation is not ready yet, because of my schedule
in the earlier half of December. I will try to add the above for a patch of
the contrib module, but it will take a few more days.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
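For readers who cannot check the repository immediately, the idea behind the ctidscan example can be shown with a toy model (the struct and function names below are illustrative, not the module's actual API): a qual on the ctid system column, e.g. ctid > '(90,0)', lets a custom scan start at block 90 instead of reading the whole heap.

```c
/* A TID is (block, offset); a ctid qual narrows the block range to scan. */
typedef struct { unsigned block; unsigned offset; } MockTid;

/* For a qual "ctid > bound", the first block worth reading: matching
 * tuples can still exist in bound.block (at larger offsets), so start
 * there; clamp to the relation size. */
static unsigned first_block_for_gt(MockTid bound, unsigned nblocks)
{
    return bound.block < nblocks ? bound.block : nblocks;
}

/* Blocks actually read, versus nblocks for a plain sequential scan. */
static unsigned blocks_scanned(MockTid bound, unsigned nblocks)
{
    return nblocks - first_block_for_gt(bound, nblocks);
}
```

The real module's job beyond this arithmetic is planner-facing: recognizing ctid quals in baserestrictinfo, costing the reduced scan, and packaging it as a CustomScan path.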
On Tue, Dec 9, 2014 at 3:24 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Feedback I am receiving is that the API is unusable. That could be
> because it is impenetrable, or because it is unusable. I'm not sure it
> matters which.

It would be nice to hear what someone is trying to use it for and what
problems that person is encountering. Without that, it's pretty much
impossible for anyone to fix anything.

As for sample code, KaiGai had a working example, which of course got broken
when Tom changed the API, but it didn't look to me like Tom's changes would
have made anything impossible that was possible before. I'm frankly kind of
astonished by the tenor of this entire conversation; there is certainly
plenty of code in the backend that is less self-documenting than this is;
and KaiGai did already put up a wiki page with documentation as you
requested. From his response, it sounds like he has updated the ctidscan
code, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company