Re: [v9.5] Custom Plan API - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: [v9.5] Custom Plan API |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F8FB8EAD@BPXM15GP.gisp.nec.co.jp Whole thread Raw |
In response to | [v9.5] Custom Plan API (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses |
Re: [v9.5] Custom Plan API
Re: [v9.5] Custom Plan API |
List | pgsql-hackers |
Hanada-san, The attached patch is revised one. Updates from the previous version are below: * System catalog name was changed to pg_custom_plan_provider; that reflects role of the object being defined. * Also, prefix of its variable names are changed to "cpp"; that means custom-plan-provider. * Syntax also reflects what the command does more. New syntax to define custom plan provider is: CREATE CUSTOM PLAN PROVIDER <cpp_name> FOR <cpp_class> HANDLER <cpp_function>; * Add custom-plan.sgml to introduce interface functions defined for path/plan/exec methods. * FinalizeCustomPlan() callback was simplified to support scan (and join in the future) at the starting point. As long as scan/join requirement, no need to control paramids bitmap in arbitrary way. * Unnecessary forward declaration in relation.h and plannode.h were removed, but a few structures still needs to have forward declarations. * Fix typos being pointed out. I'd like to see committer's suggestion regarding to the design issues below: * whether set_cheapest() is called for all relkind? ->according to the discussion in v9.4 cycle, I consolidated set_cheapest() in allpaths.c to set_rel_pathlist(). Hanada-san wonder whether it is necessary to have custom- plan on none base relations; like sub-query or values-scan. I don't have reason why not to run custom-plan on these non usual relations. * how argument of add_path handler shall be derivered? -> custom-plan handler function takes an argument with internal data type; that is a pointer of customScanArg if custom-plan class is "scan". (It shall be customHashJoinArg if "hash join" for example). Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: Kaigai Kouhei(海外 浩平) > Sent: Friday, July 04, 2014 1:23 PM > To: 'Shigeru Hanada'; Kohei KaiGai > Cc: Simon Riggs; Tom Lane; Stephen Frost; Robert Haas; Andres Freund; > PgHacker; Jim Mlodgenski; Peter Eisentraut > Subject: Re: [HACKERS] [v9.5] Custom Plan API > > Hanada-san, > > Thanks for your dedicated reviewing. > > It's a very long message. So, let me summarize the things I shall do in > the next patch. > > * fix bug: custom-plan class comparison > * fix up naming convention and syntax > CREATE CUSTOM PLAN PROVIDER, rather than > CREATE CUSTOM PLAN. Prefix shall be "cpp_". > * fix up: definition of get_custom_plan_oid() > * fix up: unexpected white spaces, to be tabs. > * fix up: remove unnecessary forward declarations. > * fix up: revert replace_nestloop_params() to static > * make SetCustomPlanRef an optional callback > * fix up: typos in various points > * add documentation to explain custom-plan interface. > > Also, I want committer's opinion about the issues below > * whether set_cheapest() is called for all relkind? > * how argument of add_path handler shall be derivered? > > Individual comments are put below: > > > Kaigai-san, > > > > Sorry for lagged response. > > > > Here are my random thoughts about the patch. I couldn't understand > > the patch fully, because some of APIs are not used by ctidscan. If > > > > Custom Scan patch v2 review > > > > * Custom plan class comparison > > In backend/optimizer/util/pathnode.c, custclass is compared by bit-and > > with 's'. Do you plan to use custclass as bit field? If so, values > > for custom plan class should not be a character. Otherwise, custclass > > should be compared by == operator. > > > Sorry, it is a bug that come from the previous design. > I had an idea that allows a custom plan provider to support multiple kind > of exec nodes, however, I concluded it does not make sense so much. (we > can define multiple CPP for each) > > > * Purpose of GetSpecialCustomVar() > > The reason why FinalizeCustomPlan callback is necessary is not clear > > to me. > > Could you show a case that the API would be useful? > > > It is needed feature to replace a built-in join by custom scan, however, > it might be unclear on the scan workloads. > > Let me explain why join replacement needed. A join node has two input slot > (inner and outer), its expression node including Var node reference either > of slot according to its varno (INNER_VAR or OUTER_VAR). > In case when a CPP replaced a join, it has to generate an equivalent result > but it may not be a best choice to use two input streams. > (Please remind when we construct remote join on postgres_fdw, all the > materialization was done on remote side, thus we had one input stream to > generate local join equivalent view.) On the other hands, EXPLAIN command > has to understand what column is the source of varnodes in targetlist of > custom-node even if it is rewritten to use just one slot. For example, which > label shall be shown in case when 3rd item of targetlist is originally come > from 2nd item of inner slot but all the materialized result is stored into > outer slot. > Only CPP can track its relationship between the original and the newer one. > This interface provides a way to solve a varnode that actually references. > > > * Purpose of FinalizeCustomPlan() > > The reason why FinalizeCustomPlan callback is necessary is not clear > > to me, because ctidscan just calls finalize_primnode() and > > bms_add_members() with given information. Could you show a case that > > the API would be useful? > > > The main purpose of this callback gives an extension chance to apply > finalize_primenode() if custom-node hold expression tree on its private > fields. > In case when CPP picked up a part of clauses to run its own way, it shall > be attached on neither plan->targetlist nor plan->qual, only CPP knows where > does it attached. So, these orphan expression nodes have to be treated by > CPP. > > > * Is it ok to call set_cheapest() for all relkind? > > Now set_cheapest() is called not for only relation and foreign table > > but also custom plan, and other relations such as subquery, function, > and value. > > Calling call_custom_scan_provider() and set_cheapest() in the case of > > RTE_RELATION seems similar to the old construct, how do you think > > about this? > > > I don't think we may be actually able to have some useful custom scan logic > on these special relation forms, however, I also didn't have a special reason > why custom-plan does not need to support these special relations. > I'd like to see committer's opinion here. > > > > * Is it hard to get rid of CopyCustomPlan()? > > Copying ForeignScan node doesn't need per-FDW copy function by > > limiting fdw_private to have only copy-able objects. Can't we use the > > same way for CustomPlan? Letting authors call NodeSetTag or > > copyObject() sounds uncomfortable to me. > > > > This would be able to apply to TextOutCustomPlan() and > > TextOutCustomPath() too. > > > FDW-like design was the original one, but the latest design was suggestion > by Tom on the v9.4 development cycle, because some data types are not > complianced to copyObject; like Bitmapset. > > > * MultiExec support is appropriate for the first version? > > The cases need MultiExec seems little complex for the first version of > > custom scan. What kind of plan do you image for this feature? > > > It is definitely necessary to exchange multiple rows with custom-format > with upper level if both of nodes are managed by same CPP. > I plan to use this interface for bulk-loading that makes much faster data > loading to GPUs. > > > * Does SupportBackwardScan() have enough information? > > Other scans check target list with TargetListSupportsBackwardScan(). > > Isn't it necessary to check it for CustomPlan too in > > ExecSupportsBackwardScan()? > > > It derivers CustomPlan node itself that includes Plan node. > If CPP thought it is necessary, it can run equivalent checks here. > > > * Place to call custom plan provider > > Is it necessary to call provider when relkind != RELKIND_RELATION? If > > yes, isn't it necessary to call for append relation? > > > > I know that we concentrate to replacing scan in the initial version, > > so it would not be a serious problem, but it would be good to consider > > extensible design. > > > Regarding of the child relation scan, set_append_rel_pathlist() calls > set_rel_pathlist() that is entry point of custom-scan paths. > If you mention about alternative-path of Append node, yes, it is not a > feature being supported in the first commit. > > > * Custom Plan Provider is "addpath"? > > Passing addpath handler as only one attribute of CUSTOM PLAN PROVIDER > > seems little odd. > > Using handler like FDW makes the design too complex and/or messy? > > > This design allows to pass a set of information needed according to the > workload; like join not only scan. If we need to extend customXXXXArg in > the future, all we need to extend is data structure definition, not function > prototype itself. > Anyway, I'd like to make a decision for this on committer review stage. > > > * superclass of CustomPlanState > > CustomPlanState derives ScanState, instead of deriving PlanState > directly. > > I worry the case of non-heap-scan custom plan, but it might be ok to > > postpone consideration about that at the first cut. > > > We have some useful routines to implement custom-scan logic, but they takes > ScanState argument, like ExecScan(). > Even though we can copy it and paste to extension code, it is not a good > manner. > It takes three pointer variables in addition to PlanState. If CPP does not > take care about regular heap scan, keep them unused. It is quite helpful > if CPP implements some original logic on top of existing heap scan. > > > * Naming and granularity of objects related to custom plan I'm not > > sure the current naming is appropriate, especially difference between > > "custom plan" and "provider" and "handler". In the context of CREATE > > CUSTOM PLAN statement, what the term "custom plan" means? My > > impression is that "custom plan" is an alternative plan type, e.g. > > ctidscan or pg_strom_scan. Then what the term "provider" means? My > > impression for that is extension, such as ctidscan and pg_strom. The > > grammar allows users to pass function via PROVIDER clause of CREATE > > CUSTOM SCAN, so the function would be the provider of the custom plan > > created by the statement. > > > Hmm... What you want to say is, CREATE X statement is expected to create > X. > On the other hand, "custom-plan" is actually created by custom-plan provider, > not this DDL statement. The DDL statement defined custom-plan "provider". > I also think the suggestion is reasonable. > > How about the statement below instead? > > CREATE CUSTOM PLAN PROVIDER cpp_name FOR cpp_kind HANDLER cpp_function; > cpp_kind := SCAN (other types shall be supported later) > > > * enable_customscan > > GUC parameter enable_customscan would be useful for users who want to > > disable custom plan feature temporarily. In the case of pg_strom, > > using GPU for limited sessions for analytic or batch applications seems > handy. > > > It should be done by extension side individually. > Please imagine a user who install custom-GPU-scan, custom-matview-redirect > and custom-cache-only-scan. Purpose of each CPP are quite individually, > so I don't think enable_customscan makes sense. > > > * Adding pg_custom_plan catalog > > Using "cust" as prefix for pg_custom_plan causes ambiguousness which > > makes it difficult to choose catalog prefix for a feature named > > "Custom Foo" in future. How about using "cusp" (CUStom Plan)? > > > > Or is it better to use pg_custom_plan_provider as catalog relation > > name, as the document says that "CREATE CUSTOM PLAN defines custom plan > provider". > > Then prefix could be "cpp" (Custom Plan Provider). > > This seems to match the wording used for pg_foreign_data_wrapper. > > > My preference "cpp" as a shorten of custom plan provider. > > > > * CREATE CUSTOM PLAN statement > > This is just a question: We need to emit CREATE CUSTOM PLAN if we > > want to use I wonder how it is extended when supporting join as custom > class. > > > In case of join, I'll extend the syntax as follows: > > CREATE CUSTOM PLAN cppname > FOR [HASH JOIN|MERGE JOIN|NEST LOOP] > PROVIDER provider_func; > > Like customScanArg, we will define an argument type for each join methods > then provider_func shall be called with this argument. > I think it is well flexible and extendable approach. > > > * New operators about TID comparison > > IMO this portion should be a separated patch, because it adds OID > > definition of existing operators such as tidgt and tidle. Is there > > any (explicit or > > implicit) rule about defining macro for oid of an operator? > > > I don't know the general rules to define static OID definition. > Probably, these are added on demand. > > > * Prototype of get_custom_plan_oid() > > custname (or cppname if use the rule I proposed above) seems > > appropriate as the parameter name of get_custom_plan_oid() because > > similar funcitons use catalog column names in such case. > > > I'll rename it as follows: > > extern Oid get_custom_plan_provider_oid(const char *cpp_name, bool > missing_ok); > > > > * Coding conventions > > Some lines are indented with white space. Future pgindent run will > > fix this issue? > > > It's my oversight, to be fixed. > > > * Unnecessary struct forward declaration Forward declarations of > > CustomPathMethods, Plan, and CustomPlan in includes/nodes/relation.h > > seem unncecessary. Other headers might have same issue. > > > I'll check it. I had try & error during the development. It might leave > a dead code here. > > > * Unnecessary externing of replace_nestloop_params() > > replace_nestloop_params() is extern-ed but it's never called outside > > createplan.c. > > > Indeed, it's not needed until we support custom join logic. > > > * Externing fix_scan_expr() > > If it's necessary for all custom plan providers to call fix_scan_expr > > (via fix_scan_list macro), isn't it able to do it in set_plan_refs() > > before calling SetCustomPlanRef()? > > > One alternative idea is: > if scanrelid of custom-plan is valid (scanrelid > 0) and custom-node > has no private expression tree to be fixed up, CPP can have no > SetCustomPlanRef callback. In this case, core backend applies > fix_scan_list on the targetlist and qual, then adjust scanrelid. > > It was what I did in the previous revision, that was concerned by Tom because > it assumes too much things to the custom-node. (It is useful to only custom > "scan" node) > > > * What does T_CustomPlanMarkPos mean? > > It's not clear to me when CustomPlanMarkPos works. Is it for a custom > > plan provider which supports marking position and rewind to the > > position, and ctidscan just lacks capability to do that, so it is not > used anywhere? > > > Its previous design had a flag whether it allows backward scan, in the body > of CustomPlan structure. However, it makes a problem on > ExecSupportsMarkRestore() that takes only node-tag to determine whether > the supplied node support backward scan or not. > Once I tried to change ExecSupportsMarkRestore() to accept node body, then > Tom suggested to use a separated node tag instead. > > > > * Unnecessary changes in allpaths.c > > some comment about Subquery and CTE are changed (perhaps) accidentally. > > > No, it is intentional because set_cheapest() was consolidated. > > > * Typos > > * planenr -> planner, implements -> implement in > create_custom_plan.sgml > > * CustomScan in nodeCustom.h should be CustomPlan? > > * delivered -> derived, in src/backend/optimizer/util/pathnode.c > > > OK, I'll fix them. > > > * Document "Writing Custom Plan Provider" is not provided Custom Plan > > Provider author would (and I DO!) hope documents about writing a > > custom plan provider. > > > A documentation like fdwhandler.sgml, isn't it? > OK, I'll make it up. > > Thanks, > -- > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > <kaigai@ak.jp.nec.com> > > > > 2014-06-17 23:12 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>: > > > Hanada-san, > > > > > > Thanks for your checks. I oversight the points when I submit the > > > patch, > > sorry. > > > The attached one is revised one on documentation stuff and > > contrib/Makefile. > > > > > > Thanks, > > > > > > 2014-06-16 17:29 GMT+09:00 Shigeru Hanada <shigeru.hanada@gmail.com>: > > >> Kaigai-san, > > >> > > >> I've just applied v1 patch, and tried build and install, but I > > >> found > > two issues: > > >> > > >> 1) The contrib/ctidscan is not automatically built/installed > > >> because it's not described in contrib/Makefile. Is this expected > behavior? > > >> 2) I got an error message below when building document. > > >> > > >> $ cd doc/src/sgml > > >> $ make > > >> openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . > > >> -d stylesheet.dsl -t sgml -i output-html -V html-index > > >> postgres.sgml > > >> openjade:catalogs.sgml:2525:45:X: reference to non-existent ID > > >> "SQL-CREATECUSTOMPLAN" > > >> make: *** [HTML.index] Error 1 > > >> make: *** Deleting file `HTML.index' > > >> > > >> I'll review another part of the patch, including the design. > > >> > > >> > > >> 2014-06-14 10:59 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>: > > >>> According to the discussion upthread, I revised the custom-plan > > >>> patch to focus on regular relation scan but no join support right > > >>> now, and to support DDL command to define custom-plan providers. > > >>> > > >>> Planner integration with custom logic to scan a particular > > >>> relation is enough simple, unlike various join cases. It's almost > > >>> similar to what built-in logic are doing now - custom-plan > > >>> provider adds a path node with its cost estimation if it can offer > > >>> alternative way to scan referenced relation. (in case of no idea, > > >>> it does not need to add any paths) > > >>> > > >>> A new DDL syntax I'd like to propose is below: > > >>> > > >>> CREATE CUSTOM PLAN <name> FOR <class> PROVIDER <function_name>; > > >>> > > >>> <name> is as literal, put a unique identifier. > > >>> <class> is workload type to be offered by this custom-plan provider. > > >>> "scan" is the only option right now, that means base relation scan. > > >>> <function_name> is also as literal; it shall perform custom-plan > > provider. > > >>> > > >>> A custom-plan provider function is assumed to take an argument of > > >>> "internal" type to deliver a set of planner information that is > > >>> needed to construct custom-plan pathnode. > > >>> In case of "scan" class, pointer towards an customScanArg object > > >>> shall be delivered on invocation of custom-plan provider. > > >>> > > >>> typedef struct { > > >>> uint32 custom_class; > > >>> PlannerInfo *root; > > >>> RelOptInfo *baserel; > > >>> RangeTblEntry *rte; > > >>> } customScanArg; > > >>> > > >>> In case when the custom-plan provider function being invoked > > >>> thought it can offer an alternative scan path on the relation of > > >>> "baserel", things to do is (1) construct a CustomPath (or its > > >>> inherited data > > >>> type) object with a table of callback function pointers (2) put > > >>> its own cost estimation, and (3) call add_path() to register this > > >>> path as > > an alternative one. > > >>> > > >>> Once the custom-path was chosen by query planner, its > > >>> CreateCustomPlan callback is called to populate CustomPlan node > > >>> based > > on the pathnode. > > >>> It also has a table of callback function pointers to handle > > >>> various planner's job in setrefs.c and so on. > > >>> > > >>> Similarly, its CreateCustomPlanState callback is called to > > >>> populate CustomPlanState node based on the plannode. It also has a > > >>> table of callback function pointers to handle various executor's > > >>> job during quey execution. > > >>> > > >>> Most of callback designs are not changed from the prior > > >>> proposition in > > >>> v9.4 development cycle, however, here is a few changes. > > >>> > > >>> * CustomPlan became to inherit Scan, and CustomPlanState became to > > >>> inherit ScanState. Because some useful routines to implement scan- > > >>> logic, like ExecScan, expects state-node has ScanState as its base > > >>> type, it's more kindness for extension side. (I'd like to avoid > each > > >>> extension reinvent ExecScan by copy & paste!) > > >>> I'm not sure whether it should be a union of Join in the future, > > however, > > >>> it is a reasonable choice to have compatible layout with > > Scan/ScanState > > >>> to implement alternative "scan" logic. > > >>> > > >>> * Exporting static functions - I still don't have a graceful > > >>> answer > > here. > > >>> However, it is quite natural that extensions to follow up > > >>> interface > > updates > > >>> on the future version up of PostgreSQL. > > >>> Probably, it shall become clear what class of functions shall be > > >>> exported and what class of functions shall be re-implemented within > > >>> extension side in the later discussion. > > >>> Right now, I exported minimum ones that are needed to implement > > >>> alternative scan method - contrib/ctidscan module. > > >>> > > >>> Items to be discussed later: > > >>> * planner integration for relations join - probably, we may define > new > > >>> custom-plan classes as alternative of hash-join, merge-join and > > >>> nest-loop. If core can know this custom-plan is alternative of hash- > > >>> join, we can utilize core code to check legality of join. > > >>> * generic key-value style options in custom-plan definition - Hanada > > >>> san proposed me off-list - like foreign data wrapper. It may enable > > >>> to configure multiple behavior on a binary. > > >>> * ownership and access control of custom-plan. right now, only > > >>> superuser can create/drop custom-plan provider definition, thus, > > >>> it has no explicit ownership and access control. It seems to me > > >>> a reasonable assumption, however, we may have a usecase that > > >>> needs custom-plan by unprivileged users. > > >>> > > >>> Thanks, > > >>> > > >>> 2014-05-12 10:09 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>: > > >>>>> On 8 May 2014 22:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >>>>> > > >>>>> >> We're past the prototyping stage and into productionising > > >>>>> >> what we know works, AFAIK. If that point is not clear, then > > >>>>> >> we need to discuss that first. > > >>>>> > > > >>>>> > OK, I'll bite: what here do we know works? Not a damn thing > > >>>>> > AFAICS; it's all speculation that certain hooks might be > > >>>>> > useful, and speculation that's not supported by a lot of > > >>>>> > evidence. If you think this isn't prototyping, I wonder what > > >>>>> > you think *is* > > prototyping. > > >>>>> > > >>>>> My research contacts advise me of this recent work > > >>>>> http://www.ntu.edu.sg/home/bshe/hashjoinonapu_vldb13.pdf > > >>>>> and also that they expect a prototype to be ready by October, > > >>>>> which I have been told will be open source. > > >>>>> > > >>>>> So there are at least two groups looking at this as a serious > > >>>>> option for Postgres (not including the above paper's authors). > > >>>>> > > >>>>> That isn't *now*, but it is at least a time scale that fits with > > >>>>> acting on this in the next release, if we can separate out the > > >>>>> various ideas and agree we wish to proceed. > > >>>>> > > >>>>> I'll submerge again... > > >>>>> > > >>>> Through the discussion last week, our minimum consensus are: > > >>>> 1. Deregulated enhancement of FDW is not a way to go 2. > > >>>> Custom-path that can replace built-in scan makes sense as a first > step > > >>>> towards the future enhancement. Its planner integration is > > >>>> enough > > simple > > >>>> to do. > > >>>> 3. Custom-path that can replace built-in join takes investigation > > >>>> how > > to > > >>>> integrate existing planner structure, to avoid (3a) > > >>>> reinvention > > of > > >>>> whole of join handling in extension side, and (3b) unnecessary > > extension > > >>>> calls towards the case obviously unsupported. > > >>>> > > >>>> So, I'd like to start the (2) portion towards the upcoming 1st > > >>>> commit-fest on the v9.5 development cycle. Also, we will be able > > >>>> to have discussion for the (3) portion concurrently, probably, > > >>>> towards > > 2nd commit-fest. > > >>>> > > >>>> Unfortunately, I cannot participate PGcon/Ottawa this year. > > >>>> Please share us the face-to-face discussion here. > > >>>> > > >>>> Thanks, > > >>>> -- > > >>>> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > > >>>> <kaigai@ak.jp.nec.com> > > >>>> > > >>> -- > > >>> KaiGai Kohei <kaigai@kaigai.gr.jp> > > >> > > >> > > >> > > >> -- > > >> Shigeru HANADA > > > > > > > > > > > > -- > > > KaiGai Kohei <kaigai@kaigai.gr.jp> > > > > > > > > -- > > Shigeru HANADA
Attachment
pgsql-hackers by date: