Thread: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
> On Tue, Nov 25, 2014 at 3:44 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Today, I had a talk with Hanada-san to clarify which portions the two
> > features have in common and how to implement them. We concluded that
> > both features can share most of the infrastructure.
> > Let me introduce join replacement by foreign-/custom-scan below.
> >
> > The overall design injects a foreign-/custom-scan node in place of the
> > built-in join logic (based on the estimated cost). From the viewpoint
> > of the core backend, it looks like a sub-query scan that contains a
> > join of relations internally.
> >
> > What we need to do is below:
> >
> > (1) Add a hook in add_paths_to_joinrel()
> > It gives extensions (including FDW drivers and custom-scan providers)
> > a chance to add alternative paths for a particular join of relations,
> > using ForeignScanPath or CustomScanPath, if they can run instead of
> > the built-in ones.
> >
> > (2) Inform the core backend of the varno/varattno mapping
> > One thing we need to pay attention to is that a foreign-/custom-scan
> > node that runs instead of the built-in join node must return a mixture
> > of values that come from both relations. When an FDW driver fetches a
> > remote record (or a record computed by an external computing
> > resource), the most reasonable way is to store it in ecxt_scantuple of
> > ExprContext, then kick off projection with var-nodes that reference
> > this slot.
> > This needs infrastructure that tracks the relationship between the
> > original var-node and the alternative varno/varattno. We thought it
> > should be mapped to INDEX_VAR and a virtual attribute number so that it
> > references ecxt_scantuple naturally, and this infrastructure is quite
> > helpful for both ForeignScan and CustomScan.
> > We'd like to add a List *fdw_varmap/*custom_varmap variable to both
> > plan nodes.
> > It contains a list of the original Var nodes, each mapped to the
> > position given by its list index (e.g., the first var-node is
> > varno=INDEX_VAR and varattno=1).
> >
> > (3) Reverse mapping on EXPLAIN
> > For EXPLAIN support, the above var-nodes on the pseudo relation scan
> > need to be resolved. All we need to do is initialize dpns->inner_tlist
> > in set_deparse_planstate() according to the above mapping.
> >
> > (4) The case of scanrelid == 0
> > To skip opening/closing (foreign) tables, we need a mark that tells
> > the backend not to initialize the scan node according to the table
> > definition, but according to the pseudo var-nodes list.
> > As the earlier custom-scan patch did, scanrelid == 0 is a
> > straightforward mark to show that the scan node is not tied to a
> > particular real relation.
> > So, it also needs special-case handling around the
> > foreign-/custom-scan code.
> >
> > We expect the above changes are small enough to implement basic join
> > push-down functionality (that does not involve external computing of
> > complicated expression nodes), but valuable to support in v9.5.
> >
> > Please comment on the proposition above.
>
> I don't really have any technical comments on this design right at the
> moment, but I think it's an important area where PostgreSQL needs to
> make some progress sooner rather than later, so I hope that we can get
> something committed in time for 9.5.
>
I tried to implement the interface portion, as attached.
Hanada-san may be developing a postgres_fdw implementation based on this
interface definition for the next commit fest.

The overall design of this patch is identical to what I described above.
It intends to allow extensions (an FDW driver or custom-scan provider) to
replace a join with a foreign/custom-scan that internally contains the
result set of an externally computed join of relations. It looks like a
relation scan on a pseudo relation.

One thing we need to pay attention to is how setrefs.c fixes up
varno/varattno, which differs from the regular join structure. I found
that IndexOnlyScan already has similar infrastructure that redirects
references of var-nodes to a certain column of ecxt_scantuple of
ExprContext using a pair of INDEX_VAR and an alternative varattno.

This patch adds a new field: fdw_ps_tlist in ForeignScan, and
custom_ps_tlist in CustomScan. It is the extension's role to set a
pseudo-scan target-list (hence, ps_tlist) on the foreign/custom-scan that
replaced a join.
If it is not NIL, set_plan_refs() takes another strategy to fix up the
references. It calls fix_upper_expr() to map the var-nodes of the
expression lists to INDEX_VAR according to the ps_tlist; the extension is
then expected to put values/isnull pairs into ss_ScanTupleSlot of the
scan-state according to the ps_tlist constructed beforehand.

Regarding the primary hook to add alternative foreign/custom-scan paths
instead of built-in join paths, I added the following hook in
add_paths_to_joinrel().

    /* Hook for plugins to get control in add_paths_to_joinrel() */
    typedef void (*set_join_pathlist_hook_type) (PlannerInfo *root,
                                                 RelOptInfo *joinrel,
                                                 RelOptInfo *outerrel,
                                                 RelOptInfo *innerrel,
                                                 List *restrictlist,
                                                 JoinType jointype,
                                                 SpecialJoinInfo *sjinfo,
                                                 SemiAntiJoinFactors *semifactors,
                                                 Relids param_source_rels,
                                                 Relids extra_lateral_rels);
    extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;

It should give extensions enough information to determine whether they
can offer alternative paths or not.

One thing I am concerned about is that the fdw_handler to be called on a
joinrel is not obvious, unlike custom-scan, which holds a reference to
CustomScanMethods, because a joinrel is not managed by any FDW driver.
So, I had to add an "Oid fdw_handler" field to RelOptInfo to track which
foreign tables are involved in this relation join. This field holds the
OID of a valid FDW handler if both the inner and outer relations are
managed by the same FDW handler. Otherwise, it is InvalidOid. Even if
either or both of them are themselves joins of relations, fdw_handler is
set as long as they are managed by the same FDW handler. This allows a
join that involves more than two tables to be replaced by a foreign-scan.

One new interface contract is the case of scanrelid == 0. If a
foreign-/custom-scan is not associated with a particular relation,
ExecInitXXX() initializes ss_ScanTupleSlot according to the ps_tlist, and
no relation is opened.

Because the working example is still under development, this patch is not
tested/validated yet. However, it briefly implements the concept of how
we'd like to enhance the foreign-/custom-scan functionality.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
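The fdw_handler propagation rule described in the message above can be sketched in isolation. This is only an illustration under stated assumptions, not code from the patch: MockRelOptInfo and join_fdw_handler() are hypothetical names, and Oid/InvalidOid are local stand-ins for the PostgreSQL definitions so that the snippet compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for PostgreSQL's Oid and InvalidOid (assumption: not the real headers). */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical, trimmed-down stand-in for RelOptInfo; only the field discussed above. */
typedef struct MockRelOptInfo
{
    Oid fdw_handler;    /* OID of the FDW handler function, or InvalidOid */
} MockRelOptInfo;

/*
 * A join relation gets a valid fdw_handler only when both inputs are
 * managed by the same valid handler; otherwise it gets InvalidOid.
 * Because the result can itself be an input to a later join, the rule
 * also covers joins over more than two foreign tables.
 */
Oid
join_fdw_handler(const MockRelOptInfo *outerrel, const MockRelOptInfo *innerrel)
{
    assert(outerrel != NULL && innerrel != NULL);

    if (outerrel->fdw_handler != InvalidOid &&
        outerrel->fdw_handler == innerrel->fdw_handler)
        return outerrel->fdw_handler;

    return InvalidOid;
}
```

Feeding the result of one call back in as an input models a three-way join: the handler survives only as long as every participating relation shares it.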
Hello,

The attached patch is a newer revision of the custom-/foreign-join
interface.

I ported my PG-Strom extension to the latest custom-scan (+ this patch)
interface during this winter vacation. The concept of "join replaced by
foreign-/custom-scan" works almost as intended; however, there were two
small oversights in the v1 interface.

1. EXPLAIN didn't work when scanrelid == 0.
ExplainNode() always called ExplainScanTarget() for T_ForeignScan or
T_CustomScan; however, a foreign-/custom-scan node that replaced a join
relation does not have a particular base relation.
So, I put in a check to skip this call when scanrelid == 0.

2. create_plan_recurse() needs to be available to extensions.
When a CustomScan node takes underlying plan nodes, its PlanCustomPath()
method is also responsible for invoking the plan creation routine of the
underlying path-nodes. However, the existing code declared
create_plan_recurse() as a static function.
So, this patch re-declares it as an external function.

Also, there is one other point I'd like to have in this interface.
When a foreign-/custom-scan node has a pseudo-scan targetlist, it may
contain target-entries that are not actually in use, but need to be there
to look up column names in the EXPLAIN command.
I'd like to add a flag that tells the core backend to ignore
target-entries in the pseudo-scan tlist that have resjunk=true when it
initializes the foreign-/custom-scan-state node and sets up the scan type
descriptor.
It would reduce unnecessary projection, if the foreign-/custom-scan node
can produce a tuple in the form the tlist expects.

I'd like to hear comments on this point.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
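The resjunk flag proposed in the January 6 message could behave roughly as follows. This is a sketch of the idea only: the flag did not exist when the message was written, and MockTargetEntry, scan_slot_width() and the skip_junk parameter are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pseudo-scan target-list entry. */
typedef struct MockTargetEntry
{
    int  resno;      /* position in the pseudo-scan target-list */
    bool resjunk;    /* true: kept only so EXPLAIN can look up the column name */
} MockTargetEntry;

/*
 * Count the attributes the scan tuple descriptor would have.  With
 * skip_junk set, resjunk entries are left out of the descriptor, so the
 * scan node does not have to fill (or project away) columns that exist
 * only for EXPLAIN.
 */
size_t
scan_slot_width(const MockTargetEntry *ps_tlist, size_t len, bool skip_junk)
{
    size_t natts = 0;

    assert(ps_tlist != NULL || len == 0);

    for (size_t i = 0; i < len; i++)
    {
        if (skip_junk && ps_tlist[i].resjunk)
            continue;
        natts++;
    }
    return natts;
}
```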
On 1/6/15, 8:17 AM, Kouhei Kaigai wrote:
> The attached patch is newer revision of custom-/foreign-join
> interface.

Shouldn't instances of

    scan_relid > 0

be

    scan_relid != InvalidOid

?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 07/01/15 00:05, Jim Nasby wrote:
> On 1/6/15, 8:17 AM, Kouhei Kaigai wrote:
>> The attached patch is newer revision of custom-/foreign-join
>> interface.
>
> Shouldn't instances of
>
>     scan_relid > 0
>
> be
>
>     scan_relid != InvalidOid
>

Ideally, they should be OidIsValid(scan_relid)

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> > scan_relid != InvalidOid
> >
>
> Ideally, they should be OidIsValid(scan_relid)
>
Scan.scanrelid is an index into the range-table list, not an object ID.
So, InvalidOid and OidIsValid() are not a good choice.

The bare relation OID is saved in the relid field of the RangeTblEntry,
which can be fetched using rt_fetch(scanrelid, range_tables).

I found the assertion below in ExecScanFetch().

    Assert(scanrelid > 0);

So this is probably the usual idiom.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
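The distinction drawn in the reply above (scanrelid is a 1-based range-table index, while the relation OID lives in the range-table entry) can be shown with a small self-contained mock. The types and helper names below (MockRangeTblEntry, mock_rt_fetch, scan_relation_oid) are hypothetical stand-ins; the real rt_fetch() macro operates on a PostgreSQL List rather than a C array.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the PostgreSQL typedefs (assumption: not the real headers). */
typedef unsigned int Oid;
typedef unsigned int Index;
#define InvalidOid ((Oid) 0)

/* Hypothetical, trimmed-down range-table entry: only the relation OID. */
typedef struct MockRangeTblEntry
{
    Oid relid;
} MockRangeTblEntry;

/* Range-table indexes start at 1, so index N refers to array slot N - 1. */
const MockRangeTblEntry *
mock_rt_fetch(Index scanrelid, const MockRangeTblEntry *rtable, size_t ntable)
{
    assert(scanrelid > 0 && scanrelid <= ntable);
    return &rtable[scanrelid - 1];
}

/*
 * Resolve a scan node's relation OID.  scanrelid == 0 is the mark proposed
 * in this thread for "no particular base relation" (a scan that replaced a
 * join), so there is no OID to return in that case.
 */
Oid
scan_relation_oid(Index scanrelid, const MockRangeTblEntry *rtable, size_t ntable)
{
    if (scanrelid == 0)
        return InvalidOid;
    return mock_rt_fetch(scanrelid, rtable, ntable)->relid;
}
```

Note that the two kinds of "zero" mean different things: scanrelid == 0 says "no range-table slot", whereas InvalidOid says "no catalog object", which is why OidIsValid() is the wrong test for scanrelid.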
On Tue, Jan 6, 2015 at 9:17 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> The attached patch is newer revision of custom-/foreign-join
> interface.

It seems that the basic purpose of this patch is to allow a foreign scan
or custom scan to have scanrelid == 0, reflecting the case where we are
scanning a joinrel rather than a baserel. The major problem that seems
to create is that we can't set the target list from the relation
descriptor, because there isn't one. To work around that, you've added
fdw_ps_tlist and custom_ps_tlist, which the FDW or custom-plan provider
must set. I don't know off-hand whether that's a good interface or not.
How does the FDW know what to stick in there? There's a comment that
seems to be trying to explain this:

+ * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * underlying relation(s) on a pair of INDEX_VAR and alternative varattno.
+ * It looks like a scan on pseudo relation that is usually result of
+ * relations join on remote data source, and FDW driver is responsible to
+ * set expected target list for this. If FDW returns records as foreign-
+ * table definition, just put NIL here.

...but I can't understand what that's telling me.

You've added an "Oid fdw_handler" field to the ForeignScan and RelOptInfo
structures. I think this is the OID of the pg_proc entry for the handler
function; and I think we need it because, if scanrelid == 0 then we don't
have a relation that we can trace to a foreign table, to a server, to an
FDW, and then to a handler. So we need to get that information some other
way. When building joinrels, the fdw_handler OID, and the associated
routine, are propagated from any two relations that share the same
fdw_handler OID to the resulting joinrel. I guess that's reasonable,
although it feels a little weird that we're copying around both the OID
and the structure-pointer.

For non-obvious reasons, you've made create_plan_recurse() non-static.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Tue, Jan 6, 2015 at 9:17 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > The attached patch is newer revision of custom-/foreign-join
> > interface.
>
> It seems that the basic purpose of this patch is to allow a foreign scan
> or custom scan to have scanrelid == 0, reflecting the case where we are
> scanning a joinrel rather than a baserel. The major problem that seems
> to create is that we can't set the target list from the relation
> descriptor, because there isn't one. To work around that, you've added
> fdw_ps_tlist and custom_ps_tlist, which the FDW or custom-plan provider
> must set. I don't know off-hand whether that's a good interface or not.
> How does the FDW know what to stick in there?
>
In the most usual scenario, the FDW/CSP will build a ps_tlist according to
the target-list of the joinrel (which contains a mixture of var-nodes
referencing the left side and the right side), plus the qualifier's
expression tree, if any.
As long as the FDW can construct a remote query, it knows which attributes
will be returned and which relation each of them comes from. That is
equivalent to what the ps_tlist tries to tell the core optimizer.

> There's a comment that seems to be trying to explain this:
>
> + * An optional fdw_ps_tlist is used to map a reference to an attribute of
> + * underlying relation(s) on a pair of INDEX_VAR and alternative varattno.
> + * It looks like a scan on pseudo relation that is usually result of
> + * relations join on remote data source, and FDW driver is responsible to
> + * set expected target list for this. If FDW returns records as foreign-
> + * table definition, just put NIL here.
>
> ...but I can't understand what that's telling me.
>
Sorry, let me explain it another way.

A joinrel has a target-list that can contain references to both the left
and right relations. These are eventually mapped to either INNER_VAR or
OUTER_VAR, and the executor then switches the TupleTableSlot (either
ecxt_innertuple or ecxt_outertuple) according to the special varno.
On the other hand, because ForeignScan/CustomScan is a scan plan, it has
just one TupleTableSlot at execution time. Thus, we need a mechanism that
maps attributes from both relations to certain positions in that slot;
each of them is eventually translated to a var-node with INDEX_VAR that
references ecxt_scantuple.
Of course, ps_tlist is not necessary if the ForeignScan/CustomScan
literally scans a base relation. In this case, the interface contract
expects NIL to be set in the ps_tlist field.

> You've added an "Oid fdw_handler" field to the ForeignScan and RelOptInfo
> structures. I think this is the OID of the pg_proc entry for the handler
> function; and I think we need it because, if scanrelid == 0 then we don't
> have a relation that we can trace to a foreign table, to a server, to an
> FDW, and then to a handler. So we need to get that information some other
> way. When building joinrels, the fdw_handler OID, and the associated
> routine, are propagated from any two relations that share the same
> fdw_handler OID to the resulting joinrel. I guess that's reasonable,
> although it feels a little weird that we're copying around both the OID
> and the structure-pointer.
>
Unlike the CustomScan node, the ForeignScan node does not have function
pointers. In addition, the routine structure is dynamically allocated by
palloc(), so we have no guarantee that a pointer constructed at the
planning stage is still valid when the executor begins.
That is the reason why I stored the OID of the FDW handler routine.
Is there any better idea?

> For non-obvious reasons, you've made create_plan_recurse() non-static.
>
When a custom-scan node replaces a join plan, it has at least two child
plan-nodes. The PlanCustomPath callback needs to be able to call
create_plan_recurse() to transform the underlying path-nodes into
plan-nodes, because this custom-scan node may take other built-in scan or
sub-join nodes as its inner/outer input.
In the case of an FDW, it pushes all underlying scan relations to the
remote side, so we do not expect a ForeignScan to have underlying plans...

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
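The ps_tlist mapping explained in the reply above (references into either join input rewritten to point at a position in the single scan slot) can be sketched as a lookup. This is a simplified illustration, not the setrefs.c logic: MockVar and remap_to_scan_slot() are hypothetical, the target-list is reduced to plain (varno, varattno) pairs, and MOCK_INDEX_VAR is a placeholder for the INDEX_VAR constant defined in the PostgreSQL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for PostgreSQL's INDEX_VAR; the value is illustrative only. */
#define MOCK_INDEX_VAR 65002

/* Hypothetical stand-in for a Var node: which relation, which column. */
typedef struct MockVar
{
    unsigned int varno;      /* range-table index, or a special varno */
    int          varattno;   /* attribute number, starting at 1 */
} MockVar;

/*
 * Search the pseudo-scan target-list for the referenced column.  On a
 * match, rewrite the reference to (MOCK_INDEX_VAR, position in ps_tlist),
 * where the position is counted from 1, and return 1.  Return 0 and leave
 * the reference untouched when the column is not in the list.
 */
int
remap_to_scan_slot(MockVar *var, const MockVar *ps_tlist, size_t len)
{
    assert(var != NULL);

    for (size_t i = 0; i < len; i++)
    {
        if (ps_tlist[i].varno == var->varno &&
            ps_tlist[i].varattno == var->varattno)
        {
            var->varno = MOCK_INDEX_VAR;
            var->varattno = (int) (i + 1);
            return 1;
        }
    }
    return 0;
}
```

After the rewrite, the executor only needs the one scan slot: the extension stores the joined row there in ps_tlist order, and every remapped reference reads its column by position.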
On 1/6/15, 5:43 PM, Kouhei Kaigai wrote:
>>> scan_relid != InvalidOid
>>>
>>
>> Ideally, they should be OidIsValid(scan_relid)
>>
> Scan.scanrelid is an index of range-tables list, not an object-id.
> So, InvalidOid or OidIsValid() are not a good choice.

I think the name needs to change then; scan_relid certainly looks like
the OID of a relation.

scan_index?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-10 8:18 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
> On 1/6/15, 5:43 PM, Kouhei Kaigai wrote:
>>>> scan_relid != InvalidOid
>>>>
>>>
>>> Ideally, they should be OidIsValid(scan_relid)
>>>
>> Scan.scanrelid is an index of range-tables list, not an object-id.
>> So, InvalidOid or OidIsValid() are not a good choice.
>
> I think the name needs to change then; scan_relid certainly looks like the
> OID of a relation.
>
> scan_index?
>
Yep, I had the same impression when I looked at the code for the first
time; however, it is defined as below. It is not something specific to
custom-scan itself.

    /*
     * ==========
     * Scan nodes
     * ==========
     */
    typedef struct Scan
    {
        Plan        plan;
        Index       scanrelid;      /* relid is index into the range table */
    } Scan;

--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On 10/01/15 01:19, Kohei KaiGai wrote:
> 2015-01-10 8:18 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
>> On 1/6/15, 5:43 PM, Kouhei Kaigai wrote:
>>>>> scan_relid != InvalidOid
>>>>>
>>>>
>>>> Ideally, they should be OidIsValid(scan_relid)
>>>>
>>> Scan.scanrelid is an index of range-tables list, not an object-id.
>>> So, InvalidOid or OidIsValid() are not a good choice.
>>
>> I think the name needs to change then; scan_relid certainly looks like the
>> OID of a relation.
>>
>> scan_index?
>>
> Yep, I had a same impression when I looked at the code first time,
> however, it is defined as below. Not a manner of custom-scan itself.
>
> /*
>  * ==========
>  * Scan nodes
>  * ==========
>  */
> typedef struct Scan
> {
>     Plan        plan;
>     Index       scanrelid;      /* relid is index into the range table */
> } Scan;
>

Yeah there are actually several places in the code where "relid" means
index in range table and not oid of relation, it still manages to confuse
me. Nothing this patch can do about that.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 1/9/15, 6:44 PM, Petr Jelinek wrote:
>> Yep, I had a same impression when I looked at the code first time,
>> however, it is defined as below. Not a manner of custom-scan itself.
>>
>> /*
>>  * ==========
>>  * Scan nodes
>>  * ==========
>>  */
>> typedef struct Scan
>> {
>>     Plan        plan;
>>     Index       scanrelid;      /* relid is index into the range table */
>> } Scan;
>>
>
> Yeah there are actually several places in the code where "relid" means
> index in range table and not oid of relation, it still manages to confuse
> me. Nothing this patch can do about that.

Well, since it's confused 3 of us now... should we change it (as a
separate patch)? I'm willing to do that work but don't want to waste time
if it'll just be rejected.

Any other examples of this I should fix too?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On 1/9/15, 6:54 PM, Jim Nasby wrote:
> On 1/9/15, 6:44 PM, Petr Jelinek wrote:
>>> Yep, I had a same impression when I looked at the code first time,
>>> however, it is defined as below. Not a manner of custom-scan itself.
>>>
>>> /*
>>>  * ==========
>>>  * Scan nodes
>>>  * ==========
>>>  */
>>> typedef struct Scan
>>> {
>>>     Plan        plan;
>>>     Index       scanrelid;      /* relid is index into the range table */
>>> } Scan;
>>>
>>
>> Yeah there are actually several places in the code where "relid" means
>> index in range table and not oid of relation, it still manages to confuse
>> me. Nothing this patch can do about that.
>
> Well, since it's confused 3 of us now... should we change it (as a
> separate patch)? I'm willing to do that work but don't want to waste time
> if it'll just be rejected.
>
> Any other examples of this I should fix too?

Sorry, to clarify... any other items besides Scan.scanrelid that I should
fix?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-10 9:56 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
> On 1/9/15, 6:54 PM, Jim Nasby wrote:
>> On 1/9/15, 6:44 PM, Petr Jelinek wrote:
>>>> Yep, I had a same impression when I looked at the code first time,
>>>> however, it is defined as below. Not a manner of custom-scan itself.
>>>>
>>>> /*
>>>>  * ==========
>>>>  * Scan nodes
>>>>  * ==========
>>>>  */
>>>> typedef struct Scan
>>>> {
>>>>     Plan        plan;
>>>>     Index       scanrelid;      /* relid is index into the range table */
>>>> } Scan;
>>>>
>>>
>>> Yeah there are actually several places in the code where "relid" means
>>> index in range table and not oid of relation, it still manages to confuse
>>> me. Nothing this patch can do about that.
>>
>> Well, since it's confused 3 of us now... should we change it (as a
>> separate patch)? I'm willing to do that work but don't want to waste time if
>> it'll just be rejected.
>>
>> Any other examples of this I should fix too?
>
> Sorry, to clarify... any other items besides Scan.scanrelid that I should
> fix?
>
This naming is a little bit confusing; however, I don't think it "should"
be changed, because this structure has been in use for a long time, and
renaming it would get in the way of back-patching when we find bugs around
"scanrelid".

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On 1/9/15, 8:51 PM, Kohei KaiGai wrote:
> 2015-01-10 9:56 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
>> On 1/9/15, 6:54 PM, Jim Nasby wrote:
>>> On 1/9/15, 6:44 PM, Petr Jelinek wrote:
>>>>> Yep, I had a same impression when I looked at the code first time,
>>>>> however, it is defined as below. Not a manner of custom-scan itself.
>>>>>
>>>>> /*
>>>>>  * ==========
>>>>>  * Scan nodes
>>>>>  * ==========
>>>>>  */
>>>>> typedef struct Scan
>>>>> {
>>>>>     Plan        plan;
>>>>>     Index       scanrelid;      /* relid is index into the range table */
>>>>> } Scan;
>>>>>
>>>>
>>>> Yeah there are actually several places in the code where "relid" means
>>>> index in range table and not oid of relation, it still manages to confuse
>>>> me. Nothing this patch can do about that.
>>>
>>> Well, since it's confused 3 of us now... should we change it (as a
>>> separate patch)? I'm willing to do that work but don't want to waste time if
>>> it'll just be rejected.
>>>
>>> Any other examples of this I should fix too?
>>
>> Sorry, to clarify... any other items besides Scan.scanrelid that I should
>> fix?
>>
> This naming is a little bit confusing, however, I don't think it "should" be
> changed because this structure has been used for a long time, so reworking
> will prevent back-patching when we find bugs around "scanrelid".

We can still backpatch; it just requires more work. And how many bugs do
we actually expect to find around this anyway?

If folks think this just isn't worth fixing fine, but I find the
backpatching argument rather dubious.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
2015-01-11 10:40 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
> On 1/9/15, 8:51 PM, Kohei KaiGai wrote:
>> 2015-01-10 9:56 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
>>> On 1/9/15, 6:54 PM, Jim Nasby wrote:
>>>> On 1/9/15, 6:44 PM, Petr Jelinek wrote:
>>>>>> Yep, I had a same impression when I looked at the code first time,
>>>>>> however, it is defined as below. Not a manner of custom-scan itself.
>>>>>>
>>>>>> /*
>>>>>>  * ==========
>>>>>>  * Scan nodes
>>>>>>  * ==========
>>>>>>  */
>>>>>> typedef struct Scan
>>>>>> {
>>>>>>     Plan        plan;
>>>>>>     Index       scanrelid;      /* relid is index into the range table */
>>>>>> } Scan;
>>>>>>
>>>>>
>>>>> Yeah there are actually several places in the code where "relid" means
>>>>> index in range table and not oid of relation, it still manages to
>>>>> confuse me. Nothing this patch can do about that.
>>>>
>>>> Well, since it's confused 3 of us now... should we change it (as a
>>>> separate patch)? I'm willing to do that work but don't want to waste
>>>> time if it'll just be rejected.
>>>>
>>>> Any other examples of this I should fix too?
>>>
>>> Sorry, to clarify... any other items besides Scan.scanrelid that I should
>>> fix?
>>>
>> This naming is a little bit confusing, however, I don't think it "should"
>> be changed because this structure has been used for a long time, so
>> reworking will prevent back-patching when we find bugs around "scanrelid".
>
> We can still backpatch; it just requires more work. And how many bugs do we
> actually expect to find around this anyway?
>
> If folks think this just isn't worth fixing fine, but I find the
> backpatching argument rather dubious.
>
Even if there is no problem in the Scan structure itself, a bug fix
elsewhere may reference the variable name "scanrelid". If we renamed it in
v9.5, we would also need a small adjustment to apply such a bug fix to
prior versions.
It seems to me a waste of time for committers.
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
On 11/01/15 08:56, Kohei KaiGai wrote:
> 2015-01-11 10:40 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
>>>>>> Yeah there are actually several places in the code where "relid" means
>>>>>> index in range table and not oid of relation, it still manages to
>>>>>> confuse me. Nothing this patch can do about that.
>>>>>
>>>>> Well, since it's confused 3 of us now... should we change it (as a
>>>>> separate patch)? I'm willing to do that work but don't want to waste
>>>>> time if it'll just be rejected.
>>>>>
>>>>> Any other examples of this I should fix too?
>>>>
>>>> Sorry, to clarify... any other items besides Scan.scanrelid that I should
>>>> fix?
>>>>
>>> This naming is a little bit confusing, however, I don't think it "should"
>>> be changed because this structure has been used for a long time, so
>>> reworking will prevent back-patching when we find bugs around "scanrelid".
>>
>> We can still backpatch; it just requires more work. And how many bugs do we
>> actually expect to find around this anyway?
>>
>> If folks think this just isn't worth fixing fine, but I find the
>> backpatching argument rather dubious.
>>
> Even though here is no problem around Scan structure itself, a bugfix may
> use the variable name of "scanrelid" to fix it. If we renamed it on v9.5, we
> also need a little adjustment to apply this bugfix on prior versions.
> It seems to me a waste of time for committers.
>

I tend to agree, especially as there are multiple places in the code this
would affect - RelOptInfo and RestrictInfo have the same issue, etc.

-- 
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Petr Jelinek <petr@2ndquadrant.com> writes:
> On 11/01/15 08:56, Kohei KaiGai wrote:
>> 2015-01-11 10:40 GMT+09:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
>>>> Yeah there are actually several places in the code where "relid" means
>>>> index in range table and not oid of relation, it still manages to
>>>> confuse me. Nothing this patch can do about that.

>>> Well, since it's confused 3 of us now... should we change it (as a
>>> separate patch)? I'm willing to do that work but don't want to waste
>>> time if it'll just be rejected.

>> It seems to me a waste of time for committers.

> I tend to agree, especially as there is multiple places in code this
> would affect - RelOptInfo and RestrictInfo have same issue, etc.

Generally speaking, if you're not sure whether a "relid" variable in the
planner is meant to be a table OID or a rangetable index, you can tell by
noting whether it's declared as type Oid or type int. So I'm also -1 on
any wholesale renaming, especially given the complete lack of an obviously
superior naming convention to change to.

If there are any places where such variables are improperly declared, then
of course we ought to fix that.

			regards, tom lane
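
To illustrate the point about declared types, here is a minimal sketch with mock definitions. It does not use the real PostgreSQL headers: MockRangeTblEntry, MockScan, mock_rt_fetch and mock_scan_table_oid are hypothetical names invented for this example, and only the Oid/Index distinction and the 1-based range table index are taken from the discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Mock typedefs for illustration only; not the real PostgreSQL headers. */
typedef unsigned int Oid;    /* identifies a table in the catalogs */
typedef unsigned int Index;  /* 1-based position in the range table */

/* Hypothetical, heavily simplified range table entry. */
typedef struct MockRangeTblEntry
{
    Oid         relid;       /* declared Oid: this one is a table OID */
    const char *alias;
} MockRangeTblEntry;

/* Simplified stand-in for the Scan node quoted earlier in the thread. */
typedef struct MockScan
{
    Index       scanrelid;   /* declared Index: a range table index */
} MockScan;

/* Resolve a range table index to its entry; indexes start at 1. */
static const MockRangeTblEntry *
mock_rt_fetch(Index rti, const MockRangeTblEntry *rtable, size_t ntable)
{
    assert(rti >= 1 && rti <= ntable);
    return &rtable[rti - 1];
}

/* Going from a scan node to a table OID takes one range table lookup. */
static Oid
mock_scan_table_oid(const MockScan *scan,
                    const MockRangeTblEntry *rtable, size_t ntable)
{
    return mock_rt_fetch(scan->scanrelid, rtable, ntable)->relid;
}
```

With a two-entry range table, a scan node whose scanrelid is 2 resolves to the OID stored in the second entry. The two "relid" fields never hold the same kind of value, and the declared type is what tells them apart.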
On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> When custom-scan node replaced a join-plan, it shall have at least two
> child plan-nodes. The callback handler of PlanCustomPath needs to be
> able to call create_plan_recurse() to transform the underlying path-nodes
> to plan-nodes, because this custom-scan node may take other built-in
> scan or sub-join nodes as its inner/outer input.
> In case of FDW, it shall kick any underlying scan relations to remote
> side, thus we may not expect ForeignScan has underlying plans...

Do you have an example of this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > When custom-scan node replaced a join-plan, it shall have at least two
> > child plan-nodes. The callback handler of PlanCustomPath needs to be
> > able to call create_plan_recurse() to transform the underlying
> > path-nodes to plan-nodes, because this custom-scan node may take other
> > built-in scan or sub-join nodes as its inner/outer input.
> > In case of FDW, it shall kick any underlying scan relations to remote
> > side, thus we may not expect ForeignScan has underlying plans...
>
> Do you have an example of this?
>
Yes, although the full code set is too large for a patch submission...

https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880

This create_gpuhashjoin_plan() is the PlanCustomPath callback of GpuHashJoin.
It takes a GpuHashJoinPath, which inherits from CustomPath and has multiple
underlying scan/join paths.
Once the backend calls it back, it in turn calls create_plan_recurse()
to make the inner/outer plan nodes according to those paths.

As a result, we get the following query execution plan, in which the
CustomScan takes underlying scan plans.

postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922 width=143)
   Hash clause 1: (aid = aid)
   Hash clause 2: (bid = bid)
   Bulkload: On
   ->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=4000009 width=77)
   ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
         hash keys: aid
         nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
         ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=37)
         ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
               hash keys: bid
               nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
               ->  Seq Scan on t2  (cost=0.00..734.00 rows=40000 width=37)
(13 rows)

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Robert Haas [mailto:robertmhaas@gmail.com]
> Sent: Thursday, January 15, 2015 2:07 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
> Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5]
> Custom Plan API)
>
> On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > When custom-scan node replaced a join-plan, it shall have at least two
> > child plan-nodes. The callback handler of PlanCustomPath needs to be
> > able to call create_plan_recurse() to transform the underlying
> > path-nodes to plan-nodes, because this custom-scan node may take other
> > built-in scan or sub-join nodes as its inner/outer input.
> > In case of FDW, it shall kick any underlying scan relations to remote
> > side, thus we may not expect ForeignScan has underlying plans...
>
> Do you have an example of this?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
> Company
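
The recursion described in this message can be sketched roughly as follows. This is a self-contained mock, not PG-Strom or PostgreSQL code: MockPath, MockPlan, mock_plan_custom_join and mock_create_plan_recurse are hypothetical names, and the real create_plan_recurse() and PlanCustomPath callback take different arguments. The sketch only shows why a join-replacing callback needs a way to turn its child paths into child plans.

```c
#include <assert.h>
#include <stdlib.h>

/* Mock node kinds; real planner nodes carry far more information. */
typedef enum { MOCK_SEQSCAN, MOCK_CUSTOM_JOIN } MockKind;

typedef struct MockPath
{
    MockKind         kind;
    struct MockPath *outer;     /* child paths; NULL for a leaf scan */
    struct MockPath *inner;
} MockPath;

typedef struct MockPlan
{
    MockKind         kind;
    struct MockPlan *lefttree;  /* child plans built from the child paths */
    struct MockPlan *righttree;
} MockPlan;

static MockPlan *mock_create_plan_recurse(const MockPath *path);

/* Stand-in for the provider's callback: it plans both inputs itself. */
static MockPlan *
mock_plan_custom_join(const MockPath *path)
{
    MockPlan *plan = calloc(1, sizeof(MockPlan));

    assert(plan != NULL);
    plan->kind = MOCK_CUSTOM_JOIN;
    plan->lefttree = mock_create_plan_recurse(path->outer);
    plan->righttree = mock_create_plan_recurse(path->inner);
    return plan;
}

/* Stand-in for the core routine that converts any path into a plan. */
static MockPlan *
mock_create_plan_recurse(const MockPath *path)
{
    MockPlan *plan;

    if (path->kind == MOCK_CUSTOM_JOIN)
        return mock_plan_custom_join(path);

    plan = calloc(1, sizeof(MockPlan));
    assert(plan != NULL);
    plan->kind = MOCK_SEQSCAN;
    return plan;
}

/* Count nodes in a plan tree, to check the shape of the result. */
static int
mock_count_plan_nodes(const MockPlan *plan)
{
    if (plan == NULL)
        return 0;
    return 1 + mock_count_plan_nodes(plan->lefttree)
             + mock_count_plan_nodes(plan->righttree);
}

/* Release a plan tree built by the functions above. */
static void
mock_free_plan(MockPlan *plan)
{
    if (plan == NULL)
        return;
    mock_free_plan(plan->lefttree);
    mock_free_plan(plan->righttree);
    free(plan);
}
```

Because the callback and the core routine call each other, a custom join can sit above built-in scans or above another custom join, which is the shape the EXPLAIN output above shows.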
On Thu, Jan 15, 2015 at 8:02 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > > When custom-scan node replaced a join-plan, it shall have at least two
> > > child plan-nodes. The callback handler of PlanCustomPath needs to be
> > > able to call create_plan_recurse() to transform the underlying
> > > path-nodes to plan-nodes, because this custom-scan node may take other
> > > built-in scan or sub-join nodes as its inner/outer input.
> > > In case of FDW, it shall kick any underlying scan relations to remote
> > > side, thus we may not expect ForeignScan has underlying plans...
> >
> > Do you have an example of this?
> >
> Yes, even though full code set is too large for patch submission...
>
> https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880
>
> This create_gpuhashjoin_plan() is PlanCustomPath callback of GpuHashJoin.
> It takes GpuHashJoinPath inherited from CustomPath that has multiple
> underlying scan/join paths.
> Once it is called back from the backend, it also calls create_plan_recurse()
> to make inner/outer plan nodes according to the paths.
>
> In the result, we can see the following query execution plan that CustomScan
> takes underlying scan plans.
>
> postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
>                                     QUERY PLAN
> ----------------------------------------------------------------------------------
>  Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922 width=143)
>    Hash clause 1: (aid = aid)
>    Hash clause 2: (bid = bid)
>    Bulkload: On
>    ->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=4000009 width=77)
>    ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
>          hash keys: aid
>          nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
>          ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=37)
>          ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
>                hash keys: bid
>                nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
>                ->  Seq Scan on t2  (cost=0.00..734.00 rows=40000 width=37)
> (13 rows)

Where are we on this? AFAIK, we have now a feature with no documentation
and no example in-core to test those custom routine APIs, hence moved to
next CF.
-- 
Michael
> Where are we on this? AFAIK, we have now a feature with no documentation
> and no example in-core to test those custom routine APIs, hence moved to
> next CF.
>
Hanada-san is now working on an example module that uses this new
infrastructure on top of postgres_fdw. He will probably submit the
patch within a couple of days, for the upcoming commitfest.

Regarding the documentation, the consensus was to set up a wiki page
that everyone can edit, which will then become the source of the
SGML file.
The latest one is here:
https://wiki.postgresql.org/wiki/CustomScanInterface

Anyway, the next commitfest starts within a couple of days.
I'd like to discuss the feature there.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Michael Paquier [mailto:michael.paquier@gmail.com]
> Sent: Friday, February 13, 2015 4:38 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Robert Haas; Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
> Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5]
> Custom Plan API)
>
> On Thu, Jan 15, 2015 at 8:02 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
> > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > > > When custom-scan node replaced a join-plan, it shall have at least two
> > > > child plan-nodes. The callback handler of PlanCustomPath needs to be
> > > > able to call create_plan_recurse() to transform the underlying
> > > > path-nodes to plan-nodes, because this custom-scan node may take other
> > > > built-in scan or sub-join nodes as its inner/outer input.
> > > > In case of FDW, it shall kick any underlying scan relations to remote
> > > > side, thus we may not expect ForeignScan has underlying plans...
> > >
> > > Do you have an example of this?
> > >
> > Yes, even though full code set is too large for patch submission...
> >
> > https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880
> >
> > This create_gpuhashjoin_plan() is PlanCustomPath callback of GpuHashJoin.
> > It takes GpuHashJoinPath inherited from CustomPath that has multiple
> > underlying scan/join paths.
> > Once it is called back from the backend, it also calls
> > create_plan_recurse() to make inner/outer plan nodes according to the paths.
> >
> > In the result, we can see the following query execution plan that
> > CustomScan takes underlying scan plans.
> >
> > postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
> >                                     QUERY PLAN
> > ----------------------------------------------------------------------------------
> >  Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922 width=143)
> >    Hash clause 1: (aid = aid)
> >    Hash clause 2: (bid = bid)
> >    Bulkload: On
> >    ->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=4000009 width=77)
> >    ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
> >          hash keys: aid
> >          nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
> >          ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=37)
> >          ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
> >                hash keys: bid
> >                nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
> >                ->  Seq Scan on t2  (cost=0.00..734.00 rows=40000 width=37)
> > (13 rows)
>
> Where are we on this? AFAIK, we have now a feature with no documentation
> and no example in-core to test those custom routine APIs, hence moved to
> next CF.
> --
> Michael
On Fri, Feb 13, 2015 at 4:59 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> Where are we on this? AFAIK, we have now a feature with no documentation
>> and no example in-core to test those custom routine APIs, hence moved to
>> next CF.
>
> Now Hanada-san is working on the example module that use this new
> infrastructure on top of postgres_fdw. Probably, he will submit the
> patch within a couple of days, for the upcoming commit fest.

I am a bit surprised by that. Are you planning to give up on the ctidscan module and

> Regarding to the documentation, a consensus was to make up a wikipage
> to edit the description by everyone, then it shall become source of
> SGML file.
> The latest one is here:
> https://wiki.postgresql.org/wiki/CustomScanInterface

OK. This looks like a good base. It would be good to have an actual patch for review as well at this stage.
--
Michael
On Fri, Feb 13, 2015 at 6:12 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Fri, Feb 13, 2015 at 4:59 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>> Where are we on this? AFAIK, we have now a feature with no documentation
>>> and no example in-core to test those custom routine APIs, hence moved to
>>> next CF.
>>
>> Now Hanada-san is working on the example module that use this new
>> infrastructure on top of postgres_fdw. Probably, he will submit the
>> patch within a couple of days, for the upcoming commit fest.
>
> I am a bit surprised by that. Are you planning to give up on the ctidscan module and

Sorry I typed the wrong key.
So... Are you planning to give up on the ctidscan module and submit only the module written by Hanada-san on top of postgres_fdw? Since I imagine the goal is just to have a test module that exercises the APIs, why would the module submitted by Hanada-san be necessary?
--
Michael
> Sorry I typed the wrong key.
>
> So... Are you planning to give up on the ctidscan module and submit only
> the module written by Hanada-san on top of postgres_fdw? As I imagine that
> the goal is just to have a test module to run the APIs why would the module
> submitted by Hanada-san be that necessary?
>
No. The ctidscan module is a reference implementation for the existing
custom-scan interface, which only supports scanning a relation in its own
way and does not support relation joins at this moment.
The upcoming enhancement to postgres_fdw will support remote joins, which
look like a scan on a pseudo materialized relation on the local side. It is
the proof of concept for the new interface I'd like to discuss in this thread.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Michael Paquier [mailto:michael.paquier@gmail.com]
> Sent: Friday, February 13, 2015 6:17 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Robert Haas; Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan
> API)
>
> On Fri, Feb 13, 2015 at 6:12 PM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>
> > On Fri, Feb 13, 2015 at 4:59 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >
> > > > Where are we on this? AFAIK, we have now a feature with no documentation
> > > > and no example in-core to test those custom routine APIs, hence moved to
> > > > next CF.
> > >
> > > Now Hanada-san is working on the example module that use this new
> > > infrastructure on top of postgres_fdw. Probably, he will submit the
> > > patch within a couple of days, for the upcoming commit fest.
> >
> > I am a bit surprised by that. Are you planning to give up on the
> > ctidscan module and
>
> Sorry I typed the wrong key.
>
> So... Are you planning to give up on the ctidscan module and submit only
> the module written by Hanada-san on top of postgres_fdw? As I imagine that
> the goal is just to have a test module to run the APIs why would the module
> submitted by Hanada-san be that necessary?
>
> --
> Michael
The attached patch is a rebased version of join replacement with
foreign-/custom-scan. There are no feature updates at this moment,
but SGML documentation has been added (per Michael's comment).

This infrastructure allows foreign-data-wrappers and custom-scan
providers to add alternative scan paths for a join of relations.
From the viewpoint of the executor, it looks like a scan on a
pseudo-relation that is materialized from multiple relations, even
though the FDW/CSP internally processes the join with its own logic.

The basic idea is: (1) scanrelid==0 indicates that this foreign/custom
scan node runs on a pseudo relation, and (2) fdw_ps_tlist and
custom_ps_tlist supply the definition of the pseudo relation.
Because the node is not associated with a tangible relation, unlike
the simple scan case, the planner cannot know the expected record
type to be returned without this additional information.
These two enhancements enable extensions to process a join internally
while behaving like an existing scan node from the viewpoint of the
core backend.

Also, as an aside: I had a discussion with Hanada-san about this
interface off-list. He had an idea to keep create_plan_recurse()
static by using a special list field in the CustomPath structure to
chain the underlying Path nodes. If the core backend translates those
Path nodes to Plan nodes whenever a valid list is given, the extension
does not need to call create_plan_recurse() by itself.
I have no preference about this. Does anybody have an opinion?

Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai > Sent: Thursday, January 15, 2015 8:03 AM > To: Robert Haas > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan > API) > > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> > wrote: > > > When custom-scan node replaced a join-plan, it shall have at least > > > two child plan-nodes. The callback handler of PlanCustomPath needs > > > to be able to call create_plan_recurse() to transform the underlying > > > path-nodes to plan-nodes, because this custom-scan node may take > > > other built-in scan or sub-join nodes as its inner/outer input. > > > In case of FDW, it shall kick any underlying scan relations to > > > remote side, thus we may not expect ForeignScan has underlying plans... > > > > Do you have an example of this? > > > Yes, even though full code set is too large for patch submission... > > https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880 > > This create_gpuhashjoin_plan() is PlanCustomPath callback of GpuHashJoin. > It takes GpuHashJoinPath inherited from CustomPath that has multiple > underlying scan/join paths. > Once it is called back from the backend, it also calls create_plan_recurse() > to make inner/outer plan nodes according to the paths. > > In the result, we can see the following query execution plan that CustomScan > takes underlying scan plans. 
> > postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2; > QUERY PLAN > ---------------------------------------------------------------------- > ------------ > Custom Scan (GpuHashJoin) (cost=2968.00..140120.31 rows=3970922 > width=143) > Hash clause 1: (aid = aid) > Hash clause 2: (bid = bid) > Bulkload: On > -> Custom Scan (GpuScan) on t0 (cost=500.00..57643.00 rows=4000009 > width=77) > -> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000 > width=37) > hash keys: aid > nBatches: 1 Buckets: 46000 Memory Usage: 99.99% > -> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=37) > -> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000 > width=37) > hash keys: bid > nBatches: 1 Buckets: 46000 Memory Usage: 49.99% > -> Seq Scan on t2 (cost=0.00..734.00 rows=40000 > width=37) > (13 rows) > > Thanks, > -- > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > <kaigai@ak.jp.nec.com> > > > > -----Original Message----- > > From: Robert Haas [mailto:robertmhaas@gmail.com] > > Sent: Thursday, January 15, 2015 2:07 AM > > To: Kaigai Kouhei(海外 浩平) > > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada > > Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] > > [v9.5] Custom Plan API) > > > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> > wrote: > > > When custom-scan node replaced a join-plan, it shall have at least > > > two child plan-nodes. The callback handler of PlanCustomPath needs > > > to be able to call create_plan_recurse() to transform the underlying > > > path-nodes to plan-nodes, because this custom-scan node may take > > > other built-in scan or sub-join nodes as its inner/outer input. > > > In case of FDW, it shall kick any underlying scan relations to > > > remote side, thus we may not expect ForeignScan has underlying plans... > > > > Do you have an example of this? 
> > > > -- > > Robert Haas > > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL > > Company > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make > changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
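
The two-part idea in the message above (scanrelid==0 marks a pseudo relation, and the *_ps_tlist defines its record type) can be sketched as below. This is an illustrative mock under stated assumptions, not the patch itself: MockTupleDesc, MockScanNode and mock_scan_tupdesc are hypothetical, and a tuple descriptor is reduced to a column count so the selection logic stays visible.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Index;    /* mock of a range table index */

/* A tuple descriptor reduced to its column count, for illustration. */
typedef struct MockTupleDesc
{
    int         natts;
} MockTupleDesc;

/* Hypothetical stand-in for a ForeignScan/CustomScan plan node. */
typedef struct MockScanNode
{
    Index       scanrelid;         /* 0 means "scan on a pseudo relation" */
    int         ps_tlist_length;   /* entries in fdw_ps_tlist/custom_ps_tlist */
} MockScanNode;

/*
 * Pick the record type the scan returns. A base relation scan borrows the
 * descriptor of the relation it points at; a pseudo-relation scan has no
 * such relation, so its shape must come from the pseudo-scan target list.
 */
static MockTupleDesc
mock_scan_tupdesc(const MockScanNode *node,
                  const MockTupleDesc *rel_descs, size_t nrels)
{
    MockTupleDesc desc;

    if (node->scanrelid == 0)
        desc.natts = node->ps_tlist_length;
    else
    {
        assert(node->scanrelid <= nrels);
        desc = rel_descs[node->scanrelid - 1];
    }
    return desc;
}
```

Testing scanrelid rather than the list length also covers a pseudo relation whose target list happens to be empty, since an empty list and an absent list look the same.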
The attached version of the custom/foreign-join interface patch
fixes the problem reported on the join-pushdown support thread.

The previous version referenced *_ps_tlist in setrefs.c to check
whether the Custom/ForeignScan node is associated with a particular
base relation or not.
That logic decided whether the node performs a base relation scan
by whether *_ps_tlist is valid. However, this was incorrect when the
underlying pseudo-scan relation has an empty targetlist.
Instead, the logic is revised to check scanrelid itself. If it is
zero, the Custom/ForeignScan node is not associated with a particular
base relation, so its slot descriptor for the scan is constructed
from *_ps_tlist.

Also, I noticed a potential problem if a CSP/FDW driver wants to
display expression nodes using deparse_expression() but a varnode
within the expression does not appear in *_ps_tlist.
For example, the remote query below returns rows with two columns.

  SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid;

Thus, the ForeignScan behaves like a scan on a relation with two
columns, and the FDW driver sets two TargetEntry nodes on
fdw_ps_tlist. If the FDW is designed to keep the join condition
(aid = bid) in expression node form, it is expected to be saved in
the custom/fdw_expr variable, and setrefs.c then rewrites the
varnodes according to *_ps_tlist.
This means we also have to add both "aid" and "bid" to *_ps_tlist
to avoid a failure on variable lookup. However, these additional
entries change the definition of the slot descriptor.
So I adjusted ExecInitForeignScan and ExecInitCustomScan to use
ExecCleanTypeFromTL(), not ExecTypeFromTL(), when they construct
the slot descriptor based on *_ps_tlist.
CSP/FDW drivers are expected to add target entries with
resjunk=true if they want additional entries for variable lookups
in the EXPLAIN command.

Fortunately or unfortunately, postgres_fdw keeps its remote query in cstring form, so it does not need to add junk entries on the fdw_ps_tlist. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai > Sent: Sunday, February 15, 2015 11:01 PM > To: Kaigai Kouhei(海外 浩平); Robert Haas > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) > > The attached patch is a rebased version of join replacement with > foreign-/custom-scan. Here is no feature updates at this moment > but SGML documentation is added (according to Michael's comment). > > This infrastructure allows foreign-data-wrapper and custom-scan- > provider to add alternative scan paths towards relations join. > From viewpoint of the executor, it looks like a scan on a pseudo- > relation that is materialized from multiple relations, even though > FDW/CSP internally processes relations join with their own logic. > > Its basic idea is, (1) scanrelid==0 indicates this foreign/custom > scan node runs on a pseudo relation and (2) fdw_ps_tlist and > custom_ps_tlist introduce the definition of the pseudo relation, > because it is not associated with a tangible relation unlike > simple scan case, thus planner cannot know the expected record > type to be returned without these additional information. > These two enhancement enables extensions to process relations > join internally, and to perform as like existing scan node from > viewpoint of the core backend. > > Also, as an aside. I had a discussion with Hanada-san about this > interface off-list. He had an idea to keep create_plan_recurse() > static, using a special list field in CustomPath structure to > chain underlying Path node. 
If core backend translate the Path > node to Plan node if valid list given, extension does not need to > call create_plan_recurse() by itself. > I have no preference about this. Does anybody have opinion? > > Thanks, > -- > NEC OSS Promotion Center / PG-Strom Project > KaiGai Kohei <kaigai@ak.jp.nec.com> > > > > -----Original Message----- > > From: pgsql-hackers-owner@postgresql.org > > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai > > Sent: Thursday, January 15, 2015 8:03 AM > > To: Robert Haas > > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada > > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan > > API) > > > > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> > > wrote: > > > > When custom-scan node replaced a join-plan, it shall have at least > > > > two child plan-nodes. The callback handler of PlanCustomPath needs > > > > to be able to call create_plan_recurse() to transform the underlying > > > > path-nodes to plan-nodes, because this custom-scan node may take > > > > other built-in scan or sub-join nodes as its inner/outer input. > > > > In case of FDW, it shall kick any underlying scan relations to > > > > remote side, thus we may not expect ForeignScan has underlying plans... > > > > > > Do you have an example of this? > > > > > Yes, even though full code set is too large for patch submission... > > > > https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880 > > > > This create_gpuhashjoin_plan() is PlanCustomPath callback of GpuHashJoin. > > It takes GpuHashJoinPath inherited from CustomPath that has multiple > > underlying scan/join paths. > > Once it is called back from the backend, it also calls create_plan_recurse() > > to make inner/outer plan nodes according to the paths. > > > > In the result, we can see the following query execution plan that CustomScan > > takes underlying scan plans. 
> > > > postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2; > > QUERY PLAN > > ---------------------------------------------------------------------- > > ------------ > > Custom Scan (GpuHashJoin) (cost=2968.00..140120.31 rows=3970922 > > width=143) > > Hash clause 1: (aid = aid) > > Hash clause 2: (bid = bid) > > Bulkload: On > > -> Custom Scan (GpuScan) on t0 (cost=500.00..57643.00 rows=4000009 > > width=77) > > -> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000 > > width=37) > > hash keys: aid > > nBatches: 1 Buckets: 46000 Memory Usage: 99.99% > > -> Seq Scan on t1 (cost=0.00..734.00 rows=40000 width=37) > > -> Custom Scan (MultiHash) (cost=734.00..734.00 rows=40000 > > width=37) > > hash keys: bid > > nBatches: 1 Buckets: 46000 Memory Usage: 49.99% > > -> Seq Scan on t2 (cost=0.00..734.00 rows=40000 > > width=37) > > (13 rows) > > > > Thanks, > > -- > > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > > <kaigai@ak.jp.nec.com> > > > > > > > -----Original Message----- > > > From: Robert Haas [mailto:robertmhaas@gmail.com] > > > Sent: Thursday, January 15, 2015 2:07 AM > > > To: Kaigai Kouhei(海外 浩平) > > > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada > > > Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] > > > [v9.5] Custom Plan API) > > > > > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> > > wrote: > > > > When custom-scan node replaced a join-plan, it shall have at least > > > > two child plan-nodes. The callback handler of PlanCustomPath needs > > > > to be able to call create_plan_recurse() to transform the underlying > > > > path-nodes to plan-nodes, because this custom-scan node may take > > > > other built-in scan or sub-join nodes as its inner/outer input. > > > > In case of FDW, it shall kick any underlying scan relations to > > > > remote side, thus we may not expect ForeignScan has underlying plans... > > > > > > Do you have an example of this? 
> > > > > > -- > > > Robert Haas > > > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL > > > Company > > > > -- > > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make > > changes to your subscription: > > http://www.postgresql.org/mailpref/pgsql-hackers
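
The resjunk handling described in the message above can be sketched with the same example query. This is a mock, not the executor code: MockTargetEntry, mock_type_from_tl and mock_clean_type_from_tl are hypothetical stand-ins that return only a column count, whereas the real ExecTypeFromTL() and ExecCleanTypeFromTL() build full tuple descriptors. What the sketch keeps is the difference between the two: the "clean" variant skips junk entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified target list entry. */
typedef struct MockTargetEntry
{
    const char *name;
    bool        resjunk;    /* true: kept for lookups only, not returned */
} MockTargetEntry;

/* Mock of the plain variant: every entry becomes a column. */
static size_t
mock_type_from_tl(const MockTargetEntry *tlist, size_t ntlist)
{
    (void) tlist;
    return ntlist;
}

/* Mock of the "clean" variant: junk entries are left out. */
static size_t
mock_clean_type_from_tl(const MockTargetEntry *tlist, size_t ntlist)
{
    size_t      natts = 0;
    size_t      i;

    for (i = 0; i < ntlist; i++)
    {
        if (!tlist[i].resjunk)
            natts++;
    }
    return natts;
}
```

For a pseudo-scan target list holding atext and btext as ordinary entries plus aid and bid as junk entries, the plain variant reports four columns and the clean variant reports two, which matches the two columns the remote query returns.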
Sorry, I made a mistake when creating the patch.
The attached one is the correct version.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Tuesday, March 03, 2015 6:31 PM
> To: Kaigai Kouhei(海外 浩平); Robert Haas
> Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
>
> The attached version of the custom/foreign-join interface patch
> fixes the problem reported on the join-pushdown support thread.
>
> The previous version referenced *_ps_tlist in setrefs.c to check
> whether the Custom/ForeignScan node is associated with a particular
> base relation. That logic decided whether the node performs a base
> relation scan from whether *_ps_tlist is valid. However, this was
> incorrect when the underlying pseudo-scan relation has an empty
> targetlist.
> Instead, the check is revised to look at scanrelid itself. If it is
> zero, the Custom/ForeignScan node is not associated with a particular
> base relation, so its scan slot descriptor is constructed from
> *_ps_tlist.
>
> Also, I noticed a potential problem if a CSP/FDW driver wants to
> display expression nodes using deparse_expression() but a varnode
> within the expression does not appear in *_ps_tlist.
> For example, the remote query below returns rows with two columns.
>
>   SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid;
>
> Thus, ForeignScan behaves like a scan on a relation with two columns,
> and the FDW driver sets two TargetEntry nodes on fdw_ps_tlist. If the
> FDW is designed to keep the join condition (aid = bid) in expression
> node form, it is expected to be saved in the custom/fdw_expr variable,
> and setrefs.c then rewrites the varnodes according to *_ps_tlist.
> This means we also have to add both "aid" and "bid" to *_ps_tlist to
> avoid a failure on variable lookup. However, these additional entries
> change the definition of the slot descriptor.
> So, I adjusted ExecInitForeignScan and ExecInitCustomScan to use
> ExecCleanTypeFromTL(), not ExecTypeFromTL(), when they construct the
> slot descriptor based on *_ps_tlist.
> CSP/FDW drivers are expected to add target entries with resjunk=true
> if they want additional entries for variable lookups in the EXPLAIN
> command.
>
> Fortunately or unfortunately, postgres_fdw keeps its remote query
> in cstring form, so it does not need to add junk entries to
> fdw_ps_tlist.
>
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project
> KaiGai Kohei <kaigai@ak.jp.nec.com>
>
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org
> > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> > Sent: Sunday, February 15, 2015 11:01 PM
> > To: Kaigai Kouhei(海外 浩平); Robert Haas
> > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
> > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> >
> > The attached patch is a rebased version of join replacement with
> > foreign-/custom-scan. There are no feature updates at this moment,
> > but SGML documentation is added (according to Michael's comment).
> >
> > This infrastructure allows a foreign-data-wrapper or a custom-scan
> > provider to add alternative scan paths for a join of relations.
> > From the viewpoint of the executor, it looks like a scan on a
> > pseudo-relation that is materialized from multiple relations, even
> > though the FDW/CSP internally processes the join with its own logic.
> >
> > The basic idea is: (1) scanrelid==0 indicates that this
> > foreign/custom scan node runs on a pseudo relation, and
> > (2) fdw_ps_tlist and custom_ps_tlist define the pseudo relation.
> > Because it is not associated with a tangible relation, unlike the
> > simple scan case, the planner cannot know the expected record type
> > to be returned without this additional information.
> > These two enhancements enable extensions to process a join
> > internally and to behave like an existing scan node from the
> > viewpoint of the core backend.
> >
> > Also, as an aside, I had a discussion with Hanada-san about this
> > interface off-list. He had an idea to keep create_plan_recurse()
> > static, using a special list field in the CustomPath structure to
> > chain the underlying Path nodes. If the core backend translates the
> > Path nodes to Plan nodes when a valid list is given, the extension
> > does not need to call create_plan_recurse() by itself.
> > I have no preference about this. Does anybody have an opinion?
> >
> > Thanks,
> > --
> > NEC OSS Promotion Center / PG-Strom Project
> > KaiGai Kohei <kaigai@ak.jp.nec.com>
> >
> > > -----Original Message-----
> > > From: pgsql-hackers-owner@postgresql.org
> > > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> > > Sent: Thursday, January 15, 2015 8:03 AM
> > > To: Robert Haas
> > > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
> > > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> > >
> > > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > > > > When custom-scan node replaced a join-plan, it shall have at least
> > > > > two child plan-nodes. The callback handler of PlanCustomPath needs
> > > > > to be able to call create_plan_recurse() to transform the underlying
> > > > > path-nodes to plan-nodes, because this custom-scan node may take
> > > > > other built-in scan or sub-join nodes as its inner/outer input.
> > > > > In case of FDW, it shall kick any underlying scan relations to
> > > > > remote side, thus we may not expect ForeignScan has underlying plans...
> > > >
> > > > Do you have an example of this?
> > > >
> > > Yes, even though the full code set is too large for patch submission...
> > >
> > > https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880
> > >
> > > This create_gpuhashjoin_plan() is the PlanCustomPath callback of
> > > GpuHashJoin. It takes a GpuHashJoinPath, inherited from CustomPath,
> > > that has multiple underlying scan/join paths.
> > > Once it is called back from the backend, it also calls
> > > create_plan_recurse() to make the inner/outer plan nodes according
> > > to the paths.
> > >
> > > As a result, we can see the following query execution plan, in which
> > > CustomScan takes underlying scan plans.
> > >
> > > postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
> > >                                     QUERY PLAN
> > > ----------------------------------------------------------------------------------
> > >  Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922 width=143)
> > >    Hash clause 1: (aid = aid)
> > >    Hash clause 2: (bid = bid)
> > >    Bulkload: On
> > >    ->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=4000009 width=77)
> > >    ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
> > >          hash keys: aid
> > >          nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
> > >          ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=37)
> > >          ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
> > >                hash keys: bid
> > >                nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
> > >                ->  Seq Scan on t2  (cost=0.00..734.00 rows=40000 width=37)
> > > (13 rows)
> > >
> > > Thanks,
> > > --
> > > NEC OSS Promotion Center / PG-Strom Project
> > > KaiGai Kohei <kaigai@ak.jp.nec.com>
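
The slot-descriptor rule described in the March 3 message quoted above can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyTargetEntry, ToyScan and the toy_* functions are hypothetical stand-ins invented for illustration, not the PostgreSQL structures or the patch code. It shows two things: the decision is made on scanrelid rather than on whether the ps_tlist is empty, and entries marked resjunk are skipped when the scan slot is sized, which is the effect ascribed to ExecCleanTypeFromTL() as opposed to ExecTypeFromTL().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a TargetEntry (not the real struct). */
typedef struct ToyTargetEntry {
    const char *name;     /* column name, for display only */
    bool        resjunk;  /* true: kept only for variable lookup */
} ToyTargetEntry;

/* Hypothetical stand-in for a Foreign/CustomScan plan node. */
typedef struct ToyScan {
    unsigned              scanrelid;    /* 0 means "pseudo relation" (a replaced join) */
    const ToyTargetEntry *ps_tlist;     /* definition of the pseudo relation */
    size_t                ps_ntargets;  /* number of entries in ps_tlist */
    size_t                base_natts;   /* attribute count of the base relation, if any */
} ToyScan;

/* Count every entry, junk or not (in the spirit of ExecTypeFromTL). */
size_t toy_type_width(const ToyTargetEntry *tlist, size_t n)
{
    (void) tlist;
    return n;
}

/* Count only non-junk entries (in the spirit of ExecCleanTypeFromTL). */
size_t toy_clean_type_width(const ToyTargetEntry *tlist, size_t n)
{
    size_t width = 0;

    for (size_t i = 0; i < n; i++)
        if (!tlist[i].resjunk)
            width++;
    return width;
}

/*
 * Pick the scan slot width by looking at scanrelid itself. An empty
 * ps_tlist with scanrelid == 0 is still a pseudo relation (width 0),
 * not a base relation scan.
 */
size_t toy_scan_slot_width(const ToyScan *scan)
{
    assert(scan != NULL);
    if (scan->scanrelid == 0)
        return toy_clean_type_width(scan->ps_tlist, scan->ps_ntargets);
    return scan->base_natts;
}
```

For the example query, a ps_tlist holding atext and btext as regular entries plus aid and bid as junk entries gives a slot of two columns in this model, while all four entries remain available for lookups.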
Attachment
Kaigai-san,

The v6 patch applied cleanly on the master branch. I'll rebase my
patch onto it, but before that I have a comment about the name of the
new FDW API handler, GetForeignJoinPath.

Obviously an FDW can add multiple paths at a time, as with
GetForeignPaths, so IMO it should be renamed to GetForeignJoinPaths,
in the plural form.

In addition, the new member of RelOptInfo, fdw_handler, should be
initialized explicitly in build_simple_rel.

Please see the attached patch for these changes.

I'll review the v6 patch afterwards.

--
Shigeru HANADA
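
The two points in Hanada-san's message above (one handler call may contribute several paths, and the handler pointer should be initialized explicitly) can be shown with a toy model. This is a sketch only: ToyPath, ToyJoinRel and the toy_* functions are hypothetical names invented for illustration, the costs are made-up numbers, and none of this reflects the real signatures of GetForeignJoinPaths or build_simple_rel.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PATHS 8

/* Hypothetical stand-in for a planner Path. */
typedef struct ToyPath {
    const char *label;
    double      total_cost;
} ToyPath;

struct ToyJoinRel;
typedef void (*ToyGetJoinPaths)(struct ToyJoinRel *joinrel);

/* Hypothetical stand-in for a RelOptInfo describing a join. */
typedef struct ToyJoinRel {
    ToyPath         paths[TOY_MAX_PATHS];
    size_t          npaths;
    ToyGetJoinPaths fdw_handler;  /* NULL unless a wrapper handles this rel */
} ToyJoinRel;

/* Initialize every member explicitly, including the handler pointer. */
void toy_init_rel(ToyJoinRel *rel)
{
    assert(rel != NULL);
    rel->npaths = 0;
    rel->fdw_handler = NULL;
}

/* Append one candidate path; returns 0 if the fixed-size array is full. */
int toy_add_path(ToyJoinRel *rel, const char *label, double cost)
{
    assert(rel != NULL);
    if (rel->npaths >= TOY_MAX_PATHS)
        return 0;
    rel->paths[rel->npaths].label = label;
    rel->paths[rel->npaths].total_cost = cost;
    rel->npaths++;
    return 1;
}

/* A handler in the plural style: one call adds more than one path. */
void toy_get_join_paths(ToyJoinRel *rel)
{
    toy_add_path(rel, "remote join, unordered", 120.0);
    toy_add_path(rel, "remote join, sorted on join key", 150.0);
}

/* Planner side: add a built-in path, then let the handler add its own. */
void toy_add_paths_to_joinrel(ToyJoinRel *rel)
{
    toy_add_path(rel, "local hash join", 200.0);
    if (rel->fdw_handler != NULL)
        rel->fdw_handler(rel);
}

/* Return the cheapest candidate, or NULL if there is none. */
const ToyPath *toy_cheapest_path(const ToyJoinRel *rel)
{
    const ToyPath *best = NULL;

    for (size_t i = 0; i < rel->npaths; i++)
        if (best == NULL || rel->paths[i].total_cost < best->total_cost)
            best = &rel->paths[i];
    return best;
}
```

Because toy_init_rel() sets fdw_handler to NULL, a join with no wrapper attached takes only the built-in path, and the NULL check in toy_add_paths_to_joinrel() never reads an uninitialized pointer.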
Attachment
> Obviously FDW can add multiple paths at a time, like GetForeignPaths,
> so IMO it should be renamed to GetForeignJoinPaths, with plural form.
>
> In addition to that, new member of RelOptInfo, fdw_handler, should be
> initialized explicitly in build_simple_rel.
>
> Please see attached a patch for these changes.
>
Thanks for checking. Yep, the name of the FDW handler should be
...Paths(), instead of ...Path().

The attached one integrates Hanada-san's updates.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
The attached patch integrates a suggestion from Ashutosh Bapat. It allows to track set of relations involved in a join, but replaced by foreign-/custom-scan. It enables to make correct EXPLAIN output, if FDW/CSP driver makes human readable symbols using deparse_expression() or others. Differences from v7 is identical with what I posted on the join push-down support thread. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai > Sent: Wednesday, March 04, 2015 11:42 AM > To: Shigeru Hanada > Cc: Robert Haas; Tom Lane; pgsql-hackers@postgreSQL.org > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) > > > Obviously FDW can add multiple paths at a time, like GetForeignPaths, > > so IMO it should be renamed to GetForeignJoinPaths, with plural form. > > > > In addition to that, new member of RelOptInfo, fdw_handler, should be > > initialized explicitly in build_simple_rel. > > > > Please see attached a patch for these changes. > > > Thanks for your checks. Yep, the name of FDW handler should be ...Paths(), > instead of Path(). > > The attached one integrates Hanada-san's updates. > -- > NEC OSS Promotion Center / PG-Strom Project > KaiGai Kohei <kaigai@ak.jp.nec.com> > > > > -----Original Message----- > > From: Shigeru Hanada [mailto:shigeru.hanada@gmail.com] > > Sent: Tuesday, March 03, 2015 9:26 PM > > To: Kaigai Kouhei(海外 浩平) > > Cc: Robert Haas; Tom Lane; pgsql-hackers@postgreSQL.org > > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom > > Plan API) > > > > Kaigai-san, > > > > The v6 patch was cleanly applied on master branch. I'll rebase my > > patch onto it, but before that I have a comment about name of the new > > FDW API handler GetForeignJoinPath. 
> > > > Obviously FDW can add multiple paths at a time, like GetForeignPaths, > > so IMO it should be renamed to GetForeignJoinPaths, with plural form. > > > > In addition to that, new member of RelOptInfo, fdw_handler, should be > > initialized explicitly in build_simple_rel. > > > > Please see attached a patch for these changes. > > > > I'll review the v6 path afterwards. > > > > > > 2015-03-03 20:20 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>: > > > Sorry, I misoperated on patch creation. > > > Attached one is the correct version. > > > -- > > > NEC OSS Promotion Center / PG-Strom Project > > > KaiGai Kohei <kaigai@ak.jp.nec.com> > > > > > > > > >> -----Original Message----- > > >> From: pgsql-hackers-owner@postgresql.org > > >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai > > >> Sent: Tuesday, March 03, 2015 6:31 PM > > >> To: Kaigai Kouhei(海外 浩平); Robert Haas > > >> Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada > > >> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) > > >> > > >> The attached version of custom/foreign-join interface patch > > >> fixes up the problem reported on the join-pushdown support > > >> thread. > > >> > > >> The previous version referenced *_ps_tlist on setrefs.c, to > > >> check whether the Custom/ForeignScan node is associated with > > >> a particular base relation, or not. > > >> This logic considered above nodes performs base relation scan, > > >> if *_ps_tlist is valid. However, it was incorrect in case when > > >> underlying pseudo-scan relation has empty targetlist. > > >> Instead of the previous logic, it shall be revised to check > > >> scanrelid itself. If zero, it means Custom/ForeignScan node is > > >> not associated with a particular base relation, thus, its slot > > >> descriptor for scan shall be constructed based on *_ps_tlist. 
> > >> > > >> > > >> Also, I noticed a potential problem if CSP/FDW driver want to > > >> displays expression nodes using deparse_expression() but > > >> varnode within this expression does not appear in the *_ps_tlist. > > >> For example, a remote query below shall return rows with two > > >> columns. > > >> > > >> SELECT atext, btext FROM tbl_a, tbl_b WHERE aid = bid; > > >> > > >> Thus, ForeignScan will perform like as a scan on relation with > > >> two columns, and FDW driver will set two TargetEntry on the > > >> fdw_ps_tlist. If FDW is designed to keep the join condition > > >> (aid = bid) using expression node form, it is expected to be > > >> saved on custom/fdw_expr variable, then setrefs.c rewrites the > > >> varnode according to *_ps_tlist. > > >> It means, we also have to add *_ps_tlist both of "aid" and "bid" > > >> to avoid failure on variable lookup. However, these additional > > >> entries changes the definition of the slot descriptor. > > >> So, I adjusted ExecInitForeignScan and ExecInitCustomScan to > > >> use ExecCleanTypeFromTL(), not ExecTypeFromTL(), when it construct > > >> the slot descriptor based on the *_ps_tlist. > > >> It expects CSP/FDW drivers to add target-entries with resjunk=true, > > >> if it wants to have additional entries for variable lookups on > > >> EXPLAIN command. > > >> > > >> Fortunately or unfortunately, postgres_fdw keeps its remote query > > >> in cstring form, so it does not need to add junk entries on the > > >> fdw_ps_tlist. 
> > >>
> > >> Thanks,
> > >> --
> > >> NEC OSS Promotion Center / PG-Strom Project
> > >> KaiGai Kohei <kaigai@ak.jp.nec.com>
> > >>
> > >>
> > >> > -----Original Message-----
> > >> > From: pgsql-hackers-owner@postgresql.org
> > >> > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> > >> > Sent: Sunday, February 15, 2015 11:01 PM
> > >> > To: Kaigai Kouhei(海外 浩平); Robert Haas
> > >> > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
> > >> > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
> > >> >
> > >> > The attached patch is a rebased version of join replacement with
> > >> > foreign-/custom-scan. There are no feature updates at this moment,
> > >> > but SGML documentation has been added (according to Michael's comment).
> > >> >
> > >> > This infrastructure allows foreign-data-wrappers and custom-scan
> > >> > providers to add alternative scan paths for a relations join.
> > >> > From the viewpoint of the executor, it looks like a scan on a pseudo-
> > >> > relation that is materialized from multiple relations, even though
> > >> > the FDW/CSP internally processes the relations join with its own logic.
> > >> >
> > >> > The basic idea is that (1) scanrelid==0 indicates this foreign/custom
> > >> > scan node runs on a pseudo relation and (2) fdw_ps_tlist and
> > >> > custom_ps_tlist introduce the definition of the pseudo relation.
> > >> > Because it is not associated with a tangible relation, unlike the
> > >> > simple scan case, the planner cannot know the expected record
> > >> > type to be returned without this additional information.
> > >> > These two enhancements enable extensions to process relations
> > >> > joins internally, and to behave like an existing scan node from
> > >> > the viewpoint of the core backend.
> > >> >
> > >> > Also, as an aside, I had a discussion with Hanada-san about this
> > >> > interface off-list.
He had an idea to keep create_plan_recurse() > > >> > static, using a special list field in CustomPath structure to > > >> > chain underlying Path node. If core backend translate the Path > > >> > node to Plan node if valid list given, extension does not need to > > >> > call create_plan_recurse() by itself. > > >> > I have no preference about this. Does anybody have opinion? > > >> > > > >> > Thanks, > > >> > -- > > >> > NEC OSS Promotion Center / PG-Strom Project > > >> > KaiGai Kohei <kaigai@ak.jp.nec.com> > > >> > > > >> > > > >> > > -----Original Message----- > > >> > > From: pgsql-hackers-owner@postgresql.org > > >> > > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai > > >> > > Sent: Thursday, January 15, 2015 8:03 AM > > >> > > To: Robert Haas > > >> > > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada > > >> > > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan > > >> > > API) > > >> > > > > >> > > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> > > >> > > wrote: > > >> > > > > When custom-scan node replaced a join-plan, it shall have at least > > >> > > > > two child plan-nodes. The callback handler of PlanCustomPath needs > > >> > > > > to be able to call create_plan_recurse() to transform the underlying > > >> > > > > path-nodes to plan-nodes, because this custom-scan node may take > > >> > > > > other built-in scan or sub-join nodes as its inner/outer input. > > >> > > > > In case of FDW, it shall kick any underlying scan relations to > > >> > > > > remote side, thus we may not expect ForeignScan has underlying plans... > > >> > > > > > >> > > > Do you have an example of this? > > >> > > > > > >> > > Yes, even though full code set is too large for patch submission... > > >> > > > > >> > > https://github.com/pg-strom/devel/blob/master/src/gpuhashjoin.c#L1880 > > >> > > > > >> > > This create_gpuhashjoin_plan() is PlanCustomPath callback of > GpuHashJoin. 
> > >> > > It takes GpuHashJoinPath inherited from CustomPath that has multiple
> > >> > > underlying scan/join paths.
> > >> > > Once it is called back from the backend, it also calls create_plan_recurse()
> > >> > > to make inner/outer plan nodes according to the paths.
> > >> > >
> > >> > > In the result, we can see the following query execution plan that CustomScan
> > >> > > takes underlying scan plans.
> > >> > >
> > >> > > postgres=# EXPLAIN SELECT * FROM t0 NATURAL JOIN t1 NATURAL JOIN t2;
> > >> > >                                     QUERY PLAN
> > >> > > ----------------------------------------------------------------------------------
> > >> > >  Custom Scan (GpuHashJoin)  (cost=2968.00..140120.31 rows=3970922 width=143)
> > >> > >    Hash clause 1: (aid = aid)
> > >> > >    Hash clause 2: (bid = bid)
> > >> > >    Bulkload: On
> > >> > >    ->  Custom Scan (GpuScan) on t0  (cost=500.00..57643.00 rows=4000009 width=77)
> > >> > >    ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
> > >> > >          hash keys: aid
> > >> > >          nBatches: 1  Buckets: 46000  Memory Usage: 99.99%
> > >> > >          ->  Seq Scan on t1  (cost=0.00..734.00 rows=40000 width=37)
> > >> > >          ->  Custom Scan (MultiHash)  (cost=734.00..734.00 rows=40000 width=37)
> > >> > >                hash keys: bid
> > >> > >                nBatches: 1  Buckets: 46000  Memory Usage: 49.99%
> > >> > >                ->  Seq Scan on t2  (cost=0.00..734.00 rows=40000 width=37)
> > >> > > (13 rows)
> > >> > >
> > >> > > Thanks,
> > >> > > --
> > >> > > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> > >> > > <kaigai@ak.jp.nec.com>
> > >> > >
> > >> > >
> > >> > > > -----Original Message-----
> > >> > > > From: Robert Haas [mailto:robertmhaas@gmail.com]
> > >> > > > Sent: Thursday, January 15, 2015 2:07 AM
> > >> > > > To: Kaigai Kouhei(海外 浩平)
> > >> > > > Cc: Tom Lane; pgsql-hackers@postgreSQL.org; Shigeru Hanada
> > >> > > > Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS]
> > >> > > > [v9.5] Custom Plan
API) > > >> > > > > > >> > > > On Fri, Jan 9, 2015 at 10:51 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> > > >> > > wrote: > > >> > > > > When custom-scan node replaced a join-plan, it shall have at least > > >> > > > > two child plan-nodes. The callback handler of PlanCustomPath needs > > >> > > > > to be able to call create_plan_recurse() to transform the underlying > > >> > > > > path-nodes to plan-nodes, because this custom-scan node may take > > >> > > > > other built-in scan or sub-join nodes as its inner/outer input. > > >> > > > > In case of FDW, it shall kick any underlying scan relations to > > >> > > > > remote side, thus we may not expect ForeignScan has underlying plans... > > >> > > > > > >> > > > Do you have an example of this? > > >> > > > > > >> > > > -- > > >> > > > Robert Haas > > >> > > > EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL > > >> > > > Company > > >> > > > > >> > > -- > > >> > > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To > make > > >> > > changes to your subscription: > > >> > > http://www.postgresql.org/mailpref/pgsql-hackers > > > > > > > > -- > > Shigeru HANADA
On Mon, Mar 9, 2015 at 11:18 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > The attached patch integrates a suggestion from Ashutosh Bapat. > It allows to track set of relations involved in a join, but > replaced by foreign-/custom-scan. It enables to make correct > EXPLAIN output, if FDW/CSP driver makes human readable symbols > using deparse_expression() or others. > > Differences from v7 is identical with what I posted on the > join push-down support thread. I took a look at this patch today and noticed that it incorporates not only documentation for the new functionality it adds, but also for the custom-scan functionality whose documentation I previously excluded from commit on the grounds that it needed more work, especially to improve the English. That decision was not popular at the time, and I think we need to remedy it before going further with this. I had hoped that someone else would care about this work enough to help with the documentation, but it seems not, so today I went through the documentation in this patch, excluded all of the stuff specific to custom joins, and heavily edited the rest. The result is attached. If there are no objections, I'll commit this; then, someone can rebase this patch over these changes and we can proceed from there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I took a look at this patch today and noticed that it incorporates not > only documentation for the new functionality it adds, but also for the > custom-scan functionality whose documentation I previously excluded > from commit on the grounds that it needed more work, especially to > improve the English. That decision was not popular at the time, and I > think we need to remedy it before going further with this. I had > hoped that someone else would care about this work enough to help with > the documentation, but it seems not, so today I went through the > documentation in this patch, excluded all of the stuff specific to > custom joins, and heavily edited the rest. The result is attached. Looks good; I noticed one trivial typo --- please s/returns/return/ under ExecCustomScan. Also, perhaps instead of just "set ps_ResultTupleSlot" that should say "fill ps_ResultTupleSlot with the next tuple in the current scan direction". regards, tom lane
On 12 March 2015 at 21:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I took a look at this patch today and noticed that it incorporates not >> only documentation for the new functionality it adds, but also for the >> custom-scan functionality whose documentation I previously excluded >> from commit on the grounds that it needed more work, especially to >> improve the English. That decision was not popular at the time, and I >> think we need to remedy it before going further with this. I had >> hoped that someone else would care about this work enough to help with >> the documentation, but it seems not, so today I went through the >> documentation in this patch, excluded all of the stuff specific to >> custom joins, and heavily edited the rest. The result is attached. > > Looks good; I noticed one trivial typo --- please s/returns/return/ under > ExecCustomScan. Also, perhaps instead of just "set ps_ResultTupleSlot" > that should say "fill ps_ResultTupleSlot with the next tuple in the > current scan direction". Also: s/initalization/initialization/ -- Thom
> On Mon, Mar 9, 2015 at 11:18 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > The attached patch integrates a suggestion from Ashutosh Bapat. > > It allows to track set of relations involved in a join, but > > replaced by foreign-/custom-scan. It enables to make correct > > EXPLAIN output, if FDW/CSP driver makes human readable symbols > > using deparse_expression() or others. > > > > Differences from v7 is identical with what I posted on the > > join push-down support thread. > > I took a look at this patch today and noticed that it incorporates not > only documentation for the new functionality it adds, but also for the > custom-scan functionality whose documentation I previously excluded > from commit on the grounds that it needed more work, especially to > improve the English. That decision was not popular at the time, and I > think we need to remedy it before going further with this. I had > hoped that someone else would care about this work enough to help with > the documentation, but it seems not, so today I went through the > documentation in this patch, excluded all of the stuff specific to > custom joins, and heavily edited the rest. The result is attached. > > If there are no objections, I'll commit this; then, someone can rebase > this patch over these changes and we can proceed from there. > Thanks for your help. I tried to check the documentation from the implementation standpoint, however, I have no objection here. Best regards, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Thu, Mar 12, 2015 at 8:09 PM, Thom Brown <thom@linux.com> wrote: > On 12 March 2015 at 21:28, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> I took a look at this patch today and noticed that it incorporates not >>> only documentation for the new functionality it adds, but also for the >>> custom-scan functionality whose documentation I previously excluded >>> from commit on the grounds that it needed more work, especially to >>> improve the English. That decision was not popular at the time, and I >>> think we need to remedy it before going further with this. I had >>> hoped that someone else would care about this work enough to help with >>> the documentation, but it seems not, so today I went through the >>> documentation in this patch, excluded all of the stuff specific to >>> custom joins, and heavily edited the rest. The result is attached. >> >> Looks good; I noticed one trivial typo --- please s/returns/return/ under >> ExecCustomScan. Also, perhaps instead of just "set ps_ResultTupleSlot" >> that should say "fill ps_ResultTupleSlot with the next tuple in the >> current scan direction". > > Also: > > s/initalization/initialization/ Thanks to both of you for the review. I have committed it with those improvements. Please let me know if you spot anything else. Another bit of this that I think we could commit without fretting about it too much is the code adding set_join_pathlist_hook. This is - I think - analogous to set_rel_pathlist_hook, and like that hook, could be used for other purposes than custom plan generation - e.g. to delete paths we do not want to use. I've extracted this portion of the patch and adjusted the comments; if there are no objections, I will commit this bit also. Kaigai, note that your patch puts this hook and the call to GetForeignJoinPaths() in the wrong order. 
As in the baserel case, the hook should get the last word, after any FDW-specific handlers have been called, so that it can delete or modify paths as well as add them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
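The ordering point above can be illustrated with a minimal sketch in plain C. This is a toy model under stated assumptions, not PostgreSQL code: ToyRel, toy_add_path(), toy_fdw_join_paths(), toy_hook_drop_foreign() and toy_build_paths() are hypothetical names invented for the illustration, and paths are represented by strings only. It shows why a hook that wants to delete or modify paths has to run after the FDW-specific handler.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PATHS 8

/* Toy stand-in for a joinrel's path list (not PostgreSQL's RelOptInfo). */
typedef struct ToyRel
{
    const char *paths[TOY_MAX_PATHS];
    size_t      npaths;
} ToyRel;

typedef void (*toy_hook_type) (ToyRel *rel);

void
toy_add_path(ToyRel *rel, const char *name)
{
    if (rel->npaths < TOY_MAX_PATHS)
        rel->paths[rel->npaths++] = name;
}

/* Hypothetical FDW handler: contributes one pushed-down join path. */
void
toy_fdw_join_paths(ToyRel *rel)
{
    toy_add_path(rel, "ForeignJoin");
}

/* Hypothetical hook that vetoes foreign joins by deleting them. */
void
toy_hook_drop_foreign(ToyRel *rel)
{
    size_t      keep = 0;

    for (size_t i = 0; i < rel->npaths; i++)
    {
        if (strcmp(rel->paths[i], "ForeignJoin") != 0)
            rel->paths[keep++] = rel->paths[i];
    }
    rel->npaths = keep;
}

/*
 * Build the path list for one toy joinrel and return how many paths survive.
 * hook_runs_last selects which of the two orderings is modelled.
 */
size_t
toy_build_paths(toy_hook_type hook, int hook_runs_last)
{
    ToyRel      rel = {{NULL}, 0};

    toy_add_path(&rel, "HashJoin");     /* built-in path */
    if (hook != NULL && !hook_runs_last)
        hook(&rel);                     /* too early: FDW path not added yet */
    toy_fdw_join_paths(&rel);
    if (hook != NULL && hook_runs_last)
        hook(&rel);                     /* last word: sees every path */
    return rel.npaths;
}
```

With the hook running last, toy_build_paths(toy_hook_drop_foreign, 1) leaves only the built-in path; with the hook running first, the foreign join path is added afterwards and escapes the hook.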
Robert Haas <robertmhaas@gmail.com> writes: > Another bit of this that I think we could commit without fretting > about it too much is the code adding set_join_pathlist_hook. This is > - I think - analogous to set_rel_pathlist_hook, and like that hook, > could be used for other purposes than custom plan generation - e.g. to > delete paths we do not want to use. I've extracted this portion of > the patch and adjusted the comments; if there are no objections, I > will commit this bit also. I don't object to the concept, but I think that is a pretty bad place to put the hook call: add_paths_to_joinrel is typically called multiple (perhaps *many*) times per joinrel and thus this placement would force any user of the hook to do a lot of repetitive work. I think the right placement is just before the set_cheapest call for each joinrel, just as we did with set_rel_pathlist_hook. It looks like those calls are at: allpaths.c:1649 (in standard_join_search) geqo_eval.c:270 (in merge_clump) There are a couple of other set_cheapest calls that probably don't need hooked, since they are for dummy (proven empty) rels, and it's not clear how a hook could improve on an empty plan. Also, this would leave you with a much shorter parameter list ;-) ... really no reason to pass more than root and rel. regards, tom lane
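To give a rough feel for "multiple (perhaps *many*) times per joinrel", here is a small sketch in plain C that counts how many distinct pairs of input relations can produce one joinrel of n base relations. It is a toy calculation, not planner code: the function names are hypothetical, and it assumes every split of the relation set is a legal join, so it is only an upper bound (join-order restrictions and missing join clauses would normally reduce the real number). A hook placed just before set_cheapest would run once per joinrel instead.

```c
/*
 * Closed form: a set of n relations can be split into two non-empty,
 * unordered input sets in 2^(n-1) - 1 ways. Assumes n is small enough
 * for the shift to fit in an unsigned long.
 */
unsigned long
toy_input_pairs(unsigned n)
{
    if (n < 2)
        return 0;
    return (1UL << (n - 1)) - 1;
}

/* Same count by brute force over bitmasks, as a cross-check. */
unsigned long
toy_input_pairs_enum(unsigned n)
{
    unsigned long full;
    unsigned long count = 0;

    if (n < 2)
        return 0;
    full = (1UL << n) - 1;
    for (unsigned long outer = 1; outer < full; outer++)
    {
        unsigned long inner = full & ~outer;

        if (outer < inner)      /* count {outer, inner} once, not twice */
            count++;
    }
    return count;
}
```

Under these assumptions a 5-way joinrel has up to 15 input pairs and a 10-way joinrel up to 511, which is the repetitive work a per-pair hook placement would impose.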
On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Another bit of this that I think we could commit without fretting
>> about it too much is the code adding set_join_pathlist_hook.  This is
>> - I think - analogous to set_rel_pathlist_hook, and like that hook,
>> could be used for other purposes than custom plan generation - e.g. to
>> delete paths we do not want to use.  I've extracted this portion of
>> the patch and adjusted the comments; if there are no objections, I
>> will commit this bit also.
>
> I don't object to the concept, but I think that is a pretty bad place
> to put the hook call: add_paths_to_joinrel is typically called multiple
> (perhaps *many*) times per joinrel and thus this placement would force
> any user of the hook to do a lot of repetitive work.

Interesting point.  I guess the question is whether some or all
callers are going to actually *want* a separate call for each
invocation of add_paths_to_joinrel(), or whether they'll be happy to
operate on the otherwise-complete path list.  It's true that if your
goal is to delete paths, it's probably best to be called just once
after the path list is complete, and there might be a use case for
that, but I guess it's less useful than for baserels.  For a baserel,
as long as you don't nuke the sequential-scan path, there is always
going to be a way to complete the plan; so this would be a fine way to
implement a disable-an-index extension.  But for joinrels, it's not so
easy to rule out, say, a hash-join here.  Neither hook placement is
much good for that; the path you want to get rid of may have already
dominated paths you want to keep.

Suppose you want to add paths - e.g. you have an extension that goes
and looks for a materialized view that matches this subtree of the
query, and if it finds one, it substitutes a scan of the materialized
view for a scan of the baserel.
Or, as in KaiGai's case, you have an extension that can perform the whole join in GPU-land and produce the same results we would have gotten via normal execution. Either way, you want - and this is the central point of the whole patch here - to inject a scan path into a joinrel. It is not altogether obvious to me what the best placement for this is. In the materialized view case, you probably need a perfect match between the baserels in the view and the baserels in the joinrel to do anything. There's no point in re-checking that for every innerrels/outerrels combination. I don't know enough about the GPU case to reason about it intelligently; maybe KaiGai can comment. I think the foreign data wrapper join pushdown case, which also aims to substitute a scan for a join, is interesting to think about, even though it's likely to be handled by a new FDW method instead of via the hook. Where should the FDW method get called from? Currently, the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets called from add_paths_to_joinrel(). The patch at http://www.postgresql.org/message-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com uses that to implement join pushdown in postgres_fdw; if you have A JOIN B JOIN C all on server X, we'll notice that the join with A and B can be turned into a foreign scan on A JOIN B, and similarly for A-C and B-C. Then, if it turns out that the cheapest path for A-B is the foreign join, and the cheapest path for C is a foreign scan, we'll arrive at the idea of a foreign scan on A-B-C, and we'll realize the same thing in each of the other combinations as well. So, eventually the foreign join gets pushed down. 
But there's another possible approach: suppose that join_search_one_level, after considering left-sided and right-sided joins and after considering bushy joins, checks whether every relation it's got is from the same foreign server, and if so, asks that foreign server whether it would like to contribute any paths. Would that be better or worse? A disadvantage is that if you've got something like A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed down (say, each join clause calls a non-pushdown-safe function) you'll end up examining a pile of joinrels - at every level of the join tree - and individually rejecting each one. With the build-it-up-incrementally approach, you'll figure that all out at level 2, and then after that there's nothing to do but give up quickly. On the other hand, I'm afraid the incremental approach might miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x = huge.x) ON small.y = big.y AND small.z = huge.z, where all three are foreign tables on the same server. If the output of the big/huge join is big, none of those paths are going to survive at level 2, but the overall join size might be very small, so we surely want a chance to recover at level 3. (We discussed test cases of this form quite a bit in the context of e2fa76d80ba571d4de8992de6386536867250474.) Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I don't object to the concept, but I think that is a pretty bad place >> to put the hook call: add_paths_to_joinrel is typically called multiple >> (perhaps *many*) times per joinrel and thus this placement would force >> any user of the hook to do a lot of repetitive work. > Interesting point. I guess the question is whether a some or all > callers are going to actually *want* a separate call for each > invocation of add_paths_to_joinrel(), or whether they'll be happy to > operate on the otherwise-complete path list. Hmm. You're right, it's certainly possible that some users would like to operate on each possible pair of input relations, rather than considering the joinrel "as a whole". Maybe we need two hooks, one like your patch and one like I suggested. > ... But for joinrels, it's not so > easy to rule out, say, a hash-join here. Neither hook placement is > much good for that; the path you want to get rid of may have already > dominated paths you want to keep. I don't particularly buy that line of argument. If a path has been deleted because it was dominated by another, and you are unhappy about that decision, then a hook of this sort is not the appropriate solution; you need to be going and fixing the cost estimates that you think are wrong. (This gets back to the point I keep making that I don't actually believe you can do anything very useful with these hooks; anything interesting is probably going to involve more fundamental changes to the planner.) > I think the foreign data wrapper join pushdown case, which also aims > to substitute a scan for a join, is interesting to think about, even > though it's likely to be handled by a new FDW method instead of via > the hook. Where should the FDW method get called from? Currently, > the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets > called from add_paths_to_joinrel(). 
The patch at > http://www.postgresql.org/message-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com > uses that to implement join pushdown in postgres_fdw; if you have A > JOIN B JOIN C all on server X, we'll notice that the join with A and B > can be turned into a foreign scan on A JOIN B, and similarly for A-C > and B-C. Then, if it turns out that the cheapest path for A-B is the > foreign join, and the cheapest path for C is a foreign scan, we'll > arrive at the idea of a foreign scan on A-B-C, and we'll realize the > same thing in each of the other combinations as well. So, eventually > the foreign join gets pushed down. But this is in fact exactly the sort of case where you should not rediscover all that over again for each pair of input rels. "Do all these baserels belong to the same foreign server" is a question that need only be considered once per joinrel. Not that that matters for this hook, because as you say we're not doing foreign-server support through this hook, but I think it's a fine example of why you'd want a single call per joinrel. regards, tom lane
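The "do all these baserels belong to the same foreign server" question can be sketched as a single pass over the joinrel's relids. The following plain C is a toy model under stated assumptions, not PostgreSQL code: relids are modelled as a bitmask rather than a Bitmapset, server_of[] is a hypothetical array mapping each base relation to a server id, and TOY_NO_SERVER and toy_common_server() are names invented here. The answer depends only on the joinrel's relids, not on which input pair is being considered, which is why one evaluation per joinrel suffices.

```c
#define TOY_NO_SERVER (-1)

/*
 * Return the server id shared by every base relation in relids, or
 * TOY_NO_SERVER if the set is empty, contains a local table, or spans
 * more than one server.
 *
 * server_of[i] is the (hypothetical) server id of base relation i, or
 * TOY_NO_SERVER for a local table; relids is a bitmask over 0..nrels-1.
 */
int
toy_common_server(const int *server_of, unsigned nrels, unsigned relids)
{
    int         common = TOY_NO_SERVER;
    int         seen = 0;

    for (unsigned i = 0; i < nrels; i++)
    {
        if (!(relids & (1u << i)))
            continue;           /* relation i is not part of this joinrel */
        if (server_of[i] == TOY_NO_SERVER)
            return TOY_NO_SERVER;       /* local table: cannot push down */
        if (!seen)
        {
            common = server_of[i];
            seen = 1;
        }
        else if (server_of[i] != common)
            return TOY_NO_SERVER;       /* relations on different servers */
    }
    return common;
}
```

A caller could cache the result alongside the joinrel so that it is never recomputed for each inner/outer combination.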
> On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > >> Another bit of this that I think we could commit without fretting > >> about it too much is the code adding set_join_pathlist_hook. This is > >> - I think - analogous to set_rel_pathlist_hook, and like that hook, > >> could be used for other purposes than custom plan generation - e.g. to > >> delete paths we do not want to use. I've extracted this portion of > >> the patch and adjusted the comments; if there are no objections, I > >> will commit this bit also. > > > > I don't object to the concept, but I think that is a pretty bad place > > to put the hook call: add_paths_to_joinrel is typically called multiple > > (perhaps *many*) times per joinrel and thus this placement would force > > any user of the hook to do a lot of repetitive work. > > Interesting point. I guess the question is whether a some or all > callers are going to actually *want* a separate call for each > invocation of add_paths_to_joinrel(), or whether they'll be happy to > operate on the otherwise-complete path list. It's true that if your > goal is to delete paths, it's probably best to be called just once > after the path list is complete, and there might be a use case for > that, but I guess it's less useful than for baserels. For a baserel, > as long as you don't nuke the sequential-scan path, there is always > going to be a way to complete the plan; so this would be a fine way to > implement a disable-an-index extension. But for joinrels, it's not so > easy to rule out, say, a hash-join here. Neither hook placement is > much good for that; the path you want to get rid of may have already > dominated paths you want to keep. > From the standpoint of extension development, I'm uncertain whether we can easily reproduce information needed to compute alternative paths on the hook at standard_join_search(), like a hook at add_paths_to_joinrel(). 
(Please correct me if I have misunderstood.)
For example, it is not obvious which paths are the inner and outer inputs
of the joinrel for which a custom-scan provider tries to add an alternative
scan path. The extension would probably need to look up the paths of the
source relations in the join_rel_level[] array.
Also, how do we obtain the SpecialJoinInfo? It contains the information
needed to identify the required join type (like JOIN_LEFT); however, the
extension would need to search join_info_list by relids again if the hook
is located at standard_join_search().
Even though the hook is invoked more often if it is located in
add_paths_to_joinrel(), I think that placement allows extensions to be
designed more simply.

> Suppose you want to add paths - e.g. you have an extension that goes
> and looks for a materialized view that matches this subtree of the
> query, and if it finds one, it substitutes a scan of the materialized
> view for a scan of the baserel. Or, as in KaiGai's case, you have an
> extension that can perform the whole join in GPU-land and produce the
> same results we would have gotten via normal execution. Either way,
> you want - and this is the central point of the whole patch here - to
> inject a scan path into a joinrel. It is not altogether obvious to me
> what the best placement for this is. In the materialized view case,
> you probably need a perfect match between the baserels in the view and
> the baserels in the joinrel to do anything. There's no point in
> re-checking that for every innerrels/outerrels combination. I don't
> know enough about the GPU case to reason about it intelligently; maybe
> KaiGai can comment.
>
In the GPU case, the extension will add alternative paths based on the
hash-join and nested-loop algorithms, each with its own cost estimation,
as long as the device can execute the join condition. It expects the
planner (set_cheapest) to choose the best path among the built-in and
additional ones.
So, it seems more reasonable to me if the extension can use the same common
infrastructure that the built-in logic (hash-join/merge-join/nested-loop)
uses to compute its cost estimates.

> But there's another possible approach: suppose that
> join_search_one_level, after considering left-sided and right-sided
> joins and after considering bushy joins, checks whether every relation
> it's got is from the same foreign server, and if so, asks that foreign
> server whether it would like to contribute any paths. Would that be
> better or worse? A disadvantage is that if you've got something like
> A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
> JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
> down (say, each join clause calls a non-pushdown-safe function) you'll
> end up examining a pile of joinrels - at every level of the join tree
> - and individually rejecting each one. With the
> build-it-up-incrementally approach, you'll figure that all out at
> level 2, and then after that there's nothing to do but give up
> quickly. On the other hand, I'm afraid the incremental approach might
> miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
> huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
> foreign tables on the same server. If the output of the big/huge join
> is big, none of those paths are going to survive at level 2, but the
> overall join size might be very small, so we surely want a chance to
> recover at level 3. (We discussed test cases of this form quite a bit
> in the context of e2fa76d80ba571d4de8992de6386536867250474.)
>
> Thoughts?
>
Do we need to pay attention to the relids of the joinrel, rather than
innerpath and outerpath? Yes, we might assume that a path with the join
pushed down is cheaper than the combination of two foreign scans and a
local join; however, a foreign scan with a pushed-down join may itself be
expensive in some cases.
In this case, either hook location may be reasonable, because the FDW
driver can check whether all the relids are foreign scans managed by the
same foreign server, regardless of innerpath/outerpath.
Of course, whether the hook allows extensions (including FDW drivers) to
use the common infrastructure (like SpecialJoinInfo or the join
restrictlist, ...) is a significant factor.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
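The extra work mentioned above, searching join_info_list by relids to recover the join type, might look roughly like the following sketch in plain C. It is a toy model under stated assumptions, not planner code: relids are bitmasks instead of Bitmapsets, ToySpecialJoin keeps only three fields named after SpecialJoinInfo's min_lefthand, min_righthand and jointype, and toy_lookup_jointype() is a hypothetical helper. It applies just the basic subset test and ignores the legality checks and reversed matches that the real planner performs.

```c
#include <stddef.h>

typedef enum
{
    TOY_JOIN_INNER,
    TOY_JOIN_LEFT
} ToyJoinType;

/* Reduced stand-in for SpecialJoinInfo; relid sets are bitmasks here. */
typedef struct ToySpecialJoin
{
    unsigned    min_lefthand;   /* relids that must be on the outer side */
    unsigned    min_righthand;  /* relids that must be on the inner side */
    ToyJoinType jointype;
} ToySpecialJoin;

/*
 * Return the join type of the first entry whose min_lefthand is a subset
 * of outer_relids and whose min_righthand is a subset of inner_relids;
 * fall back to an inner join when nothing matches.
 */
ToyJoinType
toy_lookup_jointype(const ToySpecialJoin *infos, size_t ninfos,
                    unsigned outer_relids, unsigned inner_relids)
{
    for (size_t i = 0; i < ninfos; i++)
    {
        const ToySpecialJoin *sj = &infos[i];

        if ((sj->min_lefthand & ~outer_relids) == 0 &&
            (sj->min_righthand & ~inner_relids) == 0)
            return sj->jointype;
    }
    return TOY_JOIN_INNER;
}
```

A hook called from add_paths_to_joinrel() would receive the outer/inner relations and join type directly, whereas a hook called once per joinrel would have to repeat a search like this for every input pair it wants to consider.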
The attached patch changes the invocation order of GetForeignJoinPaths and
set_join_pathlist_hook, and adjusts the documentation in custom-scan.sgml.
Other portions are kept as in the previous version.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Sunday, March 15, 2015 11:38 AM
> To: Robert Haas; Tom Lane
> Cc: Thom Brown; Shigeru Hanada; pgsql-hackers@postgreSQL.org
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
On Sat, Mar 14, 2015 at 10:37 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> From the standpoint of extension development, I'm uncertain whether we
> can easily reproduce information needed to compute alternative paths on
> the hook at standard_join_search(), like a hook at add_paths_to_joinrel().
>
> (Please correct me, if I misunderstood.)
> For example, it is not obvious which path is inner/outer of the joinrel
> on which custom-scan provider tries to add an alternative scan path.
That's a problem for the GPU-join use case, where you are essentially
trying to add new join types to the system. But it's NOT a problem if
what you're actually trying to do is substitute a *scan* for a *join*.
If you're going to produce the join output by scanning a materialized
view, or by scanning the results of a query pushed down to a foreign
server, you don't need to divide the rels into inner rels and outer
rels; indeed, any such division would be artificial. You just need to
generate a query that produces the right answer *for the entire
joinrel* and push it down.
I'd really like to hear what the folks who care about FDW join
pushdown think about this hook placement issue.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
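
To put rough numbers on the repetition discussed above, here is a toy count in Python. It is not planner code: it assumes that every split of a joinrel into two non-empty sides is considered in both orientations, and that no join-order restriction or clause-less-join pruning applies, so it is only an upper bound. The name `hook_call_counts` is hypothetical.

```python
from itertools import combinations


def hook_call_counts(n_rels):
    """Return (calls_per_split, calls_per_joinrel) for a toy join search.

    calls_per_split models a hook fired for every (outer, inner) division
    of a joinrel; calls_per_joinrel models a hook fired once per joinrel
    after its path list is complete.
    """
    calls_per_split = 0
    calls_per_joinrel = 0
    for size in range(2, n_rels + 1):
        for _joinrel in combinations(range(n_rels), size):
            calls_per_joinrel += 1
            # Any non-empty proper subset can be the outer side; its
            # complement is the inner side (assumption: all are legal).
            calls_per_split += 2 ** size - 2
    return calls_per_split, calls_per_joinrel


print(hook_call_counts(3))
print(hook_call_counts(4))
```

Under these assumptions the per-split count grows much faster than the per-joinrel count, which is the cost a scan-for-join substitution would pay for re-checking the same joinrel repeatedly.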
2015-03-14 7:18 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
> I think the foreign data wrapper join pushdown case, which also aims
> to substitute a scan for a join, is interesting to think about, even
> though it's likely to be handled by a new FDW method instead of via
> the hook. Where should the FDW method get called from? Currently,
> the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
> called from add_paths_to_joinrel(). The patch at
> http://www.postgresql.org/message-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com
> uses that to implement join pushdown in postgres_fdw; if you have A
> JOIN B JOIN C all on server X, we'll notice that the join with A and B
> can be turned into a foreign scan on A JOIN B, and similarly for A-C
> and B-C. Then, if it turns out that the cheapest path for A-B is the
> foreign join, and the cheapest path for C is a foreign scan, we'll
> arrive at the idea of a foreign scan on A-B-C, and we'll realize the
> same thing in each of the other combinations as well. So, eventually
> the foreign join gets pushed down.
From the viewpoint of postgres_fdw, the incremental approach seemed
the natural way, although postgres_fdw should consider the paths in
the pathlist in addition to the cheapest one, as you mentioned in
another thread. This approach allows an FDW to use the SQL statements
generated for the underlying scans as parts of the FROM clause, as
postgres_fdw does in the join push-down patch.
> But there's another possible approach: suppose that
> join_search_one_level, after considering left-sided and right-sided
> joins and after considering bushy joins, checks whether every relation
> it's got is from the same foreign server, and if so, asks that foreign
> server whether it would like to contribute any paths. Would that be
> better or worse? A disadvantage is that if you've got something like
> A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
> JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
> down (say, each join clause calls a non-pushdown-safe function) you'll
> end up examining a pile of joinrels - at every level of the join tree
> - and individually rejecting each one. With the
> build-it-up-incrementally approach, you'll figure that all out at
> level 2, and then after that there's nothing to do but give up
> quickly. On the other hand, I'm afraid the incremental approach might
> miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
> huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
> foreign tables on the same server. If the output of the big/huge join
> is big, none of those paths are going to survive at level 2, but the
> overall join size might be very small, so we surely want a chance to
> recover at level 3. (We discussed test cases of this form quite a bit
> in the context of e2fa76d80ba571d4de8992de6386536867250474.)
Interesting, I overlooked that pattern. As you pointed out, a join
between big foreign tables might be dominated, perhaps by a MergeJoin
path. Leaving the dominated ForeignPath in the pathlist for further
optimization later (at a higher join level) is an idea, but it would
make planning take longer (and use more cycles and memory).
Tom's idea sounds good for saving the path b), but I worry about
whether an FDW can get enough information at that point, just before
set_cheapest. It would not be a good interface if each FDW needs to
copy a lot of code from joinrel.c...
--
Shigeru HANADA
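
As an illustration of reusing the SQL generated for underlying scans inside a FROM clause, here is a minimal Python sketch. It is not the deparse code of postgres_fdw; the helper `remote_join_sql`, the aliases `l` and `r`, and the sample tables are all hypothetical.

```python
def remote_join_sql(outer_sql, inner_sql, join_type, condition, targets):
    """Build a remote join query that embeds two child queries in FROM.

    outer_sql and inner_sql stand for the SELECT statements already
    generated for the underlying scans (or for lower-level joins).
    """
    return (
        f"SELECT {targets} "
        f"FROM ({outer_sql}) l {join_type} JOIN ({inner_sql}) r "
        f"ON ({condition})"
    )


# Level 2: join two base-relation scans.
scan_a = "SELECT id, x FROM a"
scan_b = "SELECT id, y FROM b"
join_ab = remote_join_sql(scan_a, scan_b, "INNER", "l.id = r.id",
                          "l.id, l.x, r.y")

# Level 3: the level-2 query becomes one side of the next join.
scan_c = "SELECT id, z FROM c"
join_abc = remote_join_sql(join_ab, scan_c, "LEFT", "l.id = r.id",
                           "l.id, l.x, l.y, r.z")
print(join_abc)
```

The incremental (bottom-up) approach fits this style naturally, because each level only wraps queries that already exist for its children.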
> -----Original Message----- > From: Shigeru Hanada [mailto:shigeru.hanada@gmail.com] > Sent: Monday, March 16, 2015 9:59 PM > To: Robert Haas > Cc: Tom Lane; Thom Brown; Kaigai Kouhei(海外 浩平); pgsql-hackers@postgreSQL.org > Subject: ##freemail## Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom > Plan API) > > 2015-03-14 7:18 GMT+09:00 Robert Haas <robertmhaas@gmail.com>: > > I think the foreign data wrapper join pushdown case, which also aims > > to substitute a scan for a join, is interesting to think about, even > > though it's likely to be handled by a new FDW method instead of via > > the hook. Where should the FDW method get called from? Currently, > > the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets > > called from add_paths_to_joinrel(). The patch at > > > http://www.postgresql.org/message-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9 > ztFZG7zN64Xw@mail.gmail.com > > uses that to implement join pushdown in postgres_fdw; if you have A > > JOIN B JOIN C all on server X, we'll notice that the join with A and B > > can be turned into a foreign scan on A JOIN B, and similarly for A-C > > and B-C. Then, if it turns out that the cheapest path for A-B is the > > foreign join, and the cheapest path for C is a foreign scan, we'll > > arrive at the idea of a foreign scan on A-B-C, and we'll realize the > > same thing in each of the other combinations as well. So, eventually > > the foreign join gets pushed down. > > From the viewpoint of postgres_fdw, incremental approach seemed > natural way, although postgres_fdw should consider paths in pathlist > in additon to cheapest one as you mentioned in another thread. This > approarch allows FDW to use SQL statement generated for underlying > scans as parts of FROM clause, as postgres_fdw does in the join > push-down patch. 
> > > But there's another possible approach: suppose that > > join_search_one_level, after considering left-sided and right-sided > > joins and after considering bushy joins, checks whether every relation > > it's got is from the same foreign server, and if so, asks that foreign > > server whether it would like to contribute any paths. Would that be > > better or worse? A disadvantage is that if you've got something like > > A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT > > JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed > > down (say, each join clause calls a non-pushdown-safe function) you'll > > end up examining a pile of joinrels - at every level of the join tree > > - and individually rejecting each one. With the > > build-it-up-incrementally approach, you'll figure that all out at > > level 2, and then after that there's nothing to do but give up > > quickly. On the other hand, I'm afraid the incremental approach might > > miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x = > > huge.x) ON small.y = big.y AND small.z = huge.z, where all three are > > foreign tables on the same server. If the output of the big/huge join > > is big, none of those paths are going to survive at level 2, but the > > overall join size might be very small, so we surely want a chance to > > recover at level 3. (We discussed test cases of this form quite a bit > > in the context of e2fa76d80ba571d4de8992de6386536867250474.) > > Interesting, I overlooked that pattern. As you pointed out, join > between big foregin tables might be dominated, perhaps by a MergeJoin > path. Leaving dominated ForeignPath in pathlist for more optimization > in the future (in higher join level) is an idea, but it would make > planning time longer (and use more cycle and memory). > > Tom's idea sounds good for saving the path b), but I worry that > whether FDW can get enough information at that timing, just before > set_cheapest. 
It would not be good I/F if each FDW needs to copy many
> code from joinrel.c...
>
I had a call to discuss this topic with Hanada-san. He expects that the
FDW driver will need to check and extract the relations involved in a
particular join, but that is less of a problem as long as the core
backend can handle this common portion for all FDW/CSP drivers.
Thus, we need to care about two hook locations. The first one is
add_paths_to_joinrel(), as in the current patch, for a custom-scan
that adds alternative join logic and takes the underlying child nodes
as input. The other one is standard_join_search(), as Tom pointed out,
for a foreign-scan of a remote join, or for a custom-scan that replaces
an entire join subtree.
One positive aspect of this approach is that postgres_fdw can handle
whole-row references much more simply than with the bottom-up approach,
according to Hanada-san.
The remaining issue is how to implement the core portion that extracts
the relations in a particular join and identifies the join type to be
applied to particular relations.
One rough idea is that we pull the relids bitmap from the target
joinrel, then look up the SpecialJoinInfo whose left-hand and
right-hand bitmaps have an identical union. That lets us inform the FDW
driver which relations are to be joined with which other relations at
this level.
For example, if relids=0x007 and relids=0x0018 are left joined,
PlannerInfo shall have a SpecialJoinInfo that fits the requirement.
Also, if both the left and right sides are not singletons, the FDW
driver will recurse to construct the remote join query on relids=0x007
and relids=0x0018.
If all of them are inner joins, we don't need to take care of this. All
the FDW driver needs to do is put the involved relation names in the
FROM clause.
This is only a rough idea, so there may be a better way to extract the
relations involved in a particular join at a certain level. Please tell
me if you have other ideas.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
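
The relids lookup sketched in the message above can be modelled in a few lines of Python. This is a toy, not the planner's data structure: `ToySpecialJoinInfo` and `find_join_info` are hypothetical names, plain integers stand in for relids bitmaps, and matching on the exact union of both sides is a simplification of what a real SpecialJoinInfo records.

```python
from collections import namedtuple

# Hypothetical stand-in for SpecialJoinInfo: two relids bitmaps and a
# join type label.
ToySpecialJoinInfo = namedtuple(
    "ToySpecialJoinInfo", ["lefthand", "righthand", "jointype"]
)


def find_join_info(join_info_list, joinrel_relids):
    """Return the entry whose left/right union equals the joinrel's relids.

    None means no special join covers exactly this joinrel, which the
    toy model treats as "all inner joins".
    """
    for info in join_info_list:
        if (info.lefthand | info.righthand) == joinrel_relids:
            return info
    return None


# relids=0x007 left-joined with relids=0x018 gives joinrel relids 0x01f.
join_info_list = [ToySpecialJoinInfo(0x007, 0x018, "JOIN_LEFT")]

top = find_join_info(join_info_list, 0x01F)
print(top.jointype, hex(top.lefthand), hex(top.righthand))

# Neither side is a singleton, so a driver would recurse into each side;
# here both sides have no special join of their own.
print(find_join_info(join_info_list, 0x007))
print(find_join_info(join_info_list, 0x018))
```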
On 2015/03/14 7:18, Robert Haas wrote:
> I think the foreign data wrapper join pushdown case, which also aims
> to substitute a scan for a join, is interesting to think about, even
> though it's likely to be handled by a new FDW method instead of via
> the hook. Where should the FDW method get called from?
I haven't had enough time to review the patch in detail yet, so I
don't know where we should call the method, but I'd vote for the idea
of substituting a scan for a join, because I think that idea would
probably allow update pushdown, which I'm proposing in the current CF,
to scale up to handling a pushed-down update on a join.
Best regards,
Etsuro Fujita
On Sat, Mar 14, 2015 at 3:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Another bit of this that I think we could commit without fretting
>> about it too much is the code adding set_join_pathlist_hook. This is
>> - I think - analogous to set_rel_pathlist_hook, and like that hook,
>> could be used for other purposes than custom plan generation - e.g. to
>> delete paths we do not want to use. I've extracted this portion of
>> the patch and adjusted the comments; if there are no objections, I
>> will commit this bit also.
>
> I don't object to the concept, but I think that is a pretty bad place
> to put the hook call: add_paths_to_joinrel is typically called multiple
> (perhaps *many*) times per joinrel and thus this placement would force
> any user of the hook to do a lot of repetitive work.
Interesting point. I guess the question is whether some or all
callers are going to actually *want* a separate call for each
invocation of add_paths_to_joinrel(), or whether they'll be happy to
operate on the otherwise-complete path list. It's true that if your
goal is to delete paths, it's probably best to be called just once
after the path list is complete, and there might be a use case for
that, but I guess it's less useful than for baserels. For a baserel,
as long as you don't nuke the sequential-scan path, there is always
going to be a way to complete the plan; so this would be a fine way to
implement a disable-an-index extension. But for joinrels, it's not so
easy to rule out, say, a hash-join here. Neither hook placement is
much good for that; the path you want to get rid of may have already
dominated paths you want to keep.
Suppose you want to add paths - e.g. you have an extension that goes
and looks for a materialized view that matches this subtree of the
query, and if it finds one, it substitutes a scan of the materialized
view for a scan of the baserel. Or, as in KaiGai's case, you have an
extension that can perform the whole join in GPU-land and produce the
same results we would have gotten via normal execution. Either way,
you want - and this is the central point of the whole patch here - to
inject a scan path into a joinrel. It is not altogether obvious to me
what the best placement for this is. In the materialized view case,
you probably need a perfect match between the baserels in the view and
the baserels in the joinrel to do anything. There's no point in
re-checking that for every innerrels/outerrels combination. I don't
know enough about the GPU case to reason about it intelligently; maybe
KaiGai can comment.
I think the foreign data wrapper join pushdown case, which also aims
to substitute a scan for a join, is interesting to think about, even
though it's likely to be handled by a new FDW method instead of via
the hook. Where should the FDW method get called from? Currently,
the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
called from add_paths_to_joinrel(). The patch at
http://www.postgresql.org/message-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com
uses that to implement join pushdown in postgres_fdw; if you have A
JOIN B JOIN C all on server X, we'll notice that the join with A and B
can be turned into a foreign scan on A JOIN B, and similarly for A-C
and B-C. Then, if it turns out that the cheapest path for A-B is the
foreign join, and the cheapest path for C is a foreign scan, we'll
arrive at the idea of a foreign scan on A-B-C, and we'll realize the
same thing in each of the other combinations as well. So, eventually
the foreign join gets pushed down.
But there's another possible approach: suppose that
join_search_one_level, after considering left-sided and right-sided
joins and after considering bushy joins, checks whether every relation
it's got is from the same foreign server, and if so, asks that foreign
server whether it would like to contribute any paths. Would that be
better or worse? A disadvantage is that if you've got something like
A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
down (say, each join clause calls a non-pushdown-safe function) you'll
end up examining a pile of joinrels - at every level of the join tree
- and individually rejecting each one. With the
build-it-up-incrementally approach, you'll figure that all out at
level 2, and then after that there's nothing to do but give up
quickly. On the other hand, I'm afraid the incremental approach might
miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
foreign tables on the same server. If the output of the big/huge join
is big, none of those paths are going to survive at level 2, but the
overall join size might be very small, so we surely want a chance to
recover at level 3. (We discussed test cases of this form quite a bit
in the context of e2fa76d80ba571d4de8992de6386536867250474.)
The real problem here is that, with FDWs in the picture, the "optimal substructure" property required by dynamic programming is broken. If A foreign join B foreign join C is the optimal solution for the problem A join B join C, A foreign join B is not necessarily the optimal solution for the subproblem A join B. For local relations, PostgreSQL has to compute each two-way join itself, and thus chooses the cheapest path for each two-way join; FDWs (especially those working with real foreign servers) do not compute joins in a two-way fashion and don't need to choose the cheapest path for each two-way join.
A way to work around this is to leave the ForeignPaths (there can possibly be only one foreign path per join relation) in the joinrel without removing them. The FDW should work on joining two relations if they have foreign paths in their lists of paths, irrespective of whether the cheapest path is a foreign join path or not. For the topmost joinrel, if the foreign path happens to be the cheapest one, the whole join tree will be pushed down.
In the other thread, which implements foreign join for postgres_fdw, postgresGetForeignJoinPaths() looks only at the cheapest path, which would cause the problem you have described above.
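
The broken optimal-substructure argument can be seen in a toy Python model of the small/big/huge example. All costs below are invented for illustration, and `plan_level3` is a hypothetical function, not anything in the planner or in postgres_fdw.

```python
LOCAL_JOIN_COST = 20.0     # assumed cost of joining "small" locally at level 3
FULL_PUSHDOWN_COST = 30.0  # assumed cost of one foreign scan for all three rels


def plan_level3(level2_paths, only_extend_cheapest):
    """Pick the level-3 plan given the level-2 paths for big JOIN huge.

    level2_paths maps a path name to its cost. When only_extend_cheapest
    is true, the foreign join of all three relations is considered only
    if the foreign join was the cheapest path at level 2.
    """
    cheapest = min(level2_paths, key=level2_paths.get)
    candidates = {
        "local join over " + cheapest: level2_paths[cheapest] + LOCAL_JOIN_COST
    }
    foreign_seen = "foreign join" in level2_paths
    if foreign_seen and (not only_extend_cheapest or cheapest == "foreign join"):
        candidates["foreign join of all three"] = FULL_PUSHDOWN_COST
    best = min(candidates, key=candidates.get)
    return best, candidates[best]


# big JOIN huge returns many rows, so the foreign join loses at level 2.
level2 = {"merge join": 100.0, "foreign join": 150.0}
print(plan_level3(level2, only_extend_cheapest=True))
print(plan_level3(level2, only_extend_cheapest=False))
```

With these made-up numbers, looking only at the cheapest level-2 path never produces the full pushdown, even though it would be the cheapest plan overall.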
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Tue, Mar 17, 2015 at 10:28 AM, Shigeru Hanada <shigeru.hanada@gmail.com> wrote:
I have the same concern. A simple joinrel doesn't contain much information about the individual two-way joins involved in it, so an FDW may not be able to construct a query (or execution plan), and hence judge whether a join is pushable or not, just by looking at the joinrel. Reconstructing that information within the FDW code would require a lot of code duplication.
2015-03-14 7:18 GMT+09:00 Robert Haas <robertmhaas@gmail.com>:
> I think the foreign data wrapper join pushdown case, which also aims
> to substitute a scan for a join, is interesting to think about, even
> though it's likely to be handled by a new FDW method instead of via
> the hook. Where should the FDW method get called from? Currently,
> the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets
> called from add_paths_to_joinrel(). The patch at
> http://www.postgresql.org/message-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbfqF82YSb9ztFZG7zN64Xw@mail.gmail.com
> uses that to implement join pushdown in postgres_fdw; if you have A
> JOIN B JOIN C all on server X, we'll notice that the join with A and B
> can be turned into a foreign scan on A JOIN B, and similarly for A-C
> and B-C. Then, if it turns out that the cheapest path for A-B is the
> foreign join, and the cheapest path for C is a foreign scan, we'll
> arrive at the idea of a foreign scan on A-B-C, and we'll realize the
> same thing in each of the other combinations as well. So, eventually
> the foreign join gets pushed down.
From the viewpoint of postgres_fdw, the incremental approach seemed
the natural way, although postgres_fdw should consider the paths in
the pathlist in addition to the cheapest one, as you mentioned in
another thread. This approach allows an FDW to use the SQL statements
generated for the underlying scans as parts of the FROM clause, as
postgres_fdw does in the join push-down patch.
> But there's another possible approach: suppose that
> join_search_one_level, after considering left-sided and right-sided
> joins and after considering bushy joins, checks whether every relation
> it's got is from the same foreign server, and if so, asks that foreign
> server whether it would like to contribute any paths. Would that be
> better or worse? A disadvantage is that if you've got something like
> A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT
> JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed
> down (say, each join clause calls a non-pushdown-safe function) you'll
> end up examining a pile of joinrels - at every level of the join tree
> - and individually rejecting each one. With the
> build-it-up-incrementally approach, you'll figure that all out at
> level 2, and then after that there's nothing to do but give up
> quickly. On the other hand, I'm afraid the incremental approach might
> miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x =
> huge.x) ON small.y = big.y AND small.z = huge.z, where all three are
> foreign tables on the same server. If the output of the big/huge join
> is big, none of those paths are going to survive at level 2, but the
> overall join size might be very small, so we surely want a chance to
> recover at level 3. (We discussed test cases of this form quite a bit
> in the context of e2fa76d80ba571d4de8992de6386536867250474.)
Interesting, I overlooked that pattern. As you pointed out, a join
between big foreign tables might be dominated, perhaps by a MergeJoin
path. Leaving the dominated ForeignPath in the pathlist for further
optimization later (at a higher join level) is an idea, but it would
make planning take longer (and use more cycles and memory).
Tom's idea sounds good for saving the path b), but I worry about
whether an FDW can get enough information at that point, just before
set_cheapest. It would not be a good interface if each FDW needs to
copy a lot of code from joinrel.c...
--
Shigeru HANADA
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
> On Sat, Mar 14, 2015 at 3:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > > On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > >> Another bit of this that I think we could commit without fretting > >> about it too much is the code adding set_join_pathlist_hook. This > is > >> - I think - analogous to set_rel_pathlist_hook, and like that hook, > >> could be used for other purposes than custom plan generation - e.g. > to > >> delete paths we do not want to use. I've extracted this portion of > >> the patch and adjusted the comments; if there are no objections, I > >> will commit this bit also. > > > > I don't object to the concept, but I think that is a pretty bad place > > to put the hook call: add_paths_to_joinrel is typically called multiple > > (perhaps *many*) times per joinrel and thus this placement would force > > any user of the hook to do a lot of repetitive work. > > Interesting point. I guess the question is whether a some or all > callers are going to actually *want* a separate call for each > invocation of add_paths_to_joinrel(), or whether they'll be happy to > operate on the otherwise-complete path list. It's true that if your > goal is to delete paths, it's probably best to be called just once > after the path list is complete, and there might be a use case for > that, but I guess it's less useful than for baserels. For a baserel, > as long as you don't nuke the sequential-scan path, there is always > going to be a way to complete the plan; so this would be a fine way to > implement a disable-an-index extension. But for joinrels, it's not so > easy to rule out, say, a hash-join here. Neither hook placement is > much good for that; the path you want to get rid of may have already > dominated paths you want to keep. > > Suppose you want to add paths - e.g. 
you have an extension that goes > and looks for a materialized view that matches this subtree of the > query, and if it finds one, it substitutes a scan of the materialized > view for a scan of the baserel. Or, as in KaiGai's case, you have an > extension that can perform the whole join in GPU-land and produce the > same results we would have gotten via normal execution. Either way, > you want - and this is the central point of the whole patch here - to > inject a scan path into a joinrel. It is not altogether obvious to me > what the best placement for this is. In the materialized view case, > you probably need a perfect match between the baserels in the view and > the baserels in the joinrel to do anything. There's no point in > re-checking that for every innerrels/outerrels combination. I don't > know enough about the GPU case to reason about it intelligently; maybe > KaiGai can comment. > > I think the foreign data wrapper join pushdown case, which also aims > to substitute a scan for a join, is interesting to think about, even > though it's likely to be handled by a new FDW method instead of via > the hook. Where should the FDW method get called from? Currently, > the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets > called from add_paths_to_joinrel(). The patch at > http://www.postgresql.org/message-id/CAEZqfEfy7p=uRpwN-Q-NNgzb8kwHbf > qF82YSb9ztFZG7zN64Xw@mail.gmail.com > uses that to implement join pushdown in postgres_fdw; if you have A > JOIN B JOIN C all on server X, we'll notice that the join with A and B > can be turned into a foreign scan on A JOIN B, and similarly for A-C > and B-C. Then, if it turns out that the cheapest path for A-B is the > foreign join, and the cheapest path for C is a foreign scan, we'll > arrive at the idea of a foreign scan on A-B-C, and we'll realize the > same thing in each of the other combinations as well. So, eventually > the foreign join gets pushed down. 
> > But there's another possible approach: suppose that > join_search_one_level, after considering left-sided and right-sided > joins and after considering bushy joins, checks whether every relation > it's got is from the same foreign server, and if so, asks that foreign > server whether it would like to contribute any paths. Would that be > better or worse? A disadvantage is that if you've got something like > A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT > JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed > down (say, each join clause calls a non-pushdown-safe function) you'll > end up examining a pile of joinrels - at every level of the join tree > - and individually rejecting each one. With the > build-it-up-incrementally approach, you'll figure that all out at > level 2, and then after that there's nothing to do but give up > quickly. On the other hand, I'm afraid the incremental approach might > miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x = > huge.x) ON small.y = big.y AND small.z = huge.z, where all three are > foreign tables on the same server. If the output of the big/huge join > is big, none of those paths are going to survive at level 2, but the > overall join size might be very small, so we surely want a chance to > recover at level 3. (We discussed test cases of this form quite a bit > in the context of e2fa76d80ba571d4de8992de6386536867250474.) > > > > > The real problem here, is that with FDW in picture, the "optimal substructure" > property required by dynamic programming is broken. If A foreign join B foreign > join C is optimal solution for problem A join B join C, A foreign join B is not > necessarily optimal solution for subproblem A join B. While for local relations, > PostgreSQL has to compute each two way join itself, and thus chooses the cheapest > path for each two way join, FDW (esp. 
those working with real foreign servers)
> do not compute the joins in two-way fashion and don't need to choose the cheapest
> path for each two way join.
>
I cannot agree 100%, because we cannot know in advance whether A foreign
join B foreign join C is better than a local A join B join C.
For example, if (A x B) is estimated to generate O(N) rows but
(A x B) x C is estimated to generate O(N x M) rows, a local join may be
the better way to process the final stage.
Even if an N-way remote join is possible, we need to estimate the cost
of the remote join at each level, and decide based on the estimated
cost whether it shall be pushed down to the remote server.
The hook location Tom suggested requires the FDW to compute a
foreign-scan path for each joinrel while join combinations are being
considered, but not multiple times for each joinrel.
> A way to work around this is to leave the ForeignPaths (there can possibly be
> only one foreign path per join relation) in the joinrel without removing them.
> FDW should work on joining two relations if they have foreign paths in the list
> of paths, irrespective of whether the cheapest path is foreign join path or not.
> For the topmost joinrel, if the foreign path happens to be the cheapest one, the
> whole join tree will be pushed down.
>
> On the other thread implementing foreign join for postgres_fdw,
> postgresGetForeignJoinPaths(), is just looking at the cheapest path, which would
> cause the problem you have described above.
>
It might be an idea: if a foreign-scan path is not wiped out regardless
of its estimated cost, we will be able to construct an entirely remote
join path even if an intermediate path is more expensive than a local
join.
A problem is how to distinguish these special paths from the usual
paths, which are eliminated at the previous stage once they turn out to
be more expensive.
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
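
The O(N) versus O(N x M) point can be made concrete with a deliberately crude cost model in Python. Everything here is an assumption made for illustration: the two per-row constants, the function names, and the decision to ignore remote CPU cost entirely.

```python
TRANSFER_COST_PER_ROW = 0.01  # assumed cost of shipping one row over the network
LOCAL_CPU_PER_ROW = 0.001     # assumed local CPU cost per produced row


def full_pushdown_cost(out_rows):
    """Remote server joins A, B and C; every result row crosses the network."""
    return out_rows * TRANSFER_COST_PER_ROW


def partial_pushdown_cost(n, m, out_rows):
    """Push down (A join B) only: n rows of the join and m rows of C are
    transferred, then the final join runs locally."""
    return (n + m) * TRANSFER_COST_PER_ROW + out_rows * LOCAL_CPU_PER_ROW


n, m = 1000, 1000

# Final join explodes to n * m rows: shipping them all is the costly part.
print(full_pushdown_cost(n * m), partial_pushdown_cost(n, m, n * m))

# Final join is very selective: pushing everything down wins instead.
print(full_pushdown_cost(10), partial_pushdown_cost(n, m, 10))
```

Which side wins depends on the estimated row counts at each level, which is why a cost estimate per join level is needed before deciding what to push down.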
On Tue, Mar 17, 2015 at 10:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> A way to work around this is to leave the ForeignPaths (there can possibly be >> only one foreign path per join relation) in the joinrel without removing them. >> FDW should work on joining two relations if they have foreign paths in the list >> of paths, irrespective of whether the cheapest path is foreign join path or not. >> For the topmost joinrel, if the foreign path happens to be the cheapest one, the >> whole join tree will be pushed down. >> >> On the other thread implementing foreign join for postgres_fdw, >> postgresGetForeignJoinPaths(), is just looking at the cheapest path, which would >> cause the problem you have described above. >> > It might be an idea if foreign-scan path is not wiped out regardless of the > estimated cost, we will be able to construct an entirely remote-join path > even if intermediation path is expensive than local join. > A problem is, how to distinct these special paths from usual paths that are > eliminated on the previous stage once its path is more expensive. Any solution that is based on not eliminating paths that would otherwise be discarded based on cost seems to me to be unlikely to be feasible. We can't complicate the core path-cost-comparison stuff for the convenience of FDW or custom-scan pushdown. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Mar 17, 2015 at 8:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 17, 2015 at 10:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> A way to work around this is to leave the ForeignPaths (there can possibly be
>> only one foreign path per join relation) in the joinrel without removing them.
>> FDW should work on joining two relations if they have foreign paths in the list
>> of paths, irrespective of whether the cheapest path is foreign join path or not.
>> For the topmost joinrel, if the foreign path happens to be the cheapest one, the
>> whole join tree will be pushed down.
>>
>> On the other thread implementing foreign join for postgres_fdw,
>> postgresGetForeignJoinPaths(), is just looking at the cheapest path, which would
>> cause the problem you have described above.
>>
> It might be an idea if foreign-scan path is not wiped out regardless of the
> estimated cost, we will be able to construct an entirely remote-join path
> even if intermediation path is expensive than local join.
> A problem is, how to distinct these special paths from usual paths that are
> eliminated on the previous stage once its path is more expensive.
Any solution that is based on not eliminating paths that would
otherwise be discarded based on cost seems to me to be unlikely to be
feasible. We can't complicate the core path-cost-comparison stuff for
the convenience of FDW or custom-scan pushdown.
We already have a precedent here. We cache several different cheapest paths, e.g.
    struct Path *cheapest_startup_path;
    struct Path *cheapest_total_path;
    struct Path *cheapest_unique_path;
    List *cheapest_parameterized_paths;
All we have to do is add one more field there, "cheapest_foreign_path", which can be NULL like cheapest_unique_path.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 17, 2015 at 10:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> It might be an idea if foreign-scan path is not wiped out regardless of the
>> estimated cost, we will be able to construct an entirely remote-join path
>> even if intermediation path is expensive than local join.
>> A problem is, how to distinct these special paths from usual paths that are
>> eliminated on the previous stage once its path is more expensive.

> Any solution that is based on not eliminating paths that would
> otherwise be discarded based on cost seems to me to be unlikely to be
> feasible. We can't complicate the core path-cost-comparison stuff for
> the convenience of FDW or custom-scan pushdown.

I concur. I'm not even so worried about the cost of add_path as such;
the real problem with not discarding paths as aggressively as possible
is that it will result in a combinatorial explosion in the number of
path combinations that have to be examined at higher join levels.

regards, tom lane
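To make the combinatorial-explosion argument concrete, here is a deliberately simplified toy model, not planner code. It assumes a left-deep join chain in which every base relation offers `base_paths` paths and every join can be performed by `methods` join methods, and it counts how many paths pile up if nothing is ever discarded. The function name and the model itself are assumptions made purely for illustration; the real planner examines far fewer combinations because add_path() prunes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model (an assumption for illustration, not PostgreSQL code):
 * number of paths retained for a left-deep join of `nrels` relations
 * when no path is ever discarded.  Each join level multiplies the
 * retained paths by the number of inner paths and join methods.
 */
static uint64_t
toy_unpruned_paths(unsigned nrels, uint64_t base_paths, uint64_t methods)
{
    uint64_t paths = base_paths;   /* level 1: just the base relation */
    unsigned level;

    assert(nrels >= 1);

    for (level = 2; level <= nrels; level++)
        paths = paths * base_paths * methods;

    return paths;
}
```

With 2 paths per base relation and 3 join methods, the model yields 2, 12, 72 and 432 paths for one to four relations. If only a single cheapest path survives per joinrel, each level instead has a constant 1 x 2 x 3 = 6 candidates to look at.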
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, Mar 17, 2015 at 10:11 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >> It might be an idea if foreign-scan path is not wiped out regardless of the
> >> estimated cost, we will be able to construct an entirely remote-join path
> >> even if intermediation path is expensive than local join.
> >> A problem is, how to distinct these special paths from usual paths that are
> >> eliminated on the previous stage once its path is more expensive.
>
> > Any solution that is based on not eliminating paths that would
> > otherwise be discarded based on cost seems to me to be unlikely to be
> > feasible. We can't complicate the core path-cost-comparison stuff for
> > the convenience of FDW or custom-scan pushdown.
>
> I concur. I'm not even so worried about the cost of add_path as such;
> the real problem with not discarding paths as aggressively as possible
> is that it will result in a combinatorial explosion in the number of
> path combinations that have to be examined at higher join levels.
>
I'm inclined to agree. That was also the conclusion of my discussion with
Hanada-san yesterday, because of the number of paths to be considered and the
combination problem you mentioned above.

So, the overall consensus is that the FDW hook should be located just before
the set_cheapest() call in standard_join_search() and merge_clump(), isn't it?

Let me design the FDW hook so as to reduce code duplication in each FDW driver,
especially in identifying the baserels/joinrels involved in the join.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Wed, Mar 18, 2015 at 2:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> So, overall consensus for the FDW hook location is just before the set_chepest()
> at standard_join_search() and merge_clump(), isn't it?

Yes, I think so.

> Let me make a design of FDW hook to reduce code duplications for each FDW driver,
> especially, to identify baserel/joinrel to be involved in this join.

Great, thanks!

One issue, which I think Ashutosh alluded to upthread, is that we need
to make sure it's not unreasonably difficult for foreign data wrappers
to construct the FROM clause of an SQL query to be pushed down to the
remote side. It should be simple when there are only inner joins
involved, but when there are all outer joins it might be a bit
complex. It would be very good if someone could try to write that
code, based on the new hook locations, and see how it turns out, so
that we can figure out how to address any issues that may crop up
there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Wed, Mar 18, 2015 at 2:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > So, overall consensus for the FDW hook location is just before the set_chepest()
> > at standard_join_search() and merge_clump(), isn't it?
>
> Yes, I think so.
>
> > Let me make a design of FDW hook to reduce code duplications for each FDW driver,
> > especially, to identify baserel/joinrel to be involved in this join.
>
> Great, thanks!
>
> One issue, which I think Ashutosh alluded to upthread, is that we need
> to make sure it's not unreasonably difficult for foreign data wrappers
> to construct the FROM clause of an SQL query to be pushed down to the
> remote side. It should be simple when there are only inner joins
> involved, but when there are all outer joins it might be a bit
> complex. It would be very good if someone could try to write that
> code, based on the new hook locations, and see how it turns out, so
> that we can figure out how to address any issues that may crop up
> there.
>
Here is an idea: provide a common utility function that breaks down the
supplied RelOptInfo of a joinrel into a join type and a list of the
baserels/joinrels involved in the join. It is intended to be called by an
FDW driver to list the underlying relations.
IIUC, root->join_info_list provides information about how relations are
combined into upper join relations, so I expect this is not unreasonably
complicated to solve.
Once the RelOptInfo of the target joinrel is broken down into multiple
sub-relations (N>=2 if all joins are inner joins, otherwise N=2), the FDW
driver can reference the RestrictInfo to be used in the join.

Anyway, I'll investigate the existing code in more detail today, to
clarify whether the above approach is feasible.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Wed, Mar 18, 2015 at 9:33 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> On Wed, Mar 18, 2015 at 2:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> > So, overall consensus for the FDW hook location is just before the set_chepest() >> > at standard_join_search() and merge_clump(), isn't it? >> >> Yes, I think so. >> >> > Let me make a design of FDW hook to reduce code duplications for each FDW driver, >> > especially, to identify baserel/joinrel to be involved in this join. >> >> Great, thanks! >> >> One issue, which I think Ashutosh alluded to upthread, is that we need >> to make sure it's not unreasonably difficult for foreign data wrappers >> to construct the FROM clause of an SQL query to be pushed down to the >> remote side. It should be simple when there are only inner joins >> involved, but when there are all outer joins it might be a bit >> complex. It would be very good if someone could try to write that >> code, based on the new hook locations, and see how it turns out, so >> that we can figure out how to address any issues that may crop up >> there. >> > Here is an idea that provides a common utility function that break down > the supplied RelOptInfo of joinrel into a pair of join-type and a list of > baserel/joinrel being involved in the relations join. It intends to be > called by FDW driver to list up underlying relations. > IIUC, root->join_info_list will provide information of how relations are > combined to the upper joined relations, thus, I expect it is not > unreasonably complicated way to solve. > Once a RelOptInfo of the target joinrel is broken down into multiple sub- > relations (N>=2 if all inner join, elsewhere N=2), FDW driver can > reference the RestrictInfo to be used in relations join. > > Anyway, I'll try to investigate the existing code for more detail today, > to clarify whether the above approach is feasible. Sounds good. 
Keep in mind that, while the parse tree will obviously reflect the way
that the user actually specified the join syntactically, it's not the
job of the join_info_list to make it simple to reconstruct that
information. To the contrary, join_info_list is supposed to be
structured in a way that makes it easy to determine whether *a
particular join order is one of the legal join orders* not *whether it
is the specific join order selected by the user*. See join_is_legal().

For FDW pushdown, I think it's sufficient to be able to identify *any
one* legal join order, not necessarily the same order the user
originally entered. For example, if the user entered A LEFT JOIN B ON
A.x = B.x LEFT JOIN C ON A.y = C.y and the FDW generates a query that
instead does A LEFT JOIN C ON A.y = C.y LEFT JOIN B ON A.x = B.x, I
suspect that's just fine. Particular FDWs might wish to try to be
smart about what they emit based on knowledge of what the remote
side's optimizer is likely to do, and that's fine. If the remote side
is PostgreSQL, it shouldn't matter much.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Wed, Mar 18, 2015 at 9:33 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >> On Wed, Mar 18, 2015 at 2:34 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >> > So, overall consensus for the FDW hook location is just before the set_chepest() > >> > at standard_join_search() and merge_clump(), isn't it? > >> > >> Yes, I think so. > >> > >> > Let me make a design of FDW hook to reduce code duplications for each FDW driver, > >> > especially, to identify baserel/joinrel to be involved in this join. > >> > >> Great, thanks! > >> > >> One issue, which I think Ashutosh alluded to upthread, is that we need > >> to make sure it's not unreasonably difficult for foreign data wrappers > >> to construct the FROM clause of an SQL query to be pushed down to the > >> remote side. It should be simple when there are only inner joins > >> involved, but when there are all outer joins it might be a bit > >> complex. It would be very good if someone could try to write that > >> code, based on the new hook locations, and see how it turns out, so > >> that we can figure out how to address any issues that may crop up > >> there. > >> > > Here is an idea that provides a common utility function that break down > > the supplied RelOptInfo of joinrel into a pair of join-type and a list of > > baserel/joinrel being involved in the relations join. It intends to be > > called by FDW driver to list up underlying relations. > > IIUC, root->join_info_list will provide information of how relations are > > combined to the upper joined relations, thus, I expect it is not > > unreasonably complicated way to solve. > > Once a RelOptInfo of the target joinrel is broken down into multiple sub- > > relations (N>=2 if all inner join, elsewhere N=2), FDW driver can > > reference the RestrictInfo to be used in relations join. > > > > Anyway, I'll try to investigate the existing code for more detail today, > > to clarify whether the above approach is feasible. > > Sounds good. 
Keep in mind that, while the parse tree will obviously
> reflect the way that the user actually specified the join
> syntactically, it's not the job of the join_info_list to make it
> simple to reconstruct that information. To the contrary,
> join_info_list is supposed to be structured in a way that makes it
> easy to determine whether *a particular join order is one of the legal
> join orders* not *whether it is the specific join order selected by
> the user*. See join_is_legal().
>
> For FDW pushdown, I think it's sufficient to be able to identify *any
> one* legal join order, not necessarily the same order the user
> originally entered. For exampe, if the user entered A LEFT JOIN B ON
> A.x = B.x LEFT JOIN C ON A.y = C.y and the FDW generates a query that
> instead does A LEFT JOIN C ON a.y = C.y LEFT JOIN B ON A.x = B.x, I
> suspect that's just fine. Particular FDWs might wish to try to be
> smart about what they emit based on knowledge of what the remote
> side's optimizer is likely to do, and that's fine. If the remote side
> is PostgreSQL, it shouldn't matter much.
>
Sorry for my late response. It was not easy to code during a business trip.

The attached patch adds a hook for FDW/CSP to replace an entire join subtree
by a foreign/custom-scan, according to the discussion upthread.

The GetForeignJoinPaths handler of FDW is simplified as follows:

    typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
                                                  RelOptInfo *joinrel);

It takes the PlannerInfo and the RelOptInfo of the join relation to be
replaced, if available. RelOptInfo contains the 'relids' bitmap, so the FDW
driver can tell which relations are involved and construct a remote join query.
However, RelOptInfo alone does not make it obvious how the relations are joined.

The function below will help implement an FDW driver that supports remote join.

    List *
    get_joinrel_broken_down(PlannerInfo *root, RelOptInfo *joinrel,
                            SpecialJoinInfo **p_sjinfo)

It returns a list of the RelOptInfos involved in the join represented by
'joinrel', and also sets a SpecialJoinInfo through the third argument if the
join is not an inner join.
For an inner join, it returns multiple (two or more) relations to be
inner-joined. Otherwise, it returns two relations and a valid SpecialJoinInfo.

The #if 0 ... #endif block is only there for confirmation purposes, to show how
the hook is invoked and how the joinrel is broken down by the service routine
above.

postgres=# select * from t0 left join t1 on t1.aid=bid
           left join t2 on t1.aid=cid
           left join t3 on t1.aid=did
           left join t4 on t1.aid=eid;
INFO: LEFT JOIN: t0, t1
INFO: LEFT JOIN: (t0, t1), t2
INFO: LEFT JOIN: (t0, t1), t3
INFO: LEFT JOIN: (t0, t1), t4
INFO: LEFT JOIN: (t0, t1, t3), t2
INFO: LEFT JOIN: (t0, t1, t4), t2
INFO: LEFT JOIN: (t0, t1, t4), t3
INFO: LEFT JOIN: (t0, t1, t3, t4), t2

In this case, the joinrel is broken down into (t0, t1, t3, t4) and t2.
The former is also a joinrel, so the FDW driver is expected to call
get_joinrel_broken_down() recursively.

postgres=# explain select * from t0 natural join t1
           natural join t2
           natural join t3
           natural join t4;
INFO: INNER JOIN: t0, t1
INFO: INNER JOIN: t0, t2
INFO: INNER JOIN: t0, t3
INFO: INNER JOIN: t0, t4
INFO: INNER JOIN: t0, t1, t2
INFO: INNER JOIN: t0, t1, t3
INFO: INNER JOIN: t0, t1, t4
INFO: INNER JOIN: t0, t2, t3
INFO: INNER JOIN: t0, t2, t4
INFO: INNER JOIN: t0, t3, t4
INFO: INNER JOIN: t0, t1, t2, t3
INFO: INNER JOIN: t0, t1, t2, t4
INFO: INNER JOIN: t0, t1, t3, t4
INFO: INNER JOIN: t0, t2, t3, t4
INFO: INNER JOIN: t0, t1, t2, t3, t4

In this case, the joinrel consists only of inner joins, so
get_joinrel_broken_down() returns a list that contains the RelOptInfos of the
5 base relations (t0 through t4).

postgres=# explain select * from t0 natural join t1
           left join t2 on t1.aid=t2.bid
           natural join t3
           natural join t4;
INFO: INNER JOIN: t0, t1
INFO: INNER JOIN: t0, t3
INFO: INNER JOIN: t0, t4
INFO: LEFT JOIN: t1, t2
INFO: INNER JOIN: (t1, t2), t0
INFO: INNER JOIN: t0, t1, t3
INFO: INNER JOIN: t0, t1, t4
INFO: INNER JOIN: t0, t3, t4
INFO: INNER JOIN: (t1, t2), t0, t3
INFO: INNER JOIN: (t1, t2), t0, t4
INFO: INNER JOIN: t0, t1, t3, t4
INFO: INNER JOIN: (t1, t2), t0, t3, t4

In the mixed case, the breakdown preserves the join-legality restriction
(t1 and t2 must be left-joined).

At this moment, I'm not 100% certain about the logic. In particular, I haven't
tested the SEMI- and ANTI-join cases yet.
However, time is money, so I'd like people to check the overall design first
rather than debug the details. Please tell me if I have misunderstood the logic
for breaking down join relations.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
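The concern raised upthread was how hard it is for an FDW to build the remote FROM clause once a joinrel has been broken down. Below is a minimal sketch of the recursive walk, assuming the breakdown always yields a binary tree. `ToyRel`, `toy_append()`, `toy_deparse_from()` and the ON-clause strings are hypothetical stand-ins; they are not RelOptInfo, get_joinrel_broken_down(), or anything in the attached patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical join types for this sketch only. */
typedef enum { TOY_JOIN_INNER, TOY_JOIN_LEFT } ToyJoinType;

/* Hypothetical stand-in for a broken-down relation tree. */
typedef struct ToyRel
{
    const char          *name;      /* non-NULL for a base relation */
    ToyJoinType          jointype;  /* meaningful only for a join */
    const struct ToyRel *left;      /* outer side of a join */
    const struct ToyRel *right;     /* inner side of a join */
    const char          *on_clause; /* join condition text */
} ToyRel;

/* Append s to the NUL-terminated string in buf; -1 if it does not fit. */
static int
toy_append(char *buf, size_t buflen, const char *s)
{
    size_t used = strlen(buf);
    size_t need = strlen(s);

    if (used + need + 1 > buflen)
        return -1;
    memcpy(buf + used, s, need + 1);
    return 0;
}

/*
 * Recursively emit a parenthesized FROM-clause fragment for rel.
 * buf must already hold a NUL-terminated string (possibly empty).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
toy_deparse_from(const ToyRel *rel, char *buf, size_t buflen)
{
    const char *kw;

    if (rel->name != NULL)
        return toy_append(buf, buflen, rel->name);

    kw = (rel->jointype == TOY_JOIN_LEFT) ? " LEFT JOIN " : " INNER JOIN ";

    if (toy_append(buf, buflen, "(") != 0 ||
        toy_deparse_from(rel->left, buf, buflen) != 0 ||
        toy_append(buf, buflen, kw) != 0 ||
        toy_deparse_from(rel->right, buf, buflen) != 0 ||
        toy_append(buf, buflen, " ON ") != 0 ||
        toy_append(buf, buflen, rel->on_clause) != 0 ||
        toy_append(buf, buflen, ")") != 0)
        return -1;

    return 0;
}
```

Parenthesizing every join keeps the outer-join nesting explicit, which is the part that is hard to recover from a flat list of relids. A real FDW would additionally have to deparse the join conditions from RestrictInfo rather than carry them as strings.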
On 2015/03/23 9:12, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> Sorry for my response late. It was not easy to code during business trip.
>
> The attached patch adds a hook for FDW/CSP to replace entire join-subtree
> by a foreign/custom-scan, according to the discussion upthread.
>
> GetForeignJoinPaths handler of FDW is simplified as follows:
> typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
> RelOptInfo *joinrel);

It’s not a critical issue, but I’d like to propose renaming add_joinrel_extra_paths() to add_extra_paths_to_joinrel(), because the latter makes it clearer that it does extra work in addition to add_paths_to_joinrel().

> It takes PlannerInfo and RelOptInfo of the join-relation to be replaced
> if available. RelOptInfo contains 'relids' bitmap, so FDW driver will be
> able to know the relations to be involved and construct a remote join query.
> However, it is not obvious with RelOptInfo to know how relations are joined.
>
> The function below will help implement FDW driver that support remote join.
>
> List *
> get_joinrel_broken_down(PlannerInfo *root, RelOptInfo *joinrel,
> SpecialJoinInfo **p_sjinfo)
>
> It returns a list of RelOptInfo to be involved in the relations join that
> is represented with 'joinrel', and also set a SpecialJoinInfo on the third
> argument if not inner join.
> In case of inner join, it returns multiple (more than or equal to 2)
> relations to be inner-joined. Elsewhere, it returns two relations and
> a valid SpecialJoinInfo.

As far as I have tested, it works fine for SEMI and ANTI joins.
# I'd like a dump function for Bitmapset for debugging, as Node has nodeToString()...

> At this moment, I'm not 100% certain about its logic. Especially, I didn't
> test SEMI- and ANTI- join cases yet.
> However, time is money - I want people to check overall design first, rather
> than detailed debugging. Please tell me if I misunderstood the logic to break
> down join relations.

With your patch applied, the “updatable view” regression tests failed. regression.diff contains some errors like this:

! ERROR: could not find RelOptInfo for given relids

Could you check that?

--
Shigeru HANADA
> > At this moment, I'm not 100% certain about its logic. Especially, I didn't
> > test SEMI- and ANTI- join cases yet.
> > However, time is money - I want people to check overall design first, rather
> > than detailed debugging. Please tell me if I misunderstood the logic to break
> > down join relations.
>
> With applying your patch, regression tests of “updatable view” failed.
> regression.diff contains some errors like this:
> ! ERROR: could not find RelOptInfo for given relids
>
> Could you check that?
>
It is a bug in the logic that finds the two RelOptInfos from which the
RelOptInfo of a joinrel can be constructed.
I'm now working on correcting the logic, but it is not obvious how to
identify two sets of relids that together satisfy joinrel->relids.
(Yep, the law of increasing entropy...)

On the other hand, we may have a solution that does not need a complicated
reconstruction process. The original concern was that an FDW driver might add
paths that replace an entire join subtree by a foreign scan on a remote join
multiple times, repeatedly, even though these paths would be identical.

If we put a hook for FDW/CSP at the bottom of build_join_rel(), we may be able
to solve the problem in a simpler and more straightforward way.
Because build_join_rel() looks up the cache in root->join_rel_hash and returns
immediately if the required joinrelids already have a RelOptInfo, the bottom of
this function is never reached twice for a particular set of joinrelids.
Once FDW/CSP constructs a path that replaces the entire join subtree for the
joinrel just after its construction, the path is kept until cheaper built-in
paths are added (if any).

This idea has one other positive side effect. We expect a remote join to be
cheaper than a local join over two remote scans in most cases. Once a much
cheaper path has been added before the local joins are considered,
add_path_precheck() cuts path consideration short earlier.

Comments are welcome.
Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: Shigeru HANADA [mailto:shigeru.hanada@gmail.com] > Sent: Tuesday, March 24, 2015 7:36 PM > To: Kaigai Kouhei(海外 浩平) > Cc: Robert Haas; Tom Lane; Ashutosh Bapat; Thom Brown; > pgsql-hackers@postgreSQL.org > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) > > 2015/03/23 9:12、Kouhei Kaigai <kaigai@ak.jp.nec.com> のメール: > > > Sorry for my response late. It was not easy to code during business trip. > > > > The attached patch adds a hook for FDW/CSP to replace entire join-subtree > > by a foreign/custom-scan, according to the discussion upthread. > > > > GetForeignJoinPaths handler of FDW is simplified as follows: > > typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root, > > RelOptInfo *joinrel); > > It’s not a critical issue but I’d like to propose to rename > add_joinrel_extra_paths() to add_extra_paths_to_joinrel(), because the latter > would make it more clear that it does extra work in addition to > add_paths_to_joinrel(). > > > It takes PlannerInfo and RelOptInfo of the join-relation to be replaced > > if available. RelOptInfo contains 'relids' bitmap, so FDW driver will be > > able to know the relations to be involved and construct a remote join query. > > However, it is not obvious with RelOptInfo to know how relations are joined. > > > > The function below will help implement FDW driver that support remote join. > > > > List * > > get_joinrel_broken_down(PlannerInfo *root, RelOptInfo *joinrel, > > SpecialJoinInfo **p_sjinfo) > > > > It returns a list of RelOptInfo to be involved in the relations join that > > is represented with 'joinrel', and also set a SpecialJoinInfo on the third > > argument if not inner join. > > In case of inner join, it returns multiple (more than or equal to 2) > > relations to be inner-joined. Elsewhere, it returns two relations and > > a valid SpecialJoinInfo. 
> > As far as I tested, it works fine for SEMI and ANTI. > # I want dump function of BitmapSet for debugging, as Node has nodeToString()... > > > At this moment, I'm not 100% certain about its logic. Especially, I didn't > > test SEMI- and ANTI- join cases yet. > > However, time is money - I want people to check overall design first, rather > > than detailed debugging. Please tell me if I misunderstood the logic to break > > down join relations. > > With applying your patch, regression tests of “updatable view” failed. > regression.diff contains some errors like this: > ! ERROR: could not find RelOptInfo for given relids > > Could you check that? > > — > Shigeru HANADA
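The "hook at the bottom of build_join_rel()" idea can be pictured with a small self-contained sketch. Everything here is a hypothetical stand-in: `ToyPlanner` plays the role of the join-rel cache, a linear search replaces root->join_rel_hash, relids are a plain 32-bit mask, and a counter stands in for the FDW/CSP hook call. It only shows why code placed after the cache lookup runs once per distinct relid set.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_JOINRELS 64

/* Hypothetical stand-in for the RelOptInfo of a joinrel. */
typedef struct ToyJoinRel
{
    uint32_t relids;        /* bitmask of member relations */
    int      n_hook_calls;  /* how often the stand-in hook ran */
} ToyJoinRel;

/* Hypothetical stand-in for the planner's join-rel cache. */
typedef struct ToyPlanner
{
    ToyJoinRel rels[TOY_MAX_JOINRELS];
    size_t     nrels;
} ToyPlanner;

/* Look up an existing joinrel by relid set; NULL if not built yet. */
static ToyJoinRel *
toy_find_join_rel(ToyPlanner *root, uint32_t relids)
{
    size_t i;

    for (i = 0; i < root->nrels; i++)
        if (root->rels[i].relids == relids)
            return &root->rels[i];
    return NULL;
}

/*
 * Return the joinrel for relids, building it on first request.
 * The stand-in hook sits after the cache check, so it is reached
 * only when the joinrel is newly constructed.
 */
static ToyJoinRel *
toy_build_join_rel(ToyPlanner *root, uint32_t relids)
{
    ToyJoinRel *rel = toy_find_join_rel(root, relids);

    if (rel != NULL)
        return rel;             /* cache hit: hook is not re-run */

    if (root->nrels >= TOY_MAX_JOINRELS)
        return NULL;            /* toy cache is full */

    rel = &root->rels[root->nrels++];
    rel->relids = relids;
    rel->n_hook_calls = 0;

    rel->n_hook_calls++;        /* stand-in for invoking the FDW/CSP hook */
    return rel;
}
```

Requesting the same relid set twice, as happens when different join orders produce the same joinrel, returns the same entry and leaves the hook counter at 1.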
Hello, I had a look at this.

At Wed, 25 Mar 2015 03:59:28 +0000, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote in <9A28C8860F777E439AA12E8AEA7694F8010C6819@BPXM15GP.gisp.nec.co.jp>
> > > At this moment, I'm not 100% certain about its logic. Especially, I didn't
> > > test SEMI- and ANTI- join cases yet.
> > > However, time is money - I want people to check overall design first, rather
> > > than detailed debugging. Please tell me if I misunderstood the logic to break
> > > down join relations.
> >
> > With applying your patch, regression tests of “updatable view” failed.
> > regression.diff contains some errors like this:
> > ! ERROR: could not find RelOptInfo for given relids
> >
> > Could you check that?
> >
> It is a bug around the logic to find out two RelOptInfo that can construct
> another RelOptInfo of joinrel.

It is caused by a split (or multilevel) joinlist. Setting
join_collapse_limit to 10 makes the query work.

I suppose that get_joinrel_broken_down() should give up returning a
result when the given joinrel spans multiple join subproblems, because
they cannot be merged by the FDW anyway, even if they conform to the
basic requirements for merging.

> Even though I'm now working to correct the logic, it is not obvious to
> identify two relids that satisfy joinrel->relids.
> (Yep, law of entropy enhancement...)
>
> On the other hands, we may have a solution that does not need a complicated
> reconstruction process. The original concern was, FDW driver may add paths
> that will replace entire join subtree by foreign-scan on remote join multiple
> times, repeatedly, but these paths shall be identical.
>
> If we put a hook for FDW/CSP on bottom of build_join_rel(), we may be able
> to solve the problem more straight-forward and simply way.
> Because build_join_rel() finds a cache on root->join_rel_hash then returns
> immediately if required joinrelids already has its RelOptInfo, bottom of
> this function never called twice on a particular set of joinrelids.
> Once FDW/CSP constructs a path that replaces entire join subtree towards
> the joinrel just after construction, it shall be kept until cheaper built-in
> paths are added (if exists).
>
> This idea has one other positive side-effect. We expect remote-join is cheaper
> than local join with two remote scan in most cases. Once a much cheaper path
> is added prior to local join consideration, add_path_precheck() breaks path
> consideration earlier.

+1 as a whole.

regards,

--
Kyotaro Horiguchi
Nippon Telegraph and Telephone Corporation
NTT Open Source Software Center
Phone: 03-5860-5115 / Fax: 03-5463-5490
On 2015/03/25 12:59, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

>>> At this moment, I'm not 100% certain about its logic. Especially, I didn't
>>> test SEMI- and ANTI- join cases yet.
>>> However, time is money - I want people to check overall design first, rather
>>> than detailed debugging. Please tell me if I misunderstood the logic to break
>>> down join relations.
>>
>> With applying your patch, regression tests of “updatable view” failed.
>> regression.diff contains some errors like this:
>> ! ERROR: could not find RelOptInfo for given relids
>>
>> Could you check that?
>>
> It is a bug around the logic to find out two RelOptInfo that can construct
> another RelOptInfo of joinrel.
> Even though I'm now working to correct the logic, it is not obvious to
> identify two relids that satisfy joinrel->relids.
> (Yep, law of entropy enhancement…)

IIUC, this problem exists only for non-INNER JOINs, because relations joined only with INNER JOIN can be treated in arbitrary order. But supporting OUTER JOINs would be necessary even for the first cut.

> On the other hands, we may have a solution that does not need a complicated
> reconstruction process. The original concern was, FDW driver may add paths
> that will replace entire join subtree by foreign-scan on remote join multiple
> times, repeatedly, but these paths shall be identical.
>
> If we put a hook for FDW/CSP on bottom of build_join_rel(), we may be able
> to solve the problem more straight-forward and simply way.
> Because build_join_rel() finds a cache on root->join_rel_hash then returns
> immediately if required joinrelids already has its RelOptInfo, bottom of
> this function never called twice on a particular set of joinrelids.
> Once FDW/CSP constructs a path that replaces entire join subtree towards
> the joinrel just after construction, it shall be kept until cheaper built-in
> paths are added (if exists).
>
> This idea has one other positive side-effect. We expect remote-join is cheaper
> than local join with two remote scan in most cases. Once a much cheaper path
> is added prior to local join consideration, add_path_precheck() breaks path
> consideration earlier.
>
> Please comment on.

Or the bottom of make_join_rel(). IMO build_join_rel() is responsible just for building (or finding in a list) a RelOptInfo for the given relids. After that, make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join type to generate the actual Paths that implement the join. make_join_rel() is called only once for a particular relid combination, and SpecialJoinInfo and restrictlist (the conditions specified in JOIN-ON and WHERE) are available there, so it seems promising for FDW cases.

Though I’m not sure that it also fits the custom join provider’s requirements.

--
Shigeru HANADA
On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
Or bottom of make_join_rel(). IMO build_join_rel() is responsible for just building (or searching from a list) a RelOptInfo for given relids. After that make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join type to generate actual Paths implements the join. make_join_rel() is called only once for particular relid combination, and there SpecialJoinInfo and restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising for FDW cases.
I like that idea, but I think the hook signature will become complex; it won't remain as simple as hook(root, joinrel).
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
> On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
> Or bottom of make_join_rel(). IMO build_join_rel() is responsible for
> just building (or searching from a list) a RelOptInfo for given relids. After
> that make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per
> join type to generate actual Paths implements the join. make_join_rel() is
> called only once for particular relid combination, and there SpecialJoinInfo and
> restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising
> for FDW cases.
>
>
>
> I like that idea, but I think we will have complex hook signature, it won't remain
> as simple as hook (root, joinrel).
>
In this case, GetForeignJoinPaths() will take root, joinrel, rel1, rel2,
sjinfo and restrictlist.
That signature is not very simple, but it is not complicated either.

Even if we reconstruct rel1 and rel2 using sjinfo, we also need to compute
restrictlist again using build_joinrel_restrictlist(), which is a static
function in relnode.c. So, I don't think either approach has a definitive
advantage from the standpoint of simplicity.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
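For reference, the six-argument handler described above might look roughly like the sketch below. The argument order and the opaque forward declarations are assumptions made so the snippet compiles outside the PostgreSQL tree; the message only names the arguments, and this is not taken from the patch. `toy_get_foreign_join_paths()` and `toy_call_join_hook()` are hypothetical helpers that merely show the calling convention.

```c
#include <stddef.h>

/* Opaque stand-ins so the typedef compiles without PostgreSQL headers. */
typedef struct PlannerInfo PlannerInfo;
typedef struct RelOptInfo RelOptInfo;
typedef struct SpecialJoinInfo SpecialJoinInfo;
typedef struct List List;

/* Assumed shape of the extended handler; argument order is a guess. */
typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
                                              RelOptInfo *joinrel,
                                              RelOptInfo *rel1,
                                              RelOptInfo *rel2,
                                              SpecialJoinInfo *sjinfo,
                                              List *restrictlist);

/* Counts invocations of the dummy handler below. */
static int toy_handler_calls = 0;

/* Hypothetical FDW handler: a real one would add ForeignPaths to joinrel. */
static void
toy_get_foreign_join_paths(PlannerInfo *root, RelOptInfo *joinrel,
                           RelOptInfo *rel1, RelOptInfo *rel2,
                           SpecialJoinInfo *sjinfo, List *restrictlist)
{
    (void) root;
    (void) joinrel;
    (void) rel1;
    (void) rel2;
    (void) sjinfo;
    (void) restrictlist;
    toy_handler_calls++;
}

/* Stand-in for a call site: invoke the handler only if the FDW set one. */
static void
toy_call_join_hook(GetForeignJoinPaths_function handler)
{
    if (handler != NULL)
        handler(NULL, NULL, NULL, NULL, NULL, NULL);
}
```

The trade-off discussed here is visible in the prototype: the handler gets both input relations and the join clauses directly, at the price of four more parameters than the (root, joinrel) form.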
On 2015/03/25 19:09, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
>> Or bottom of make_join_rel(). IMO build_join_rel() is responsible for
>> just building (or searching from a list) a RelOptInfo for given relids. After
>> that make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per
>> join type to generate actual Paths implements the join. make_join_rel() is
>> called only once for particular relid combination, and there SpecialJoinInfo and
>> restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising
>> for FDW cases.
>>
>> I like that idea, but I think we will have complex hook signature, it won't remain
>> as simple as hook (root, joinrel).
>>
> In this case, GetForeignJoinPaths() will take root, joinrel, rel1, rel2,
> sjinfo and restrictlist.
> It is not too simple, but not complicated signature.
>
> Even if we reconstruct rel1 and rel2 using sjinfo, we also need to compute
> restrictlist using build_joinrel_restrictlist() again. It is a static function
> in relnode.c. So, I don't think either of them has definitive advantage from
> the standpoint of simplicity.

The bottom of make_join_rel() seems good from the viewpoint of available information, but it is called multiple times for join combinations which are essentially identical, for the INNER JOIN case like this:

fdw=# explain select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid;
INFO: postgresGetForeignJoinPaths() 1x2
INFO: postgresGetForeignJoinPaths() 1x4
INFO: postgresGetForeignJoinPaths() 2x4
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: postgresGetForeignJoinPaths() 0x4
INFO: postgresGetForeignJoinPaths() 0x2
INFO: postgresGetForeignJoinPaths() 0x1
INFO: standard_join_search() old hook point
                       QUERY PLAN
---------------------------------------------------------
 Foreign Scan (cost=100.00..102.11 rows=211 width=1068)
(1 row)

Here I've put a probe point at the beginning of the GetForeignJoinPaths handler, and another just before the set_cheapest() call in standard_join_search() as "old hook point". In this example 1, 2, and 4 are base relations, and at join level 3 the planner calls GetForeignJoinPaths() three times for the combinations:

1) (1x2)x4
2) (1x4)x2
3) (2x4)x1

Tom's suggestion is aiming at providing a chance to consider join push-down at a more abstract level, IIUC. So it would be good to call the handler only once for that case, for the flattened combination (1x2x4).

Hum, how about skipping the call to the handler (or hook) if the joinrel was found by find_join_rel()? At least it suppresses redundant calls for different join orders, and the handler can determine whether the combination can be flattened by checking that all RelOptInfos with RELOPT_JOINREL under the joinrel have JOIN_INNER as jointype.

--
Shigeru HANADA
> On 2015/03/25 12:59, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >
> >>> At this moment, I'm not 100% certain about its logic. Especially, I didn't
> >>> test SEMI- and ANTI- join cases yet.
> >>> However, time is money - I want people to check overall design first, rather
> >>> than detailed debugging. Please tell me if I misunderstood the logic to break
> >>> down join relations.
> >>
> >> With applying your patch, regression tests of “updatable view” failed.
> >> regression.diff contains some errors like this:
> >> ! ERROR: could not find RelOptInfo for given relids
> >>
> >> Could you check that?
> >>
> > It is a bug around the logic to find out two RelOptInfo that can construct
> > another RelOptInfo of joinrel.
> > Even though I'm now working to correct the logic, it is not obvious to
> > identify two relids that satisfy joinrel->relids.
> > (Yep, law of entropy enhancement…)
>
> IIUC, this problem is in only non-INNER JOINs because we can treat relations joined
> with only INNER JOIN in arbitrary order. But supporting OUTER JOINs would be
> necessary even for the first cut.
>

Yep. When a joinrel contains only inner-joined relations managed by the same
FDW driver, the job of get_joinrel_broken_down() is quite simple.
However, people want outer joins to be supported as well, don't they?

> > On the other hands, we may have a solution that does not need a complicated
> > reconstruction process. The original concern was, FDW driver may add paths
> > that will replace entire join subtree by foreign-scan on remote join multiple
> > times, repeatedly, but these paths shall be identical.
> >
> > If we put a hook for FDW/CSP on bottom of build_join_rel(), we may be able
> > to solve the problem more straight-forward and simply way.
> > Because build_join_rel() finds a cache on root->join_rel_hash then returns
> > immediately if required joinrelids already has its RelOptInfo, bottom of
> > this function never called twice on a particular set of joinrelids.
> > Once FDW/CSP constructs a path that replaces entire join subtree towards
> > the joinrel just after construction, it shall be kept until cheaper built-in
> > paths are added (if exists).
> >
> > This idea has one other positive side-effect. We expect remote-join is cheaper
> > than local join with two remote scan in most cases. Once a much cheaper path
> > is added prior to local join consideration, add_path_precheck() breaks path
> > consideration earlier.
> >
> > Please comment on.
>
> Or bottom of make_join_rel(). IMO build_join_rel() is responsible for just
> building (or searching from a list) a RelOptInfo for given relids. After that
> make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join
> type to generate actual Paths implements the join. make_join_rel() is called
> only once for particular relid combination, and there SpecialJoinInfo and
> restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising
> for FDW cases.
>

As long as the caller can know whether build_join_rel() actually constructed a new
RelOptInfo object or not, I think it makes more sense than putting a hook
within make_join_rel().

> Though I’m not sure that it also fits custom join provider’s requirements.
>

A join replaced by a CSP has two scenarios. The first one implements just an
alternative logic to the built-in join and takes the underlying inner/outer
nodes, so its hook is located in add_paths_to_joinrel(), like the built-in join
logic.
The second one tries to replace an entire join sub-tree with a materialized view
(for example), like the FDW remote-join case. So it has to be hooked near the
location of GetForeignJoinPaths().
In the second scenario, the CSP does not have a private field in RelOptInfo,
so it may not be obvious how to check whether the given joinrel exactly matches
a particular materialized view or other cache.

At this moment, what I'm interested in is the first scenario, so the second case
is not a high priority for me, at least.

Thanks.
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
> On 2015/03/25 19:09, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
> >> On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
> >> Or bottom of make_join_rel(). IMO build_join_rel() is responsible for
> >> just building (or searching from a list) a RelOptInfo for given relids. After
> >> that make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per
> >> join type to generate actual Paths implements the join. make_join_rel() is
> >> called only once for particular relid combination, and there SpecialJoinInfo and
> >> restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising
> >> for FDW cases.
> >>
> >> I like that idea, but I think we will have complex hook signature, it won't remain
> >> as simple as hook (root, joinrel).
> >>
> > In this case, GetForeignJoinPaths() will take root, joinrel, rel1, rel2,
> > sjinfo and restrictlist.
> > It is not too simple, but not complicated signature.
> >
> > Even if we reconstruct rel1 and rel2 using sjinfo, we also need to compute
> > restrictlist using build_joinrel_restrictlist() again. It is a static function
> > in relnode.c. So, I don't think either of them has definitive advantage from
> > the standpoint of simplicity.
>
> The bottom of make_join_rel() seems good from the viewpoint of information, but
> it is called multiple times for join combinations which are essentially identical,
> for INNER JOIN case like this:
>
> fdw=# explain select * from pgbench_branches b join pgbench_tellers t on t.bid
> = b.bid join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid;
> INFO: postgresGetForeignJoinPaths() 1x2
> INFO: postgresGetForeignJoinPaths() 1x4
> INFO: postgresGetForeignJoinPaths() 2x4
> INFO: standard_join_search() old hook point
> INFO: standard_join_search() old hook point
> INFO: standard_join_search() old hook point
> INFO: postgresGetForeignJoinPaths() 0x4
> INFO: postgresGetForeignJoinPaths() 0x2
> INFO: postgresGetForeignJoinPaths() 0x1
> INFO: standard_join_search() old hook point
>                        QUERY PLAN
> ---------------------------------------------------------
>  Foreign Scan (cost=100.00..102.11 rows=211 width=1068)
> (1 row)
>
> Here I’ve put probe point in the beginnig of GetForeignJoinPaths handler and just
> before set_cheapest() call in standard_join_search() as “old hook point”. In
> this example 1, 2, and 4 are base relations, and in the join level 3 planner calls
> GetForeignJoinPaths() three times for the combinations:
>
> 1) (1x2)x4
> 2) (1x4)x2
> 3) (2x4)x1
>
> Tom’s suggestion is aiming at providing a chance to consider join push-down in
> more abstract level, IIUC. So it would be good to call handler only once for
> that case, for flattened combination (1x2x3).
>
> Hum, how about skipping calling handler (or hook) if the joinrel was found by
> find_join_rel()? At least it suppress redundant call for different join orders,
> and handler can determine whether the combination can be flattened by checking
> that all RelOptInfo with RELOPT_JOINREL under joinrel has JOIN_INNER as jointype.
>

The reason why the FDW handler was called multiple times in your example is that
your modified make_join_rel() does not check whether build_join_rel() actually
built a new RelOptInfo or just returned a cached reference, isn't it?

If so, I'm inclined toward your proposition.
A new "bool *found" argument to build_join_rel() reduces the number of
FDW handler calls while keeping reasonable information to build the
remote-join query.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On 2015/03/25 18:53, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
>
> On Wed, Mar 25, 2015 at 3:14 PM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
>
> Or bottom of make_join_rel(). IMO build_join_rel() is responsible for just building (or searching from a list) a RelOptInfo for given relids. After that make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join type to generate actual Paths implements the join. make_join_rel() is called only once for particular relid combination, and there SpecialJoinInfo and restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising for FDW cases.
>
> I like that idea, but I think we will have complex hook signature, it won't remain as simple as hook (root, joinrel).

The signature of the hook (or the FDW API handler) would be like this:

typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
                                              RelOptInfo *joinrel,
                                              RelOptInfo *outerrel,
                                              RelOptInfo *innerrel,
                                              JoinType jointype,
                                              SpecialJoinInfo *sjinfo,
                                              List *restrictlist);

This is very similar to add_paths_to_joinrel(), but lacks semifactors and extra_lateral_rels. semifactors can be obtained with compute_semi_anti_join_factors(), and extra_lateral_rels can be constructed from root->placeholder_list as add_paths_to_joinrel() does.

From the viewpoint of postgres_fdw, jointype and restrictlist are necessary to generate the SELECT statement, so it would have to redo most of the work done in make_join_rel if the signature was hook(root, joinrel). sjinfo will be necessary for supporting SEMI/ANTI joins, but currently that is not in the scope of postgres_fdw.

I guess that other FDWs require at least jointype and restrictlist.

--
Shigeru HANADA
On 2015/03/25 19:47, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> The reason why FDW handler was called multiple times on your example is,
> your modified make_join_rel() does not check whether build_join_rel()
> actually build a new RelOptInfo, or just a cache reference, doesn't it?
>

Yep. After that change the call count looks like this:

fdw=# explain select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid;
INFO: postgresGetForeignJoinPaths() 1x2
INFO: postgresGetForeignJoinPaths() 1x4
INFO: postgresGetForeignJoinPaths() 2x4
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: standard_join_search() old hook point
INFO: postgresGetForeignJoinPaths() 0x4
INFO: standard_join_search() old hook point
                       QUERY PLAN
---------------------------------------------------------
 Foreign Scan (cost=100.00..102.11 rows=211 width=1068)
(1 row)

fdw=#

> If so, I'm inclined to your proposition.
> A new "bool *found" argument of build_join_rel() makes reduce number of
> FDW handler call, with keeping reasonable information to build remote-
> join query.

Another idea is to pass "found" as a parameter to the FDW handler and let the FDW decide whether to skip or not. Some FDWs (and some CSPs?) might want to be conscious of the join combination.

--
Shigeru HANADA
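As an illustration of the call counts discussed above (this is not code from any posted patch), here is a minimal, self-contained C sketch. It assumes a toy join-rel cache keyed by a bitmask of base relation indexes; `JoinRelCache`, `build_join_rel_sketch()` and `count_handler_calls()` are hypothetical names, and the real planner uses RelOptInfo, Relids and root->join_rel_hash instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the planner's join-rel cache: each joinrel is
 * identified by a bitmask of the base relation indexes it covers. */
#define MAX_JOINRELS 16

typedef struct JoinRelCache
{
    unsigned int relids[MAX_JOINRELS];
    size_t       count;
} JoinRelCache;

/* Look up (or register) the joinrel for "relids". *found tells the caller
 * whether it already existed, mirroring the proposed "bool *found". */
static void
build_join_rel_sketch(JoinRelCache *cache, unsigned int relids, bool *found)
{
    for (size_t i = 0; i < cache->count; i++)
    {
        if (cache->relids[i] == relids)
        {
            *found = true;
            return;
        }
    }
    *found = false;
    assert(cache->count < MAX_JOINRELS);
    cache->relids[cache->count++] = relids;
}

/* Replay the join search for base relations 1, 2 and 4 and count how often
 * the FDW handler would be invoked. */
int
count_handler_calls(bool skip_if_found)
{
    /* {outer, inner} pairs: three level-2 joins, then three level-3 orders */
    static const unsigned int pairs[][2] = {
        {1u << 1, 1u << 2},                 /* 1x2 */
        {1u << 1, 1u << 4},                 /* 1x4 */
        {1u << 2, 1u << 4},                 /* 2x4 */
        {(1u << 1) | (1u << 2), 1u << 4},   /* (1x2)x4 */
        {(1u << 1) | (1u << 4), 1u << 2},   /* (1x4)x2 */
        {(1u << 2) | (1u << 4), 1u << 1},   /* (2x4)x1 */
    };
    JoinRelCache cache = {{0}, 0};
    int          calls = 0;

    for (size_t i = 0; i < sizeof(pairs) / sizeof(pairs[0]); i++)
    {
        bool found;

        build_join_rel_sketch(&cache, pairs[i][0] | pairs[i][1], &found);
        if (!(skip_if_found && found))
            calls++;    /* the FDW handler would be called here */
    }
    return calls;
}
```

Under these assumptions, count_handler_calls(false) gives 6 and count_handler_calls(true) gives 4, which matches the number of postgresGetForeignJoinPaths() lines in the two logs: the three level-3 join orders all map to the same set of relids, so only the first one creates a new joinrel.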
> On 2015/03/25 19:47, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > The reason why FDW handler was called multiple times on your example is,
> > your modified make_join_rel() does not check whether build_join_rel()
> > actually build a new RelOptInfo, or just a cache reference, doesn't it?
> >
>
> Yep. After that change calling count looks like this:
>
> fdw=# explain select * from pgbench_branches b join pgbench_tellers t on t.bid
> = b.bid join pgbench_accounts a on a.bid = b.bid and a.bid = t.bid;
> INFO: postgresGetForeignJoinPaths() 1x2
> INFO: postgresGetForeignJoinPaths() 1x4
> INFO: postgresGetForeignJoinPaths() 2x4
> INFO: standard_join_search() old hook point
> INFO: standard_join_search() old hook point
> INFO: standard_join_search() old hook point
> INFO: postgresGetForeignJoinPaths() 0x4
> INFO: standard_join_search() old hook point
>                        QUERY PLAN
> ---------------------------------------------------------
>  Foreign Scan (cost=100.00..102.11 rows=211 width=1068)
> (1 row)
>
> fdw=#
>
> > If so, I'm inclined to your proposition.
> > A new "bool *found" argument of build_join_rel() makes reduce number of
> > FDW handler call, with keeping reasonable information to build remote-
> > join query.
>
> Another idea is to pass “found” as parameter to FDW handler, and let FDW to decide
> to skip or not. Some of FDWs (and some of CSP?) might want to be conscious of
> join combination.
>

I think it does not match the concept we stand on.
Unlike CSP, FDW intends to replace an entire join sub-tree that is
represented by a particular joinrel, regardless of the sequence in which
the joinrel is constructed from multiple baserels.
So it is sufficient to call GetForeignJoinPaths() once when a joinrel
is constructed, isn't it?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
> > Or bottom of make_join_rel(). IMO build_join_rel() is responsible for just
> > building (or searching from a list) a RelOptInfo for given relids. After that
> > make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join
> > type to generate actual Paths implements the join. make_join_rel() is called
> > only once for particular relid combination, and there SpecialJoinInfo and
> > restrictlist (conditions specified in JOIN-ON and WHERE), so it seems promising
> > for FDW cases.
> >
> > I like that idea, but I think we will have complex hook signature, it won't
> > remain as simple as hook (root, joinrel).
>
> Signature of the hook (or the FDW API handler) would be like this:
>
> typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,
>                                                RelOptInfo *joinrel,
>                                                RelOptInfo *outerrel,
>                                                RelOptInfo *innerrel,
>                                                JoinType jointype,
>                                                SpecialJoinInfo *sjinfo,
>                                                List *restrictlist);
>
> This is very similar to add_paths_to_joinrel(), but lacks semifactors and
> extra_lateral_rels. semifactors can be obtained with
> compute_semi_anti_join_factors(), and extra_lateral_rels can be constructed
> from root->placeholder_list as add_paths_to_joinrel() does.
>
> From the viewpoint of postgres_fdw, jointype and restrictlist is necessary to
> generate SELECT statement, so it would require most work done in make_join_rel
> again if the signature was hook(root, joinrel). sjinfo will be necessary for
> supporting SEMI/ANTI joins, but currently it is not in the scope of postgres_fdw.
>
> I guess that other FDWs require at least jointype and restrictlist.
>

The attached patch adds a GetForeignJoinPaths call in make_join_rel() only when
'joinrel' is actually built and both child relations are managed by the same
FDW driver, prior to any other built-in join paths.
I adjusted the hook definition a little bit, because jointype can be reproduced
from SpecialJoinInfo. Right?

Probably, it will solve the original concern about multiple calls of the FDW
handler when it tries to replace an entire join subtree with a foreign scan
on the result of a remote join query.

What is your opinion?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
On 2015/03/26 10:51, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> The attached patch adds GetForeignJoinPaths call on make_join_rel() only when
> 'joinrel' is actually built and both of child relations are managed by same
> FDW driver, prior to any other built-in join paths.
> I adjusted the hook definition a little bit, because jointype can be reproduced
> using SpecialJoinInfo. Right?

OK.

>
> Probably, it will solve the original concern towards multiple calls of FDW
> handler in case when it tries to replace an entire join subtree with a foreign-
> scan on the result of remote join query.
>
> How about your opinion?

Seems fine. I've fixed my postgres_fdw code to fit the new version, and am working on handling a whole join tree.

It would be difficult in the 9.5 cycle, but a hook point where we can handle a whole joinrel might allow us to optimize a query which accesses multiple parent tables, each of which is inherited by foreign tables and partitioned on an identical join key, by building a path tree which joins the sharded tables first and then unions those results.

--
Shigeru HANADA
shigeru.hanada@gmail.com
Attached is the patch which adds join push-down support to postgres_fdw (v7). It supports SELECT statements with JOIN, but has some more possible enhancements (see below). I'd like to share the WIP patch here to get comments about new FDW API design provided by KaiGai-san's v11 patch.
To make reviewing easier, I summarized changes against Custom/Foreign join v11 patch.
Changes for Join push-down support
==================================
- Add FDW API GetForeignJoinPaths(). It generates a ForeignPath which represents a scan against pseudo join relation represented by given RelOptInfo.
- Expand deparsing module to handle multi-relation queries. Steps of deparsing a join query:
1) Optimizer calls postgresGetForeignPaths() for each BASEREL. Here postgres_fdw does the same things as before, except adding column aliases in SELECT clause.
2) Optimizer calls postgresGetForeignJoinPaths() for each JOINREL. Optimizer calls once per RelOptInfo with reloptkind == RELOPT_JOINREL, so postgres_fdw should consider both A JOIN B and B JOIN A in one call.
postgres_fdw checks whether the join can be pushed down.
a) Both outer and inner relations can be pushed down (NULL in RelOptInfo#fdw_private indicates such situation)
b) Outermost command is a SELECT (this can be relaxed in the future)
c) Join type is inner or one of the outer join types
d) The server of all relations in the join is identical
e) The effective user id of all relations in the join is identical (they might differ if some relations were accessed via views)
f) No local filters (this can be relaxed if inner && non-volatile)
g) Join conditions don't contain any "unsafe" expression
h) Remote filter doesn't contain any "unsafe" expression
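The criteria above can be read as a single predicate. The following C sketch is only an illustration under stated assumptions: `SketchRel`, `SketchJoinType` and `join_is_pushdown_candidate()` are hypothetical names that summarize each relation with a few flags, whereas the actual patch inspects RelOptInfo, fdw_private and the expression trees directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified join types; not PostgreSQL's JoinType enum. */
typedef enum
{
    SK_JOIN_INNER, SK_JOIN_LEFT, SK_JOIN_RIGHT, SK_JOIN_FULL,
    SK_JOIN_SEMI, SK_JOIN_ANTI
} SketchJoinType;

/* Hypothetical per-relation summary of what the FDW already knows. */
typedef struct SketchRel
{
    bool         pushdown_safe;      /* (a) relation itself can be pushed down */
    unsigned int server_id;          /* (d) foreign server */
    unsigned int user_id;            /* (e) effective user id */
    bool         has_local_filter;   /* (f) filter that must run locally */
    bool         remote_filter_safe; /* (h) remote filter has no unsafe expr */
} SketchRel;

/* Return true only if every criterion (a) through (h) holds. */
bool
join_is_pushdown_candidate(const SketchRel *outer, const SketchRel *inner,
                           SketchJoinType jointype, bool is_select,
                           bool join_conds_safe)
{
    assert(outer != NULL && inner != NULL);

    if (!outer->pushdown_safe || !inner->pushdown_safe)     /* (a) */
        return false;
    if (!is_select)                                         /* (b) */
        return false;
    if (jointype != SK_JOIN_INNER && jointype != SK_JOIN_LEFT &&
        jointype != SK_JOIN_RIGHT && jointype != SK_JOIN_FULL)
        return false;                                       /* (c) */
    if (outer->server_id != inner->server_id)               /* (d) */
        return false;
    if (outer->user_id != inner->user_id)                   /* (e) */
        return false;
    if (outer->has_local_filter || inner->has_local_filter) /* (f) */
        return false;
    if (!join_conds_safe)                                   /* (g) */
        return false;
    if (!outer->remote_filter_safe || !inner->remote_filter_safe)
        return false;                                       /* (h) */
    return true;
}
```

Checking the cheap structural conditions (a) to (e) before the expression-related ones is a choice made for this sketch; the order does not change the result.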
If all criteria are passed, postgres_fdw makes a ForeignPath for the join and stores the following information in its fdw_private.
a) ForeignPath of outer relation, first non-parameterized one
b) ForeignPath of inner relation, first non-parameterized one
c) join type (as integer)
d) join conditions (as List of Expr)
e) other conditions (as List of Expr)
As of now the cost of the path is not very accurate; this is a possible enhancement.
3) Optimizer calls postgresGetForeignPlan() for the cheapest topmost Path. If the foreign join is the cheapest way to execute the query, the optimizer calls postgresGetForeignPlan for the topmost path generated by postgresGetForeignJoinPaths. As Robert and Tom mentioned in the thread, large_table JOIN huge_table might be removed even if (large_table JOIN huge_table) JOIN small_table is the cheapest at join level 3, so postgres_fdw can't assume that paths in lower levels survived planning.
To cope with this situation, I'm trying to avoid calling create_plan_recurse() for underlying paths by putting the necessary information into PgFdwRelationInfo and linking it to the appropriate RelOptInfo.
Anyway, in the current version the query string is built up recursively from the bottom (BASEREL) upward. For a join, the underlying outer/inner queries are put into the FROM clause, wrapped in parentheses and aliased. For example:
select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid;
is transformed to a query like this:
SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM (SELECT bid a9, bbalance a10, filler a11 FROM public.pgbench_branches) l (a1, a2, a3) INNER JOIN (SELECT tid a9, bid a10, balance a11, filler a12 FROM public.pgbench_tellers) r (a1, a2, a3, a4) ON ((l.a1 = r.a2));
In the remote query, column aliasing uses attnum-based numbering shifted by FirstLowInvalidHeapAttributeNumber to make all attnums positive. For instance, this system uses the alias "a9" for the first user column. For readability of the code around this, I introduced the TO_RELATEVE() macro, which converts absolute attnums (-8~) to relative ones (0~). The current deparser can also handle whole-row references (attnum == 0) correctly.
4) Executor calls BeginForeignScan to initialize a scan. Here the TupleDesc is taken from the slot, not the Relation.
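The attnum-based alias numbering described above can be sketched in a few lines of C. This is an illustration only: the constant -8 is taken from the "(-8~)" range given in the text rather than from the PostgreSQL headers, and `SKETCH_TO_RELATIVE()` and `column_alias()` are hypothetical names, not the macro or functions in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: the lowest attribute number, as implied by the "(-8~)" range
 * in the text; real code should use FirstLowInvalidHeapAttributeNumber. */
#define SKETCH_FIRST_LOW_INVALID_ATTNUM (-8)

/* Hypothetical counterpart of the macro described in the text: shifts an
 * absolute attnum (-8 and up) to a relative one (0 and up). */
#define SKETCH_TO_RELATIVE(attnum) \
    ((attnum) - SKETCH_FIRST_LOW_INVALID_ATTNUM)

/* Write the remote column alias ("a<N>") for an attribute number into buf.
 * Returns the alias length, or -1 if the buffer is too small. */
int
column_alias(int attnum, char *buf, size_t buflen)
{
    int n;

    assert(attnum >= SKETCH_FIRST_LOW_INVALID_ATTNUM);
    n = snprintf(buf, buflen, "a%d", SKETCH_TO_RELATIVE(attnum));
    return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}
```

With this shift the first user column (attnum 1) gets the alias "a9", as stated above, and a system column with attnum -1 would get "a7".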
Possible enhancement
====================
- Make deparseSelectSql() more general, thus it can handle both simple SELECT and join SELECT by calling itself recursively. This would avoid assuming that underlying ForeignPath remains in RelOptInfo. (WIP)
- Move appendConditions() calls into deparse.c, to clarify responsibility of modules.
- more accurate estimation
- more detailed information for error location (currently "foreign table" is used as relation name always)
Attachment
Hanada-san, Thanks for your dedicated efforts for remote-join feature. Below are the comments from my side. * Bug - mixture of ctid system column and whole row-reference In case when "ctid" system column is required, deparseSelectSql() adds ctid reference on the base relation scan level. On the other hands, whole-row reference is transformed to a reference to the underlying relation. It will work fine if system column is not specified. However, system column reference breaks tuple layout from the expected one. Eventually it leads an error. postgres=# select ft1.ctid,ft1 from ft1,ft2 where a=b; ERROR: malformed record literal: "(2,2,bbb,"(0,2)")" DETAIL: Too many columns. CONTEXT: column "" of foreign table "foreign join" STATEMENT: select ft1.ctid,ft1 from ft1,ft2 where a=b; postgres=# explain verbose select ft1.ctid,ft1 from ft1,ft2 where a=b; QUERY PLAN --------------------------------------------------------------------------------Foreign Scan (cost=200.00..208.35 rows=835width=70) Output: ft1.ctid, ft1.* Remote SQL: SELECT l.a1, l.a2 FROM (SELECT l.a7, l, l.a10 FROM (SELECT id a9,aa10, atext a11, ctid a7 FROM public.t1) l) l (a1, a2, a3) INNER JOIN (SELECT b a10 FROM public.t2) r (a1) ON ((l.a3 = r.a1)) "l" of the first SELECT represents a whole-row reference. However, underlying SELECT contains system columns in its target- list. Is it available to construct such a query? SELECT l.a1, l.a2 FROM (SELECT (id,a,atext), ctid) l (a1, a2) ... ^^^^^^^^^^ Probably, it is a job of deparseProjectionSql(). * postgresGetForeignJoinPaths() It walks on the root->simple_rel_array to check whether all the relations involved are manged by same foreign server with same credential. We may have more graceful way for this. Pay attention on the fdw_private field of innerrel/outerrel. If they have a valid fdw_private, it means FDW driver (postgres_fdw) considers all the underlying relations scan/join are available to run the remote-server. 
So, all we need to check is whether server-id and user-id of both relations are identical or not. * merge_fpinfo() It seems to me fpinfo->rows should be joinrel->rows, and fpinfo->width also should be joinrel->width. No need to have special intelligence here, isn't it? * explain output EXPLAIN output may be a bit insufficient to know what does it actually try to do. postgres=# explain select * from ft1,ft2 where a = b; QUERY PLAN --------------------------------------------------------Foreign Scan (cost=200.00..212.80 rows=1280 width=80) (1 row) Even though it is not an immediate request, it seems to me better to show users joined relations and remote ON/WHERE clause here. Please don't hesitate to consult me, if you have any questions. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Shigeru Hanada > Sent: Friday, April 03, 2015 7:32 PM > To: Kaigai Kouhei(海外 浩平) > Cc: Ashutosh Bapat; Robert Haas; Tom Lane; Thom Brown; > pgsql-hackers@postgreSQL.org > Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API) > > Attached is the patch which adds join push-down support to postgres_fdw (v7). > It supports SELECT statements with JOIN, but has some more possible enhancements > (see below). I'd like to share the WIP patch here to get comments about new FDW > API design provided by KaiGai-san's v11 patch. > > To make reviewing easier, I summarized changes against Custom/Foreign join v11 > patch. > > Changes for Join push-down support > ================================== > - Add FDW API GetForeignJoinPaths(). It generates a ForeignPath which represents > a scan against pseudo join relation represented by given RelOptInfo. > - Expand deparsing module to handle multi-relation queries. 
>   Steps of deparsing a join query:
>
> 1) Optimizer calls postgresGetForeignPaths() for each BASEREL. Here
> postgres_fdw does the same things as before, except adding column aliases in the
> SELECT clause.
> 2) Optimizer calls postgresGetForeignJoinPaths() for each JOINREL. The optimizer
> calls it once per RelOptInfo with reloptkind == RELOPT_JOINREL, so postgres_fdw
> should consider both A JOIN B and B JOIN A in one call.
>
> postgres_fdw checks whether the join can be pushed down:
>
> a) Both outer and inner relations can be pushed down (a NULL
> RelOptInfo#fdw_private indicates that a relation cannot be pushed down)
> b) Outermost command is a SELECT (this can be relaxed in the future)
> c) Join type is inner or one of the outer joins
> d) The servers of all relations in the join are identical
> e) The effective user ids for all relations in the join are identical (they might
> be different if some were accessed via views)
> f) No local filters (this can be relaxed if inner && non-volatile)
> g) Join conditions don't contain any "unsafe" expression
> h) Remote filters don't contain any "unsafe" expression
>
> If all criteria are met, postgres_fdw makes a ForeignPath for the join and stores
> the following information in its fdw_private:
>
> a) ForeignPath of outer relation, first non-parameterized one
> b) ForeignPath of inner relation, first non-parameterized one
> c) join type (as integer)
> d) join conditions (as List of Expr)
> e) other conditions (as List of Expr)
>
> As of now the cost of the path is not so accurate; this is a possible enhancement.
>
> 3) Optimizer calls postgresGetForeignPlan() for the cheapest topmost Path. If
> foreign join is the cheapest way to execute the query, the optimizer calls
> postgresGetForeignPlan for the topmost path generated by
> postgresGetForeignJoinPaths. As Robert and Tom mentioned in the thread,
> large_table JOIN huge_table might be removed even if (large_table JOIN huge_table)
> JOIN small_table is the cheapest at join level 3, so postgres_fdw can't assume
> that paths at lower levels survived planning.
>
> To cope with this situation, I'm trying to avoid calling create_plan_recurse()
> for underlying paths by putting the necessary information into PgFdwRelationInfo
> and linking it to the appropriate RelOptInfo.
>
> Anyway, in the current version the query string is built up recursively from the
> bottom (BASEREL) upward. For a join, the underlying outer/inner queries are put
> into the FROM clause, wrapped in parentheses and aliased. For example:
>
> select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid;
>
> is transformed to a query like this:
>
> SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM (SELECT bid a9, bbalance
> a10, filler a11 FROM public.pgbench_branches) l (a1, a2, a3) INNER JOIN (SELECT
> tid a9, bid a10, balance a11, filler a12 FROM public.pgbench_tellers) r (a1, a2,
> a3, a4) ON ((l.a1 = r.a2));
>
> As in the remote query, column aliasing uses attnum-based numbering, shifted
> by FirstLowInvalidHeapAttributeNumber to make all attnums positive. For instance,
> this system uses alias "a9" for the first user column. For readability of the code
> around this, I introduced the TO_RELATEVE() macro, which converts absolute attnums
> (-8~) to relative ones (0~). The current deparser can also handle whole-row
> references (attnum == 0) correctly.
>
> 4) Executor calls BeginForeignScan to initialize a scan. Here the TupleDesc is
> taken from the slot, not the Relation.
>
> Possible enhancement
> ====================
> - Make deparseSelectSql() more general, so that it can handle both simple SELECT
>   and join SELECT by calling itself recursively. This would avoid assuming that
>   the underlying ForeignPath remains in RelOptInfo. (WIP)
> - Move appendConditions() calls into deparse.c, to clarify the responsibility of
>   modules.
> - more accurate estimation
> - more detailed information for error location (currently "foreign table" is used
>   as the relation name always)
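
The attnum-based alias numbering described in the quoted message can be sketched as below. This is only an illustration, not code from the patch: the function names are hypothetical, and the value -8 for FirstLowInvalidHeapAttributeNumber is an assumption taken from the "(-8~)" remark in the message and from the examples in this thread ("a9" for the first user column, "a7" for ctid).

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: matches the "(-8~)" lower bound mentioned in the message. */
#define FirstLowInvalidHeapAttributeNumber (-8)

/*
 * Hypothetical helper: shift an absolute attribute number so that every
 * valid attnum (system columns are negative, whole-row is 0, user columns
 * start at 1) maps to a positive alias number.
 */
static int
attnum_to_alias_number(int attnum)
{
    assert(attnum >= FirstLowInvalidHeapAttributeNumber);
    return attnum - FirstLowInvalidHeapAttributeNumber;
}

/* Hypothetical helper: format the column alias ("a9", "a7", ...) into buf. */
static int
format_column_alias(char *buf, size_t buflen, int attnum)
{
    return snprintf(buf, buflen, "a%d", attnum_to_alias_number(attnum));
}
```

With these assumptions, attnum 1 (first user column) yields "a9" and attnum -1 (ctid) yields "a7", which agrees with the remote queries shown in this thread.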
Hi KaiGai-san,

Thanks for the review. Attached is the v8 patch of foreign join support for postgres_fdw.

In addition to your comments, I removed useless code that retrieves ForeignPath from outer/inner RelOptInfo and stores them into ForeignPath#fdw_private. Now postgres_fdw's join push-down no longer depends on the existence of a ForeignPath under the join relation. This means that we can support the case Robert mentioned before, that the whole "(huge JOIN large) JOIN small" can be pushed down even if "(huge JOIN large)" is dominated by another join path.

2015-04-07 13:46 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> Thanks for your dedicated efforts for remote-join feature.
> Below are the comments from my side.
>
>
> * Bug - mixture of ctid system column and whole row-reference
> In case when "ctid" system column is required, deparseSelectSql()
> adds ctid reference on the base relation scan level.
> On the other hand, whole-row reference is transformed to
> a reference to the underlying relation. It will work fine if
> system column is not specified. However, system column reference
> breaks tuple layout from the expected one.
> Eventually it leads to an error.

I found the bug too. As you suggested, deparseProjectionSql() is the place to fix it.

> postgres=# select ft1.ctid,ft1 from ft1,ft2 where a=b;
> ERROR:  malformed record literal: "(2,2,bbb,"(0,2)")"
> DETAIL:  Too many columns.
> CONTEXT:  column "" of foreign table "foreign join"
> STATEMENT:  select ft1.ctid,ft1 from ft1,ft2 where a=b;
>
> postgres=# explain verbose
>            select ft1.ctid,ft1 from ft1,ft2 where a=b;
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  Foreign Scan  (cost=200.00..208.35 rows=835 width=70)
>    Output: ft1.ctid, ft1.*
>    Remote SQL: SELECT l.a1, l.a2 FROM (SELECT l.a7, l, l.a10 FROM (SELECT id a9,
> a a10, atext a11, ctid a7 FROM public.t1) l) l (a1, a2, a3) INNER JOIN (SELECT
> b a10 FROM public.t2) r (a1) ON ((l.a3 = r.a1))
>
> "l" of the first SELECT represents a whole-row reference.
> However, underlying SELECT contains system columns in its target list.
>
> Is it possible to construct such a query?
>   SELECT l.a1, l.a2 FROM (SELECT (id,a,atext), ctid) l (a1, a2) ...
>                                   ^^^^^^^^^^

Yes, a simple relation reference such as "l" is not sufficient for the purpose. But putting columns in parentheses would not work when a user column is referenced in the original query.

I implemented deparseProjectionSql to use a ROW(...) expression for a whole-row reference in the target list, in addition to ordinary column references for actually used columns and ctid.

Please see the test case for mixed use of ctid and whole-row reference in postgres_fdw's regression tests.
Now a whole-row reference in the remote query looks like this:

-- ctid with whole-row reference
EXPLAIN (COSTS false, VERBOSE)
SELECT t1.ctid, t1, t2 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
--------------------------------------------------------------------------------
 Limit
   Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
   ->  Sort
         Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
         Sort Key: t1.c3, t1.c1
         ->  Foreign Scan
               Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
               Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT l.a7, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a12, l.a10 FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a1
(8 rows)

In fact l.a12 and l.a10, for t1.c3 and t1.c1, are redundant in the transferred data, but IMO this simplifies the deparsing code.

> * postgresGetForeignJoinPaths()
> It walks on the root->simple_rel_array to check whether
> all the relations involved are managed by same foreign
> server with same credential.
> We may have more graceful way for this. Pay attention on
> the fdw_private field of innerrel/outerrel. If they have
> a valid fdw_private, it means FDW driver (postgres_fdw)
> considers all the underlying relations scan/join are
> available to run the remote-server.
> So, all we need to check is whether server-id and user-id
> of both relations are identical or not.

Exactly. I fixed the code not to loop around.

> * merge_fpinfo()
> It seems to me fpinfo->rows should be joinrel->rows, and
> fpinfo->width also should be joinrel->width.
> No need to have special intelligence here, isn't it?

Oops. They are a vestige of my struggle, which disabled the SELECT clause optimization (omitting unused columns). Now width and rows are inherited from joinrel. Besides that, fdw_startup_cost and fdw_tuple_cost seemed wrong, so I fixed them to use a simple sum, not an average.

> * explain output
>
> EXPLAIN output may be a bit insufficient to know what does it
> actually try to do.
>
> postgres=# explain select * from ft1,ft2 where a = b;
>                        QUERY PLAN
> --------------------------------------------------------
>  Foreign Scan  (cost=200.00..212.80 rows=1280 width=80)
> (1 row)
>
> Even though it is not an immediate request, it seems to me
> better to show users joined relations and remote ON/WHERE
> clause here.
>

Like this?

Foreign Scan on ft1 INNER JOIN ft2 ON ft1.a = ft2.b (cost=200.00..212.80 rows=1280 width=80)
…

It might produce a very long line when joining many tables, because it contains most of the remote query other than the SELECT clause, but I prefer the detailed form. Another idea is to print "Join Cond" and "Remote Filter" as separate EXPLAIN items.

Note that the v8 patch doesn't contain this change yet!

--
Shigeru HANADA
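
The simplified push-down check agreed on above (both sides carry valid FDW-private info, and their server-id and user-id match) can be sketched as below. This is a minimal illustration under stated assumptions, not the patch's code: JoinSideInfo and can_push_down_join are hypothetical names, the struct stands in for whatever postgres_fdw keeps in RelOptInfo->fdw_private, and a NULL pointer stands for "this side cannot be pushed down".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid type */

/* Hypothetical summary of the per-relation info kept in fdw_private. */
typedef struct JoinSideInfo
{
    Oid     serverid;           /* foreign server the relation lives on */
    Oid     userid;             /* effective user used for the remote access */
} JoinSideInfo;

/*
 * Returns true only if both sides were judged remotely executable
 * (non-NULL info) and share the same server and effective user.
 * No walk over all base relations is needed, because each side's info
 * already summarizes the relations underneath it.
 */
static bool
can_push_down_join(const JoinSideInfo *outer, const JoinSideInfo *inner)
{
    if (outer == NULL || inner == NULL)
        return false;
    return outer->serverid == inner->serverid &&
           outer->userid == inner->userid;
}
```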
Sorry, the documentation portion was not in the v8 patch. Please use the v9 patch instead.

--
Shigeru HANADA
Hanada-san,

> In addition to your comments, I removed useless code that retrieves ForeignPath
> from outer/inner RelOptInfo and store them into ForeignPath#fdw_private.  Now
> postgres_fdw's join push-down is free from existence of ForeignPath under the
> join relation.  This means that we can support the case Robert mentioned before,
> that whole "(huge JOIN large) JOIN small" can be pushed down even if "(huge JOIN
> large)" is dominated by another join path.
>
Yes, it's definitely a reasonable design, and it fits the intention of the interface.
I should have pointed it out from the beginning. :-)

> > "l" of the first SELECT represents a whole-row reference.
> > However, underlying SELECT contains system columns in its target list.
> >
> > Is it possible to construct such a query?
> >   SELECT l.a1, l.a2 FROM (SELECT (id,a,atext), ctid) l (a1, a2) ...
> >                                   ^^^^^^^^^^
>
> Simple relation reference such as "l" is not sufficient for the purpose, yes.
> But putting columns in parentheses would not work when a user column is referenced
> in original query.
>
> I implemented deparseProjectionSql to use ROW(...) expression for a whole-row
> reference in the target list, in addition to ordinary column references for
> actually used columns and ctid.
>
> Please see the test case for mixed use of ctid and whole-row reference to
> postgres_fdw's regression tests.  Now a whole-row reference in the remote query
> looks like this:
>
It seems to me that deparseProjectionSql() works properly.

> -- ctid with whole-row reference
> EXPLAIN (COSTS false, VERBOSE)
> SELECT t1.ctid, t1, t2 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3,
> t1.c1 OFFSET 100 LIMIT 10;
> --------------------------------------------------------------------------------
>  Limit
>    Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
>    ->  Sort
>          Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
>          Sort Key: t1.c3, t1.c1
>          ->  Foreign Scan
>                Output: t1.ctid, t1.*, t2.*, t1.c3, t1.c1
>                Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, r.a1 FROM (SELECT l.a7,
> ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17), l.a12, l.a10 FROM
> (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a1
> (8 rows)
>
> In fact l.a12 and l.a10, for t1.c3 and t1.c1, are redundant in transferred data,
> but IMO this would simplify the code for deparsing.
>
I agree. Even if we could reduce the network cost a little, tuple construction
takes CPU cycles. Your decision is fair enough for me.

> > * merge_fpinfo()
> > It seems to me fpinfo->rows should be joinrel->rows, and
> > fpinfo->width also should be joinrel->width.
> > No need to have special intelligence here, isn't it?
>
> Oops. They are vestige of my struggle which disabled SELECT clause optimization
> (omit unused columns).  Now width and rows are inherited from joinrel.  Besides
> that, fdw_startup_cost and fdw_tuple_cost seem wrong, so I fixed them to use
> simple sum, not average.
>
Does fpinfo->fdw_startup_cost represent the cost of opening a connection to the
remote PostgreSQL server?

postgres_fdw.c:1757 says as follows:

    /*
     * Add some additional cost factors to account for connection overhead
     * (fdw_startup_cost), transferring data across the network
     * (fdw_tuple_cost per retrieved row), and local manipulation of the data
     * (cpu_tuple_cost per retrieved row).
     */

If so, does a ForeignScan that involves 100 underlying relations take 100
times heavier network operations on startup? Probably not.
I think an average is better than a sum, and the max of them will reflect the cost
more correctly.

Also, fdw_tuple_cost represents the cost of data transfer over the network.
I think a weighted average is the best strategy, like:
  fpinfo->fdw_tuple_cost =
    (fpinfo_o->width / (fpinfo_o->width + fpinfo_i->width)) * fpinfo_o->fdw_tuple_cost +
    (fpinfo_i->width / (fpinfo_o->width + fpinfo_i->width)) * fpinfo_i->fdw_tuple_cost;

That's just my suggestion. Please apply whichever way you think is best.

> > * explain output
> >
> > EXPLAIN output may be a bit insufficient to know what does it
> > actually try to do.
> >
> > postgres=# explain select * from ft1,ft2 where a = b;
> >                        QUERY PLAN
> > --------------------------------------------------------
> >  Foreign Scan  (cost=200.00..212.80 rows=1280 width=80)
> > (1 row)
> >
> > Even though it is not an immediate request, it seems to me
> > better to show users joined relations and remote ON/WHERE
> > clause here.
> >
>
> Like this?
>
> Foreign Scan on ft1 INNER JOIN ft2 ON ft1.a = ft2.b (cost=200.00..212.80
> rows=1280 width=80)
> …
>
No. This line is produced by ExplainScanTarget(), so it would require the
backend to have knowledge of an individual FDW.
Rather than the backend, postgresExplainForeignScan() should produce
the output.

> It might produce a very long line in a case of joining many tables because it
> contains most of remote query other than SELECT clause, but I prefer detailed.
> Another idea is to print "Join Cond" and "Remote Filter" as separated EXPLAIN
> items.
>
It would be good if postgres_fdw could generate the names of the relations involved
in the join for each level, and the join cond/remote filter individually.

> Note that v8 patch doesn't contain this change yet!
>
It is a "nice to have" feature, so I don't think the first commit needs
to support it. Just a suggestion for the next step.

* implementation suggestion

At the deparseJoinSql(),

+   /* print SELECT clause of the join scan */
+   initStringInfo(&selbuf);
+   i = 0;
+   foreach(lc, baserel->reltargetlist)
+   {
+       Var        *var = (Var *) lfirst(lc);
+       TargetEntry *tle;
+
+       if (i > 0)
+           appendStringInfoString(&selbuf, ", ");
+       deparseJoinVar(var, &context);
+
+       tle = makeTargetEntry((Expr *) copyObject(var),
+                             i + 1, pstrdup(""), false);
+       if (fdw_ps_tlist)
+           *fdw_ps_tlist = lappend(*fdw_ps_tlist, copyObject(tle));
+
+       *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
+
+       i++;
+   }

The tle is a copy of the original target entry, and the var node is also
a copy. Why is the tle copied again on lappend()?
Also, NULL is acceptable as the 3rd argument of makeTargetEntry.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
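
The width-weighted average suggested in this message can be sketched in standalone C as below. It is only an illustration: RelCostInfo and weighted_tuple_cost are hypothetical names, and the sketch assumes the width fields are integers, in which case the formula written literally would use integer division and truncate each weight to 0. The fallback for a zero total width is also an assumption of this sketch, not something proposed in the thread.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-relation cost info discussed above. */
typedef struct RelCostInfo
{
    int     width;              /* estimated average row width in bytes */
    double  fdw_tuple_cost;     /* per-tuple transfer cost */
} RelCostInfo;

/* Width-weighted average of the two per-tuple costs. */
static double
weighted_tuple_cost(const RelCostInfo *outer, const RelCostInfo *inner)
{
    int     total = outer->width + inner->width;

    /* Assumption: fall back to a plain mean when there is no width to weigh by. */
    if (total <= 0)
        return (outer->fdw_tuple_cost + inner->fdw_tuple_cost) / 2.0;

    /* Cast before dividing: int / int would truncate each weight to 0. */
    return ((double) outer->width / total) * outer->fdw_tuple_cost +
           ((double) inner->width / total) * inner->fdw_tuple_cost;
}
```

For example, widths 30 and 10 with costs 0.01 and 0.03 give 0.75 * 0.01 + 0.25 * 0.03 = 0.015. When both sides carry the same per-tuple cost, the result is simply that cost.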
Kaigai-san,

Thanks for your review.

On 2015/04/09 10:48, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>> * merge_fpinfo()
>>> It seems to me fpinfo->rows should be joinrel->rows, and
>>> fpinfo->width also should be joinrel->width.
>>> No need to have special intelligence here, isn't it?
>>
>> Oops. They are vestige of my struggle which disabled SELECT clause optimization
>> (omit unused columns).  Now width and rows are inherited from joinrel.  Besides
>> that, fdw_startup_cost and fdw_tuple_cost seem wrong, so I fixed them to use
>> simple sum, not average.
>>
> Does fpinfo->fdw_startup_cost represent a cost to open connection to remote
> PostgreSQL, doesn't it?
>
> postgres_fdw.c:1757 says as follows:
>
>     /*
>      * Add some additional cost factors to account for connection overhead
>      * (fdw_startup_cost), transferring data across the network
>      * (fdw_tuple_cost per retrieved row), and local manipulation of the data
>      * (cpu_tuple_cost per retrieved row).
>      */
>
> If so, does a ForeignScan that involves 100 underlying relation takes 100
> times heavy network operations on startup? Probably, no.
> I think, average is better than sum, and max of them will reflect the cost
> more correctly.

In my current opinion, no. Though I remember that I've written such comments before :P.

Connection establishment occurs only once, at the very first access to the server, so in use cases with long-lived sessions (via psql, connection pooling, etc.), taking connection overhead into account *every time* seems too pessimistic.

Instead, for practical cases, fdw_startup_cost should cover the overhead of query construction and of getting the first response (hopefully excluding retrieval of the actual data). These overheads are on the order of milliseconds. I'm not sure how much is appropriate for the default, but 100 seems not so bad.

Anyway, fdw_startup_cost is a per-server setting, the same as fdw_tuple_cost, and it should not be modified according to the width of the result, so using fpinfo_o->fdw_startup_cost would be ok.

> Also, fdw_tuple_cost introduce the cost of data transfer over the network.
> I thinks, weighted average is the best strategy, like:
>   fpinfo->fdw_tuple_cost =
>     (fpinfo_o->width / (fpinfo_o->width + fpinfo_i->width)) * fpinfo_o->fdw_tuple_cost +
>     (fpinfo_i->width / (fpinfo_o->width + fpinfo_i->width)) * fpinfo_i->fdw_tuple_cost;
>
> That's just my suggestion. Please apply the best way you thought.

I can't agree with that strategy, because 1) width 0 causes per-tuple cost 0, and 2) fdw_tuple_cost never varies within a foreign server. Using fpinfo_o->fdw_tuple_cost (it must be identical to fpinfo_i->fdw_tuple_cost) seems reasonable. Thoughts?

>>> * explain output
>>>
>>> EXPLAIN output may be a bit insufficient to know what does it
>>> actually try to do.
>>>
>>> postgres=# explain select * from ft1,ft2 where a = b;
>>>                        QUERY PLAN
>>> --------------------------------------------------------
>>>  Foreign Scan  (cost=200.00..212.80 rows=1280 width=80)
>>> (1 row)
>>>
>>> Even though it is not an immediate request, it seems to me
>>> better to show users joined relations and remote ON/WHERE
>>> clause here.
>>>
>>
>> Like this?
>>
>> Foreign Scan on ft1 INNER JOIN ft2 ON ft1.a = ft2.b (cost=200.00..212.80
>> rows=1280 width=80)
>> …
>>
> No. This line is produced by ExplainScanTarget(), so it requires the
> backend knowledge to individual FDW.
> Rather than the backend, postgresExplainForeignScan() shall produce
> the output.

Agreed. Additional FDW output such as "Relations", "Join type", and "Join conditions" would be possible.

>> It might produce a very long line in a case of joining many tables because it
>> contains most of remote query other than SELECT clause, but I prefer detailed.
>> Another idea is to print "Join Cond" and "Remote Filter" as separated EXPLAIN
>> items.
>>
> It is good, if postgres_fdw can generate relations name involved in
> the join for each level, and join cond/remote filter individually.
>
>> Note that v8 patch doesn't contain this change yet!
>>
> It is a "nice to have" feature. So, I don't think the first commit needs
> to support this. Just a suggestion in the next step.

Agreed.

> * implementation suggestion
>
> At the deparseJoinSql(),
>
> +   /* print SELECT clause of the join scan */
> +   initStringInfo(&selbuf);
> +   i = 0;
> +   foreach(lc, baserel->reltargetlist)
> +   {
> +       Var        *var = (Var *) lfirst(lc);
> +       TargetEntry *tle;
> +
> +       if (i > 0)
> +           appendStringInfoString(&selbuf, ", ");
> +       deparseJoinVar(var, &context);
> +
> +       tle = makeTargetEntry((Expr *) copyObject(var),
> +                             i + 1, pstrdup(""), false);
> +       if (fdw_ps_tlist)
> +           *fdw_ps_tlist = lappend(*fdw_ps_tlist, copyObject(tle));
> +
> +       *retrieved_attrs = lappend_int(*retrieved_attrs, i + 1);
> +
> +       i++;
> +   }
>
> The tle is a copy of the original target-entry, and var-node is also
> copied one. Why is the tle copied on lappend() again?
> Also, NULL is acceptable as 3rd argument of makeTargetEntry.

Good catch. Fixed.

--
Shigeru HANADA
shigeru.hanada@gmail.com
> On 2015/04/09 10:48, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >>> * merge_fpinfo()
> >>> It seems to me fpinfo->rows should be joinrel->rows, and
> >>> fpinfo->width also should be joinrel->width.
> >>> No need to have special intelligence here, isn't it?
> >>
> >> Oops. They are vestige of my struggle which disabled SELECT clause optimization
> >> (omit unused columns).  Now width and rows are inherited from joinrel.  Besides
> >> that, fdw_startup_cost and fdw_tuple_cost seem wrong, so I fixed them to use
> >> simple sum, not average.
> >>
> > Does fpinfo->fdw_startup_cost represent a cost to open connection to remote
> > PostgreSQL, doesn't it?
> >
> > postgres_fdw.c:1757 says as follows:
> >
> >     /*
> >      * Add some additional cost factors to account for connection overhead
> >      * (fdw_startup_cost), transferring data across the network
> >      * (fdw_tuple_cost per retrieved row), and local manipulation of the data
> >      * (cpu_tuple_cost per retrieved row).
> >      */
> >
> > If so, does a ForeignScan that involves 100 underlying relation takes 100
> > times heavy network operations on startup? Probably, no.
> > I think, average is better than sum, and max of them will reflect the cost
> > more correctly.
>
> In my current opinion, no.  Though I remember that I've written such comments
> before :P.
>
> Connection establishment occurs only once for the very first access to the server,
> so in the use cases with long-lived session (via psql, connection pooling, etc.),
> taking connection overhead into account *every time* seems too pessimistic.
>
> Instead, for practical cases, fdw_startup_cost should consider overheads of query
> construction and getting first response of it (hopefully it minus retrieving
> actual data).  These overheads are visible in the order of milliseconds.  I'm
> not sure how much is appropriate for the default, but 100 seems not so bad.
>
> Anyway fdw_startup_cost is per-server setting as same as fdw_tuple_cost, and it
> should not be modified according to the width of the result, so using
> fpinfo_o->fdw_startup_cost would be ok.
>
Indeed, I forgot the connection cache mechanism. As long as we define
fdw_startup_cost as you mentioned, it seems to me your logic is heuristically
reasonable.

> > Also, fdw_tuple_cost introduce the cost of data transfer over the network.
> > I thinks, weighted average is the best strategy, like:
> >   fpinfo->fdw_tuple_cost =
> >     (fpinfo_o->width / (fpinfo_o->width + fpinfo_i->width)) * fpinfo_o->fdw_tuple_cost +
> >     (fpinfo_i->width / (fpinfo_o->width + fpinfo_i->width)) * fpinfo_i->fdw_tuple_cost;
> >
> > That's just my suggestion. Please apply the best way you thought.
>
> I can't agree that strategy, because 1) width 0 causes per-tuple cost 0, and 2)
> fdw_tuple_cost never vary in a foreign server.  Using fpinfo_o->fdw_tuple_cost
> (it must be identical to fpinfo_i->fdw_tuple_cost) seems reasonable.  Thoughts?
>
OK, you are right.

I think it is time to hand over the patch review to the committers.
So, let me mark it "ready for committers".

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hi Kaigai-san,

Thanks for the further review, but I found two bugs in the v10 patch.
I've fixed them and wrapped up the v11 patch here.

* Fix bug about illegal column order

A scan against a base relation returns columns in the order of the column definition, but its target list might require a different order. This can be resolved by tuple projection in usual cases, but pushing down joins skips that step, so we need to handle it in the remote query.

Before this fix, deparseProjectionSql() was called only for queries which have ctid or a whole-row reference in their target list, but that was too aggressive an optimization. We always need to call it, because an ordinary column list might require order conversion. Checking the order is not impossible, but it seems a useless effort.

Another way to resolve this issue is to reorder the SELECT clause of the query for a base relation if it is a source of a join, but that requires stepping back in planning, so the fix above was chosen.

The "three tables join" test case is also changed to check this behavior.

* Fix bug of duplicate fdw_ps_tlist contents.

I coded deparseSelectSql to pass fdw_ps_tlist to deparseSelectSql for the underlying RelOptInfo, but that causes redundant entries in fdw_ps_tlist when joining more than two foreign tables. I changed it to pass NULL as fdw_ps_tlist for the recursive call of deparseSelectSql.

* Fix typos

Please review the v11 patch, and mark it as "ready for committer" if it's ok.

In addition to the essential features, I tried to implement relation listing in EXPLAIN output.

The attached explain_forein_join.patch adds the capability to show the join combination of a ForeignScan in EXPLAIN output as an additional item "Relations". I thought that using an array to list relations would be a good way too, but I chose a single string value because users would like to know the order and type of joins too.

--
Shigeru HANADA
shigeru.hanada@gmail.com
Attachment
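The conclusion on merge_fpinfo() above can be sketched in a few lines of C. This is only an illustration under stated assumptions: FpInfoSketch, merge_fpinfo_sketch() and weighted_tuple_cost() are hypothetical stand-ins, not the structures or functions of the actual patch. It shows the agreed approach (rows and width come from the join relation, the two cost settings are inherited from the outer side because both sides belong to one server) next to the weighted average that was proposed and then dropped.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for postgres_fdw's per-relation planning info. */
typedef struct FpInfoSketch
{
    double rows;              /* estimated row count */
    int    width;             /* estimated average row width in bytes */
    double fdw_startup_cost;  /* per-server setting */
    double fdw_tuple_cost;    /* per-server setting */
} FpInfoSketch;

/*
 * The weighted average suggested in the thread, computed in floating point.
 * When both inputs carry the same per-server fdw_tuple_cost, the weights
 * cancel out, so the result is just that common value.
 */
double
weighted_tuple_cost(const FpInfoSketch *o, const FpInfoSketch *i)
{
    double total = (double) o->width + (double) i->width;

    /* Both widths being zero would make the weights undefined (0/0). */
    assert(total > 0.0);
    return (o->width / total) * o->fdw_tuple_cost +
           (i->width / total) * i->fdw_tuple_cost;
}

/*
 * The approach the thread settled on: take rows and width from the join
 * relation, and inherit both cost settings from the outer side.
 */
void
merge_fpinfo_sketch(FpInfoSketch *join,
                    const FpInfoSketch *o, const FpInfoSketch *i,
                    double joinrel_rows, int joinrel_width)
{
    /* Assumption: a pushed-down join only combines relations of one server. */
    assert(o->fdw_startup_cost == i->fdw_startup_cost);
    assert(o->fdw_tuple_cost == i->fdw_tuple_cost);

    join->rows = joinrel_rows;
    join->width = joinrel_width;
    join->fdw_startup_cost = o->fdw_startup_cost;
    join->fdw_tuple_cost = o->fdw_tuple_cost;
}
```

Since the settings are identical on both sides, the weighted form adds arithmetic and an extra failure case without changing the result, which matches the objection raised above.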
Hanada-san,

> Thanks for further review, but I found two bugs in v10 patch.
> I’ve fixed them and wrapped up v11 patch here.
>
> * Fix bug about illegal column order
>
> Scan against a base relation returns columns in order of column definition, but
> its target list might require a different order. This can be resolved by tuple
> projection in usual cases, but pushing down joins skips that step, so we need to
> handle it in the remote query.
>
> Before this fix, deparseProjectionSql() was called only for queries which have
> ctid or a whole-row reference in the target list, but that was too aggressive an optimization.
> We always need to call it, because an ordinary column list might require reordering.
> Checking the ordering is not impossible, but it seems a useless effort.
>
> Another way to resolve this issue is to reorder the SELECT clause of a query for a base
> relation if it was a source of a join, but that requires stepping back in planning,
> so the fix above was chosen.
>
> "three tables join" test case is also changed to check this behavior.
>
Sorry for my oversight.
Yep, a var-node reference on a join relation cannot expect any predefined column
order as base relations have. The only reasonable way I know is to rely on the
targetlist of the RelOptInfo, which contains all the columns referenced in the
later stage.

> * Fix bug of duplicate fdw_ps_tlist contents.
>
> I coded that deparseSelectSql passes fdw_ps_tlist to deparseSelectSql for
> underlying RelOptInfo, but it causes redundant entries in fdw_ps_tlist in cases
> of joining more than two foreign tables. I changed to pass NULL as fdw_ps_tlist
> for recursive call of deparseSelectSql.
>
It's reasonable, and it also gives a performance benefit, because a descriptor
constructed from the ps_tlist will match the expected result tuple, so it
avoids unnecessary projection.

> * Fix typos
>
> Please review the v11 patch, and mark it as “ready for committer” if it’s ok.
>
It's OK for me, and I'd like it to be reviewed by other people to get it committed.

> In addition to essential features, I tried to implement relation listing in EXPLAIN
> output.
>
> Attached explain_forein_join.patch adds capability to show join combination of
> a ForeignScan in EXPLAIN output as an additional item “Relations”. I thought
> that using array to list relations is a good way too, but I chose one string value
> because users would like to know order and type of joins too.
>
A bit different from my expectation... I expected it to display the names of the local
foreign tables (and their aliases), not the remote ones, because all the local join logic
displays local foreign table names.
It is easy to adjust, isn't it? Probably, all you need to do is put a local
relation name into the text buffer (at deparseSelectSql) instead of the deparsed
remote relation.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
KaiGai-san,

On 2015/04/14 14:04, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> * Fix typos
>>
>> Please review the v11 patch, and mark it as “ready for committer” if it’s ok.
>>
> It's OK for me, and wants to be reviewed by other people to get it committed.
>

Thanks!

>> In addition to essential features, I tried to implement relation listing in EXPLAIN
>> output.
>>
>> Attached explain_forein_join.patch adds capability to show join combination of
>> a ForeignScan in EXPLAIN output as an additional item “Relations”. I thought
>> that using array to list relations is a good way too, but I chose one string value
>> because users would like to know order and type of joins too.
>>
> A bit different from my expectation... I expected to display name of the local
> foreign tables (and its alias), not remote one, because all the local join logic
> displays local foreign tables name.
> Is it easy to adjust isn't it? Probably, all you need to do is, putting a local
> relation name on the text buffer (at deparseSelectSql) instead of the deparsed
> remote relation.

Oops, that’s right. Attached is the revised version. I chose the fully qualified name, schema.relname [alias], for the output. It wastes some cycles during planning when the plan is not for EXPLAIN, but it seems difficult to get a list of relation names in the ExplainForeignScan() phase, because the planning information has gone away by that time.

--
Shigeru HANADA
shigeru.hanada@gmail.com
Attachment
>>> Attached explain_forein_join.patch adds capability to show join combination of
>>> a ForeignScan in EXPLAIN output as an additional item “Relations”. I thought
>>> that using array to list relations is a good way too, but I chose one string value
>>> because users would like to know order and type of joins too.
>>>
>> A bit different from my expectation... I expected to display name of the local
>> foreign tables (and its alias), not remote one, because all the local join logic
>> displays local foreign tables name.
>> Is it easy to adjust isn't it? Probably, all you need to do is, putting a local
>> relation name on the text buffer (at deparseSelectSql) instead of the deparsed
>> remote relation.
>
> Oops, that’s right. Attached is the revised version. I chose fully qualified
> name, schema.relname [alias] for the output. It would waste some cycles during
> planning if that is not for EXPLAIN, but it seems difficult to get a list of name
> of relations in ExplainForeignScan() phase, because planning information has gone
> away at that time.
>
I understand. The private data structure of postgres_fdw is not designed
to keep tree-structured data (like a relations join tree), so this seems to me
a straightforward way to implement the feature.

I have a small suggestion. This patch makes deparseSelectSql initialize
the StringInfoData if supplied; however, that is usually the task of the
function's caller, not the callee.
In this case, I would like to call initStringInfo(&relations) next to the line of
initStringInfo(&sql) in postgresGetForeignPlan. To my mind, it is
a bit strange to pass an uninitialized StringInfoData to get a text form.

@@ -803,7 +806,7 @@ postgresGetForeignPlan(PlannerInfo *root,
      */
     initStringInfo(&sql);
     deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used, remote_conds,
-                     &params_list, &fdw_ps_tlist, &retrieved_attrs);
+                     &params_list, &fdw_ps_tlist, &retrieved_attrs, &relations);

     /*
      * Build the fdw_private list that will be available in the executor.

Also, could you merge the EXPLAIN output feature into the main patch?
I think there is no reason to split this feature out.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Kaigai-san,

On 2015/04/15 22:33, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> Oops, that’s right. Attached is the revised version. I chose fully qualified
>> name, schema.relname [alias] for the output. It would waste some cycles during
>> planning if that is not for EXPLAIN, but it seems difficult to get a list of name
>> of relations in ExplainForeignScan() phase, because planning information has gone
>> away at that time.
>>
> I understand. Private data structure of the postgres_fdw is not designed
> to keep tree structure data (like relations join tree), so it seems to me
> a straightforward way to implement the feature.
>
> I have a small suggestion. This patch makes deparseSelectSql initialize
> the StringInfoData if supplied, however, it usually shall be a task of
> function caller, not callee.
> In this case, I like to initStringInfo(&relations) next to the line of
> initStringInfo(&sql) on the postgresGetForeignPlan. In my sense, it is
> a bit strange to pass uninitialized StringInfoData, to get a text form.
>
> @@ -803,7 +806,7 @@ postgresGetForeignPlan(PlannerInfo *root,
>       */
>      initStringInfo(&sql);
>      deparseSelectSql(&sql, root, baserel, fpinfo->attrs_used, remote_conds,
> -                     &params_list, &fdw_ps_tlist, &retrieved_attrs);
> +                     &params_list, &fdw_ps_tlist, &retrieved_attrs, &relations);
>
>      /*
>       * Build the fdw_private list that will be available in the executor.
>

Agreed. If the caller passes a buffer, it should be initialized by the caller. In addition to your idea, I added a check that the RelOptInfo is a JOINREL, because a BASEREL doesn’t need relations for its EXPLAIN output.

> Also, could you merge the EXPLAIN output feature on the main patch?
> I think here is no reason why to split this feature.

I merged the explain patch into the foreign_join patch.

Now v12 is the latest patch.

--
Shigeru HANADA
shigeru.hanada@gmail.com
Attachment
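The caller-initializes convention agreed on above can be illustrated with a small stand-alone sketch. It deliberately does not use PostgreSQL's real StringInfo API: BufSketch, buf_init(), buf_append() and deparse_select_sketch() are hypothetical names invented for this example, and the optional second buffer only mimics the idea of a "relations" output that some callers do not need.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable buffer, standing in for StringInfoData. */
typedef struct BufSketch
{
    char  *data;
    size_t len;
    size_t cap;
} BufSketch;

/* Initialization is the caller's job, as suggested in the review. */
void
buf_init(BufSketch *b)
{
    b->cap = 64;
    b->len = 0;
    b->data = malloc(b->cap);
    assert(b->data != NULL);
    b->data[0] = '\0';
}

/* Append a NUL-terminated string, growing the buffer when needed. */
void
buf_append(BufSketch *b, const char *s)
{
    size_t n = strlen(s);

    if (b->len + n + 1 > b->cap)
    {
        while (b->len + n + 1 > b->cap)
            b->cap *= 2;
        b->data = realloc(b->data, b->cap);
        assert(b->data != NULL);
    }
    memcpy(b->data + b->len, s, n + 1);  /* copies the trailing NUL too */
    b->len += n;
}

/*
 * The callee only appends. It never initializes either buffer, and it
 * skips the relations output when the caller passes NULL for it.
 */
void
deparse_select_sketch(BufSketch *sql, BufSketch *relations, const char *relname)
{
    buf_append(sql, "SELECT * FROM ");
    buf_append(sql, relname);
    if (relations != NULL)
        buf_append(relations, relname);
}
```

A caller would run buf_init() on both buffers before the call, which keeps ownership and lifetime of the buffers in one place and lets a caller that does not care about the relation list simply pass NULL.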
Hanada-san,

> I merged explain patch into foreign_join patch.
>
> Now v12 is the latest patch.
>
It contains many garbage lines... Please ensure the patch is correctly based on
the latest master + custom_join patch.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Kaigai-san,

On 2015/04/17 10:13, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Hanada-san,
>
>> I merged explain patch into foreign_join patch.
>>
>> Now v12 is the latest patch.
>>
> It contains many garbage lines... Please ensure the
> patch is correctly based on the latest master +
> custom_join patch.

Oops, sorry. I’ve re-created the patch as v13, based on the Custom/Foreign join v11 patch and the latest master.

It contains the EXPLAIN enhancement: a new subitem “Relations” shows the relations and joins, including order and type, processed by the foreign scan.

--
Shigeru HANADA
shigeru.hanada@gmail.com
Attachment
Hanada-san,

Thanks for your work. I have nothing more to comment on (at this moment).
I hope a committer will review and comment on these two features.

Best regards,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Kaigai-san,

I reviewed the Custom/Foreign join API patch again after writing a patch of join push-down support for postgres_fdw.

On 2015/03/26 10:51, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>> Or bottom of make_join_rel(). IMO build_join_rel() is responsible for just
>>> building (or searching from a list) a RelOptInfo for given relids. After that
>>> make_join_rel() calls add_paths_to_joinrel() with appropriate arguments per join
>>> type to generate actual Paths that implement the join. make_join_rel() is called
>>> only once for a particular relid combination, and SpecialJoinInfo and
>>> restrictlist (conditions specified in JOIN-ON and WHERE) are available there,
>>> so it seems promising for FDW cases.
>>>
>>> I like that idea, but I think we will have complex hook signature, it won't
>>> remain as simple as hook (root, joinrel).
>>
>> Signature of the hook (or the FDW API handler) would be like this:
>>
>> typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,
>>                                                RelOptInfo *joinrel,
>>                                                RelOptInfo *outerrel,
>>                                                RelOptInfo *innerrel,
>>                                                JoinType jointype,
>>                                                SpecialJoinInfo *sjinfo,
>>                                                List *restrictlist);
>>
>> This is very similar to add_paths_to_joinrel(), but lacks semifactors and
>> extra_lateral_rels. semifactors can be obtained with
>> compute_semi_anti_join_factors(), and extra_lateral_rels can be constructed
>> from root->placeholder_list as add_paths_to_joinrel() does.
>>
>> From the viewpoint of postgres_fdw, jointype and restrictlist is necessary to
>> generate SELECT statement, so it would require most work done in make_join_rel
>> again if the signature was hook(root, joinrel). sjinfo will be necessary for
>> supporting SEMI/ANTI joins, but currently it is not in the scope of postgres_fdw.
>>
>> I guess that other FDWs require at least jointype and restrictlist.
>>
> The attached patch adds GetForeignJoinPaths call on make_join_rel() only when
> 'joinrel' is actually built and both of child relations are managed by same
> FDW driver, prior to any other built-in join paths.
> I adjusted the hook definition a little bit, because jointype can be reproduced
> using SpecialJoinInfo. Right?

Yes, it can be derived from the expression below:
    jointype = sjinfo ? sjinfo->jointype : JOIN_INNER;

>
> Probably, it will solve the original concern towards multiple calls of FDW
> handler in case when it tries to replace an entire join subtree with a foreign-
> scan on the result of remote join query.
>
> How about your opinion?

AFAIS it’s well-balanced between the number of calls and the available information.

The new FDW API GetForeignJoinPaths is called only once for a particular combination of join, such as (A join B join C). Before considering all joins in a join level (number of relations contained in the join tree), possible join combinations of the lower join level are considered recursively. As Tom pointed out before, imagine a case like ((huge JOIN large) LEFT JOIN small); an expensive path in the lower join level might be

Here, please let me summarize the changes in the patch as the result of my review.

* Add set_join_pathlist_hook_type in add_paths_to_joinrel
This hook is intended to provide a chance to add one or more CustomPaths for an actual join combination. If the join is reversible, the hook is called for both A * B and B * A. This is different from the FDW API, but it seems fine because FDWs should have chances to process the join at a more abstract level than CSPs.

Parameters are the same as hash_inner_and_outer, so they would be enough for hash-like or nestloop-like methods. I’m not sure whether mergeclause_list is necessary as a parameter or not. It’s information for merge join which is generated when enable_mergejoin is on and the join is not FULL OUTER. Does some CSP need it for processing a join in its own way? If so, it must be in the parameter list, because select_mergejoin_clauses is static and so not accessible from external modules.

The timing of the hook, after considering all built-in path types, seems fine because some CSPs might want to use built-in paths as a template or a source.

One concern is in the documentation of the hook function. "Implementing Custom Paths” says:

> A custom scan provider will be also able to add paths by setting the following hook, to replace built-in join paths by custom-scan that performs as if a scan on preliminary joined relations, which is called after the core code has generated what it believes to be the complete and correct set of access paths for the join.

I think “replace” would mislead readers into thinking that a CSP can remove or edit existing built-in paths listed in RelOptInfo#pathlist or linked from cheapest_foo. IIUC a CSP can just add paths for the join relation, and the planner chooses one of them if it’s the cheapest.

* Add new FDW API GetForeignJoinPaths in make_join_rel
This FDW API is intended to provide a chance to add ForeignPaths for a join relation. It is called only once for a join relation, so the FDW should consider the reversed combination if it’s meaningful in its own mechanisms.

Note that this is called only when the join relation was *NOT* found in the PlannerInfo, to avoid redundant calls.

Parameters seem enough for postgres_fdw to process an N-way join on the remote side, pushing down join conditions and remote filters.

* Propagate FDW information through bottom-up planning
An FDW can handle a join which uses foreign tables managed by that FDW, of course. We obtain the FDW routine entry to plan a scan against a foreign table, so propagating the information up to the join phase helps the core planner check whether all the sources are managed by one FDW or not. It also avoids repeated catalog accesses.

* Make create_plan_recurse non-static
This is for CSPs and FDWs which want the underlying plan nodes of a join. For example, a CSP might want outer/inner plan nodes as input sources of a join.

* Treat scanrelid == 0 as pseudo scan
A foreign/custom join is represented by a scan against a pseudo relation, i.e. the result of a join. Usually a Scan has a valid scanrelid, the oid of the relation being scanned, and many functions assume that it’s always valid. The patch adds other code paths for scanrelid == 0 as custom/foreign join scans.

* Pseudo scan target list support
CustomScan and ForeignScan have csp_ps_tlist and fdw_ps_tlist respectively, for column reference tracking. A scan generated for a custom/foreign join would have columns from multiple relations in its target list, i.e. output columns. Ordinary scans have all valid columns of the relation as output, so references to them can be resolved easily, but we need an additional mechanism to determine where a reference in the target list of a custom/foreign scan comes from. This is very similar to what IndexOnlyScan does, so we reuse INDEX_VAR as the mark of an indirect reference to another relation’s var.

For this mechanism, set_plan_refs is changed to fix Vars in the ps_tlist of CustomScan and ForeignScan. For this change, a new BitmapSet function, bms_shift_members, is added.

set_deparse_planstate is also changed to pass ps_tlist as the namespace for deparsing.

These changes seem reasonable, so I mark this patch as “ready for committers” to hear committers' thoughts.

Regards,
--
Shigeru HANADA
shigeru.hanada@gmail.com
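As a side note on the jointype derivation mentioned in the review, the expression can be wrapped in a tiny helper. The sketch below is stand-alone and uses hypothetical mock types (JoinTypeSketch, SpecialJoinInfoSketch) rather than PostgreSQL's own headers; it only mirrors the one-line expression quoted in the review and makes no claim about when the planner actually passes a NULL SpecialJoinInfo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock of the join type enum; names are prefixed to avoid confusion. */
typedef enum JoinTypeSketch
{
    SK_JOIN_INNER,
    SK_JOIN_LEFT,
    SK_JOIN_FULL,
    SK_JOIN_RIGHT,
    SK_JOIN_SEMI,
    SK_JOIN_ANTI
} JoinTypeSketch;

/* Hypothetical mock of SpecialJoinInfo, reduced to the one field used here. */
typedef struct SpecialJoinInfoSketch
{
    JoinTypeSketch jointype;
} SpecialJoinInfoSketch;

/*
 * Mirrors "jointype = sjinfo ? sjinfo->jointype : JOIN_INNER;":
 * a missing SpecialJoinInfo is treated as a plain inner join.
 */
JoinTypeSketch
jointype_from_sjinfo(const SpecialJoinInfoSketch *sjinfo)
{
    return sjinfo ? sjinfo->jointype : SK_JOIN_INNER;
}
```

With such a helper, a handler that receives only sjinfo can still recover the join type it needs for building the remote SELECT statement, which is why the explicit jointype argument could be dropped from the hook signature.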
Hanada-san,

> I reviewed the Custom/Foreign join API patch again after writing a patch of join
> push-down support for postgres_fdw.
>
Thanks for your dedicated work; my comments are inline below.

> Here, please let me summarize the changes in the patch as the result of my review.
>
> * Add set_join_pathlist_hook_type in add_paths_to_joinrel
> This hook is intended to provide a chance to add one or more CustomPaths for an
> actual join combination. If the join is reversible, the hook is called for both
> A * B and B * A. This is different from FDW API but it seems fine because FDWs
> should have chances to process the join in more abstract level than CSPs.
>
> Parameters are same as hash_inner_and_outer, so they would be enough for hash-like
> or nestloop-like methods. I’m not sure whether mergeclause_list is necessary
> as a parameter or not. It’s information for merge join which is generated when
> enable_mergejoin is on and the join is not FULL OUTER. Does some CSP need it
> for processing a join in its own way? Then it must be in parameter list because
> select_mergejoin_clauses is static so it’s not accessible from external modules.
>
I think a preferable way is for the extension to reproduce the mergeclause_list
itself, rather than passing it as a hook argument, because it is uncertain
whether a CSP should follow the "enable_mergejoin" parameter even if it implements
a logic like merge-join.
Of course, that requires exposing select_mergejoin_clauses. It seems to me a
straightforward way.

> The timing of the hooking, after considering all built-in path types, seems fine
> because some of CSPs might want to use built-in paths as a template or a source.
>
> One concern is in the document of the hook function. "Implementing Custom Paths”
> says:
>
> > A custom scan provider will be also able to add paths by setting the following
> hook, to replace built-in join paths by custom-scan that performs as if a scan
> on preliminary joined relations, which is called after the core code has generated
> what it believes to be the complete and correct set of access paths for the join.
>
> I think “replace” would mis-lead readers that CSP can remove or edit existing
> built-in paths listed in RelOptInfo#pathlist or linked from cheapest_foo. IIUC
> CSP can just add paths for the join relation, and planner choose it if it’s the
> cheapest.
>
I adjusted the documentation as follows:

   A custom scan provider will be also able to add paths by setting the
   following hook, to add <literal>CustomPath</> nodes that perform as
   if built-in join logic doing. It is typically expected to take two
   input relations then generate a joined output stream, or just scans
   preliminary joined relations like materialized-view. This hook is
   called next to the consideration of core join logics, then planner
   will choose the best path to run the relations join in the built-in
   and custom ones.

Probably, this describes correctly how the hook works.
The v12 patch updates only this portion.

> * Add new FDW API GetForeignJoinPaths in make_join_rel
> This FDW API is intended to provide a chance to add ForeignPaths for a join relation.
> This is called only once for a join relation, so FDW should consider reversed
> combination if it’s meaningful in their own mechanisms.
>
> Note that this is called only when the join relation was *NOT* found in the
> PlannerInfo, to avoid redundant calls.
>
Yep, it is designed according to the discussion upthread. It can produce N-way
remote join paths even if an intermediate join relation is more expensive than
a local join + two foreign scans.

> Parameters seems enough for postgres_fdw to process N-way join on remote side
> with pushing down join conditions and remote filters.
>
You confirmed it clearly.

> * Treat scanrelid == 0 as pseudo scan
> A foreign/custom join is represented by a scan against a pseudo relation, i.e.
> result of a join. Usually Scan has valid scanrelid, oid of a relation being
> scanned, and many functions assume that it’s always valid. The patch adds another
> code paths for scanrelid == 0 as custom/foreign join scans.
>
Right,

> * Pseudo scan target list support
> CustomScan and ForeignScan have csp_ps_tlist and fdw_ps_tlist respectively, for
> column reference tracking. A scan generated for custom/foreign join would have
> column from multiple relations in its target list, i.e. output columns. Ordinary
> scans have all valid columns of the relation as output, so references to them
> can be resolved easily, but we need an additional mechanism to determine where
> a reference in a target list of custom/foreign scan come from. This is very
> similar to what IndexOnlyScan does, so we reuse INDEX_VAR as mark of an indirect
> reference to another relation’s var.
>
Right, the FDW/CSP driver is responsible for setting *_ps_tlist to inform the core
planner which columns of the relations are referenced, and which attribute
represents which column/relation.
It is an interface contract when a foreign/custom scan is chosen instead of the
built-in join logic.

> For this mechanism, set_plan_refs is changed to fix Vars in ps_tlist of CustomScan
> and ForeignScan. For this change, new BitmapSet function bms_shift_members is
> added.
>
> set_deparse_planstate is also changed to pass ps_tlist as namespace for
> deparsing.
>
Yep, it is the same as IndexOnlyScan.

> These changes seem reasonable, so I mark this patch as “ready for committers”
> to hear committers' thoughts.
>
Thanks!
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
I reviewed the foreign_join_v13 patch. Here are my comments:
Thanks for this work. It's good to see that the foreign_join patch includes extensive tests for postgres_fdw.
Sanity
---------
The foreign_join patch did not apply cleanly with "git apply", but it did apply using "patch". The patch contains trailing whitespace.
The patch compiles cleanly with pgsql-v9.5-custom-join.v11.patch.
make check in regress and postgres_fdw folders passes without any failures.
Tests
-------
1. The postgres_fdw test sets and resets enable_mergejoin at various places. The goal of these tests seems to be to check the sanity of the foreign plans generated. So, it might be better to set enable_mergejoin (and maybe all of enable_hashjoin, enable_nestloop etc.) to false at the beginning of the testcase and reset them at the end. That way, we will also make sure that foreign plans are chosen irrespective of future planner changes.
2. In the patch, I see that the inheritance testcases have been deleted from postgres_fdw.sql, is that intentional? I do not see those being replaced anywhere else.
3. We need one test for each join type (or at least for INNER and LEFT OUTER) where there are unsafe-to-push conditions in the ON clause along with safe-to-push conditions. For INNER join, the join should get pushed down with the safe conditions, and for OUTER join it shouldn't be. The same goes for the WHERE clause, in which case the join will be pushed down but the unsafe-to-push conditions will be applied locally.
4. All the tests have ORDER BY, LIMIT in them, so the setref code is being exercised. But something like aggregates would test the setref code better. So, we should add at least one test like select avg(ft1.c1 + ft2.c2) from ft1 join ft2 on (ft1.c1 = ft2.c1).
5. It will be good to add some test which contain join between few foreign and few local tables to see whether we are able to push down the largest possible foreign join tree to the foreign server.
Code
-------
In classifyConditions(), the code is now appending RestrictInfo::clause rather than the RestrictInfo itself. But the callers of classifyConditions() have not changed. Is this change intentional? The functions which consume the lists produced by this function handle expressions as well as RestrictInfo, so you may not have noticed it. Because of this change, we might be missing some optimizations, e.g. in function postgresGetForeignPlan():
    if (list_member_ptr(fpinfo->remote_conds, rinfo))
        remote_conds = lappend(remote_conds, rinfo->clause);
    else if (list_member_ptr(fpinfo->local_conds, rinfo))
        local_exprs = lappend(local_exprs, rinfo->clause);
    else if (is_foreign_expr(root, baserel, rinfo->clause))
        remote_conds = lappend(remote_conds, rinfo->clause);
    else
        local_exprs = lappend(local_exprs, rinfo->clause);
Finding a RestrictInfo in remote_conds avoids another call to is_foreign_expr(). So with this change, I think we are doing an extra call to is_foreign_expr().
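The effect described above can be reproduced with a stand-alone sketch. It does not use PostgreSQL's List API; member_ptr() is a hypothetical helper that compares pointers by identity in the spirit of list_member_ptr(), and ClauseSketch and RestrictInfoSketch are mock types invented for this example. The point is that once a list stores clause pointers, looking up the wrapping RestrictInfo pointer can no longer succeed, so the code falls through to the more expensive check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mocks: an expression node and the RestrictInfo wrapping it. */
typedef struct ClauseSketch
{
    int id;
} ClauseSketch;

typedef struct RestrictInfoSketch
{
    ClauseSketch *clause;
} RestrictInfoSketch;

/*
 * Pointer-identity membership test over an array of pointers.
 * Only the addresses are compared, never the pointed-to contents.
 */
bool
member_ptr(const void **items, size_t n, const void *target)
{
    size_t k;

    for (k = 0; k < n; k++)
    {
        if (items[k] == target)
            return true;
    }
    return false;
}
```

If the remote_conds list holds RestrictInfo pointers, member_ptr(list, n, rinfo) finds the entry and the is_foreign_expr() branch is skipped; if the list holds rinfo->clause instead, the same lookup returns false for every rinfo and the fallback branch runs each time.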
The function get_jointype_name() returns an empty string for unsupported join types. Instead, it should throw an error if some code path accidentally calls the function with an unsupported join type, e.g. JOIN_SEMI.
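A minimal stand-alone sketch of the suggested behaviour follows. It is not the patch's code: the enum and both functions are hypothetical, the set of supported join types (INNER, LEFT, RIGHT, FULL) is an assumption made for illustration, and abort() stands in for the error report that backend code would normally raise with something like elog(ERROR, ...).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical mock of the join type enum. */
typedef enum JoinTypeSketch
{
    SK_JOIN_INNER,
    SK_JOIN_LEFT,
    SK_JOIN_FULL,
    SK_JOIN_RIGHT,
    SK_JOIN_SEMI,
    SK_JOIN_ANTI
} JoinTypeSketch;

/* Returns the SQL keyword for a supported join type, or NULL otherwise. */
const char *
jointype_name_or_null(JoinTypeSketch jointype)
{
    switch (jointype)
    {
        case SK_JOIN_INNER:
            return "INNER";
        case SK_JOIN_LEFT:
            return "LEFT";
        case SK_JOIN_RIGHT:
            return "RIGHT";
        case SK_JOIN_FULL:
            return "FULL";
        default:
            return NULL;    /* assumed unsupported in this sketch */
    }
}

/*
 * Fails loudly instead of returning an empty string, so that an
 * unsupported join type cannot silently produce malformed remote SQL.
 */
const char *
get_jointype_name_sketch(JoinTypeSketch jointype)
{
    const char *name = jointype_name_or_null(jointype);

    if (name == NULL)
    {
        fprintf(stderr, "unsupported join type %d\n", (int) jointype);
        abort();    /* stand-in for raising an error in the backend */
    }
    return name;
}
```

Returning an empty string would let the deparser emit something like "l  JOIN r", which the remote server might even accept as an inner join; failing at the call site makes the programming error visible immediately.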
While deparsing the SQL with rowmarks, the placement of the FOR UPDATE/SHARE clause in the original query is not being honored, which means that we will end up locking rows which are not part of the join result even when the join is pushed to the foreign server. E.g. take the following query (it uses the tables created in the postgres_fdw.sql tests):
contrib_regression=# explain verbose select * from ft1 join ft2 on (ft1.c1 = ft2.c1) for update of ft1;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 LockRows  (cost=100.00..124.66 rows=822 width=426)
   Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
   ->  Foreign Scan  (cost=100.00..116.44 rows=822 width=426)
         Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
         Relations: (public.ft1) INNER JOIN (public.ft2)
         Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, l.a9, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8, r.a9 FROM (SELECT l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17) FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" FOR UPDATE) l) l (a1, a2, a3, a4, a5, a6, a7, a8, a9) INNER JOIN (SELECT r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17, ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17) FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7, a8, a9) ON ((l.a1 = r.a1))
(6 rows)
It's expected that only the rows which are part of the join result will be locked by the FOR UPDATE clause. The query sent to the foreign server has attached the FOR UPDATE clause to the sub-query for table ft1 ("S 1"."T 1" on the foreign server). As per the PostgreSQL documentation, "When a locking clause appears in a sub-SELECT, the rows locked are those returned to the outer query by the sub-query." So it's going to lock all rows from "S 1"."T 1", rather than only the rows which are part of the join. This increases the probability of deadlocks if the join is between a big table and a small table, where the big table is used in many queries and the join produces only a single row.
Since there is no is_first argument to appendConditions(), we should remove the corresponding line from the function prologue.
The name TO_RELATIVE() doesn't convey the full meaning of the macro. Maybe GET_RELATIVE_ATTNO() or something like that.
In postgresGetForeignJoinPaths(), while separating the conditions into join quals and other quals,
    if (IS_OUTER_JOIN(jointype))
    {
        extract_actual_join_clauses(joinclauses, &joinclauses, &otherclauses);
    }
    else
    {
        joinclauses = extract_actual_clauses(joinclauses, false);
        otherclauses = NIL;
    }
we shouldn't differentiate between outer and inner join. For inner join the join quals can be treated as other clauses and they will be returned as other clauses, which is fine. Also, the following condition
    /*
     * Other condition for the join must be safe to push down.
     */
    foreach(lc, otherclauses)
    {
        Expr *expr = (Expr *) lfirst(lc);

        if (!is_foreign_expr(root, joinrel, expr))
        {
            ereport(DEBUG3, (errmsg("filter contains unsafe conditions")));
            return;
        }
    }
is unnecessary. If there are filter conditions which are unsafe to push down, they can be applied locally after obtaining the join result from the foreign server. Only the join quals all need to be safe to push down, since they decide which rows will have a NULL-extended inner side in an OUTER join.
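The rule argued for above can be sketched in standalone C as follows. This is only an illustration under stated assumptions: MiniClause and MiniPlan are hypothetical types, and the safe_to_push flag stands in for whatever is_foreign_expr() would report in the real code. An unsafe join qual blocks the pushdown, while an unsafe filter condition is simply kept for local evaluation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clause descriptor; real code would call is_foreign_expr(). */
typedef struct MiniClause
{
    bool is_join_qual;   /* true: join qual (ON clause); false: filter */
    bool safe_to_push;   /* assumed result of the shippability check */
} MiniClause;

typedef struct MiniPlan
{
    bool   pushdown;     /* can the join be sent to the remote server? */
    size_t n_remote;     /* clauses evaluated remotely */
    size_t n_local;      /* filter clauses evaluated locally afterwards */
} MiniPlan;

MiniPlan mini_classify(const MiniClause *clauses, size_t n)
{
    MiniPlan plan = { true, 0, 0 };

    for (size_t i = 0; i < n; i++)
    {
        if (clauses[i].safe_to_push)
            plan.n_remote++;
        else if (clauses[i].is_join_qual)
        {
            /* An unsafe join qual decides NULL-extension: give up pushdown. */
            plan.pushdown = false;
            plan.n_remote = 0;
            plan.n_local = 0;
            return plan;
        }
        else
            plan.n_local++;   /* unsafe filter: apply it on the local side */
    }
    return plan;
}
```

For instance, one safe join qual plus one unsafe filter still yields a pushed-down join with one local condition, whereas a single unsafe join qual disables the pushdown entirely.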
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Fri, Apr 17, 2015 at 10:13 AM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
Kaigai-san,
On 2015/04/17 10:13, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Hanada-san,
>
>> I merged explain patch into foreign_join patch.
>>
>> Now v12 is the latest patch.
>>
> It contains many garbage lines... Please ensure the
> patch is correctly based on the latest master +
> custom_join patch.
Oops, sorry. I’ve re-created the patch as v13, based on the Custom/Foreign join v11 patch and the latest master.
It contains an EXPLAIN enhancement: a new subitem “Relations” shows the relations and joins, including their order and type, processed by the foreign scan.
--
Shigeru HANADA
shigeru.hanada@gmail.com
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Wed, Mar 25, 2015 at 9:51 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> The attached patch adds GetForeignJoinPaths call on make_join_rel() only when
> 'joinrel' is actually built and both of child relations are managed by same
> FDW driver, prior to any other built-in join paths.
> I adjusted the hook definition a little bit, because jointype can be reproduced
> using SpecialJoinInfo. Right?
>
> Probably, it will solve the original concern towards multiple calls of FDW
> handler in case when it tries to replace an entire join subtree with a
> foreign-scan on the result of remote join query.
>
> How about your opinion?

A few random cosmetic problems:

- The hunk in allpaths.c is useless.
- The first hunk in fdwapi.h contains an extra space before the
  closing parenthesis.

And then:

+    else if (scan->scanrelid == 0 &&
+             (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
+        varno = INDEX_VAR;

Suppose scan->scanrelid == 0 but the scan type is something else? Is
that legal? Is varno == 0 the correct outcome in that case?

More later.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 21, 2015 at 10:33 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> [ new patch ]

A little more nitpicking:

ExecInitForeignScan() and ExecInitCustomScan() could declare
currentRelation inside the if (scanrelid > 0) block instead of in the
outer scope.

I'm not too excited about the addition of GetFdwHandlerForRelation,
which is a one-line function used in one place. It seems like we
don't really need that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 13, 2015 at 8:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I don't object to the concept, but I think that is a pretty bad place
>>> to put the hook call: add_paths_to_joinrel is typically called multiple
>>> (perhaps *many*) times per joinrel and thus this placement would force
>>> any user of the hook to do a lot of repetitive work.
>
>> Interesting point. I guess the question is whether some or all
>> callers are going to actually *want* a separate call for each
>> invocation of add_paths_to_joinrel(), or whether they'll be happy to
>> operate on the otherwise-complete path list.
>
> Hmm. You're right, it's certainly possible that some users would like to
> operate on each possible pair of input relations, rather than considering
> the joinrel "as a whole". Maybe we need two hooks, one like your patch
> and one like I suggested.

Let me attempt to summarize subsequent discussion on this thread by
saying the hook location that you proposed (just before set_cheapest)
has not elicited any enthusiasm from anyone else. In a nutshell, the
problem is that a single callback for a large join problem is just
fine if there are no special joins involved, but in any other
scenario, nobody knows how to use a hook at that location for anything
useful. To push down a join to the remote server, you've got to
figure out how to emit an SQL query for it. To execute it with a
custom join strategy, you've got to know which of those joins should
have inner join semantics vs. left join semantics. A hook/callback in
make_join_rel() or in add_paths_to_joinrel() makes that relatively
straightforward. Otherwise, it's not clear what to do, short of
copy-and-pasting join_search_one_level(). If you have a suggestion,
I'd like to hear it.

If not, I'm going to press forward with the idea of putting the
relevant logic in either add_paths_to_joinrel(), as previously
proposed, or perhaps up one level in make_one_rel(). Either way, if
you don't need to be called multiple times per joinrel, you can stash
a flag inside whatever you hang off of the joinrel's fdw_private and
return immediately on every call after the first. I think that's
cheap enough that we shouldn't get too stressed about it: for FDWs, we
only call the hook at all if everything in the joinrel uses the same
FDW, so it won't get called at all except for joinrels where it's
likely to win big; for custom joins, multiple calls are quite likely
to be useful and necessary, and if the hook burns too much CPU time
for the query performance you get out of it, that's the custom-join
provider's fault, not ours.

The current patch takes this approach one step further and attempts
FDW pushdown only once per joinrel. It does that because, while
postgres_fdw DOES need the jointype and a valid innerrel/outerrel
breakdown to figure out what query to generate, it does NOT need every
possible breakdown; rather, the first one is as good as any other.
But this might not be true for a non-PostgreSQL remote database. So I
think it's better to call the hook every time and let the hook return
without doing anything if it wants.

I'm still not totally sure whether make_one_rel() is better than
add_paths_to_joinrel(). The current patch attempts to split the
difference by doing FDW pushdown from make_one_rel() and custom joins
from add_paths_to_joinrel(). I dunno why; if possible, those two
things should happen in the same place. Doing it in make_one_rel()
makes for fewer arguments and fewer repetitive calls, but that's not
much good if you would have had a use for the extra arguments that
aren't computed until we get down to add_paths_to_joinrel(). I'm not
sure whether that's the case or not. The latest version of the
postgres_fdw patch doesn't seem to mind not having extra_lateral_rels,
but I'm wondering if that's busted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
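The idea from the message above of stashing a flag in fdw_private and returning immediately on later calls might look roughly like the standalone sketch below. It is not PostgreSQL code: MiniRelOptInfo is a heavily reduced, hypothetical stand-in for RelOptInfo, MiniJoinState is an assumed private struct, and incrementing npaths stands in for building a path and adding it to the joinrel.

```c
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for RelOptInfo. */
typedef struct MiniRelOptInfo
{
    void *fdw_private;   /* opaque per-relation state owned by the FDW */
    int   npaths;        /* number of paths added so far */
} MiniRelOptInfo;

/* Private per-joinrel state an FDW might hang off fdw_private (assumed). */
typedef struct MiniJoinState
{
    int paths_added;
} MiniJoinState;

/* Called once per (outerrel, innerrel) pair considered for one joinrel. */
void mini_get_foreign_join_paths(MiniRelOptInfo *joinrel)
{
    MiniJoinState *state = joinrel->fdw_private;

    if (state != NULL && state->paths_added)
        return;                 /* already handled on an earlier call */

    if (state == NULL)
    {
        state = calloc(1, sizeof(*state));
        if (state == NULL)
            return;             /* out of memory: add nothing */
        joinrel->fdw_private = state;
    }

    joinrel->npaths++;          /* stands in for adding a foreign join path */
    state->paths_added = 1;
}

/* Drive the callback ncalls times on one joinrel and report paths added. */
int mini_paths_after_calls(int ncalls)
{
    MiniRelOptInfo joinrel = { NULL, 0 };
    int n;

    for (int i = 0; i < ncalls; i++)
        mini_get_foreign_join_paths(&joinrel);
    n = joinrel.npaths;
    free(joinrel.fdw_private);
    return n;
}
```

However many times the callback fires for the same joinrel, only the first call does any work, so the cost of the repeated calls is one pointer check each.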
Hi Robert,

Thanks for your comments.

> A few random cosmetic problems:
>
> - The hunk in allpaths.c is useless.
> - The first hunk in fdwapi.h contains an extra space before the
>   closing parenthesis.
>
OK, it's my oversight.

> And then:
>
> +    else if (scan->scanrelid == 0 &&
> +             (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
> +        varno = INDEX_VAR;
>
> Suppose scan->scanrelid == 0 but the scan type is something else? Is
> that legal? Is varno == 0 the correct outcome in that case?
>
Right now, no other scan type is able to return tuples whose
type/attributes are more flexible than a static definition.
I think it is a valid restriction that only foreign/custom-scan
can have scanrelid == 0.

I checked the overall code again. One doubtful point was ExecScanFetch().
If estate->es_epqTuple is not NULL, it tries to save a tuple from
a particular scanrelid (larger than zero).
IIUC, es_epqTuple is used only when a fetched tuple has been updated and
visibility checks are then applied again on the writer operation.
So it should work for a CSP with an underlying actual scan node on base
relations; however, I need to investigate the code for the case where an
FDW/CSP replaces an entire join subtree with an alternative relation scan
(like a materialized view).

> > [ new patch ]
>
> A little more nitpicking:
>
> ExecInitForeignScan() and ExecInitCustomScan() could declare
> currentRelation inside the if (scanrelid > 0) block instead of in the
> outer scope.
>
OK,

> I'm not too excited about the addition of GetFdwHandlerForRelation,
> which is a one-line function used in one place. It seems like we
> don't really need that.
>
OK,

> On Fri, Mar 13, 2015 at 8:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> I don't object to the concept, but I think that is a pretty bad place
> >>> to put the hook call: add_paths_to_joinrel is typically called multiple
> >>> (perhaps *many*) times per joinrel and thus this placement would force
> >>> any user of the hook to do a lot of repetitive work.
> >
> >> Interesting point. I guess the question is whether some or all
> >> callers are going to actually *want* a separate call for each
> >> invocation of add_paths_to_joinrel(), or whether they'll be happy to
> >> operate on the otherwise-complete path list.
> >
> > Hmm. You're right, it's certainly possible that some users would like to
> > operate on each possible pair of input relations, rather than considering
> > the joinrel "as a whole". Maybe we need two hooks, one like your patch
> > and one like I suggested.
>
> Let me attempt to summarize subsequent discussion on this thread by
> saying the hook location that you proposed (just before set_cheapest)
> has not elicited any enthusiasm from anyone else. In a nutshell, the
> problem is that a single callback for a large join problem is just
> fine if there are no special joins involved, but in any other
> scenario, nobody knows how to use a hook at that location for anything
> useful. To push down a join to the remote server, you've got to
> figure out how to emit an SQL query for it. To execute it with a
> custom join strategy, you've got to know which of those joins should
> have inner join semantics vs. left join semantics. A hook/callback in
> make_join_rel() or in add_paths_to_joinrel() makes that relatively
> straightforward. Otherwise, it's not clear what to do, short of
> copy-and-pasting join_search_one_level(). If you have a suggestion,
> I'd like to hear it.
>
I have no suggestion. When I once tried to put a hook just after
set_cheapest(), the largest problem was that we cannot extract the sets
of left and right relations from a set of joined relations; it is like
extracting the apple and the orange from mixed juice.

> If not, I'm going to press forward with the idea of putting the
> relevant logic in either add_paths_to_joinrel(), as previously
> proposed, or perhaps up one level in make_one_rel(). Either way, if
> you don't need to be called multiple times per joinrel, you can stash
> a flag inside whatever you hang off of the joinrel's fdw_private and
> return immediately on every call after the first. I think that's
> cheap enough that we shouldn't get too stressed about it: for FDWs, we
> only call the hook at all if everything in the joinrel uses the same
> FDW, so it won't get called at all except for joinrels where it's
> likely to win big; for custom joins, multiple calls are quite likely
> to be useful and necessary, and if the hook burns too much CPU time
> for the query performance you get out of it, that's the custom-join
> provider's fault, not ours. The current patch takes this approach one
> step further and attempts FDW pushdown only once per joinrel. It does
> that because, while postgres_fdw DOES need the jointype and a valid
> innerrel/outerrel breakdown to figure out what query to generate, it
> does NOT need every possible breakdown; rather, the first one is as
> good as any other. But this might not be true for a non-PostgreSQL
> remote database. So I think it's better to call the hook every time
> and let the hook return without doing anything if it wants.
>
Indeed. Although Hanada-san and I have discussed this under the assumption
of a remote PostgreSQL server and join-pushdown cases, there may be remote
RDBMSs that build the query execution plan according to the order in which
relations appear in the query.
If an FDW driver does not want GetForeignJoinPaths() to do its work
multiple times, fdw_private of RelOptInfo is a good marker to determine
whether it is the first call or not. When multiple CSPs add paths to a
join, we may need a facility that lets each extension save its own private
data. If we could identify individual CSPs by name, one idea is to have
a hash table that tracks the private data of each CSP.
But I don't think it is a mandatory feature in the first version.

> I'm still not totally sure whether make_one_rel() is better than
> add_paths_to_joinrel(). The current patch attempts to split the
> difference by doing FDW pushdown from make_one_rel() and custom joins
> from add_paths_to_joinrel(). I dunno why; if possible, those two
> things should happen in the same place. Doing it in make_one_rel()
> makes for fewer arguments and fewer repetitive calls, but that's not
> much good if you would have had a use for the extra arguments that
> aren't computed until we get down to add_paths_to_joinrel(). I'm not
> sure whether that's the case or not. The latest version of the
> postgres_fdw patch doesn't seem to mind not having extra_lateral_rels,
> but I'm wondering if that's busted.
>
As in my initial proposal, my preference is add_paths_to_joinrel(),
because of the values calculated during this routine (although it also
increases the number of arguments).
Even if make_one_rel() called the FDW/CSP, I expect extensions would have
to re-generate these values by themselves. That is not impossible to
implement, but it is not a graceful manner at least.
As long as postgres_fdw checks fdw_private of RelOptInfo, the amount of
code adjustment is not so much.

Hanada-san, how about your opinion?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hi Ashutosh,

Thanks for the review.

On 2015/04/22 19:28, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> Tests
> -------
> 1. The postgres_fdw test is re/setting enable_mergejoin at various places.
> The goal of these tests seems to be to test the sanity of foreign plans
> generated. So, it might be better to reset enable_mergejoin (and may be all
> of enable_hashjoin, enable_nestloop_join etc.) to false at the beginning of
> the testcase and set them again at the end. That way, we will also make
> sure that foreign plans are chosen irrespective of future planner changes.

I have a different, rather opposite opinion about it. I disabled other join types only as far as needed for the tests to pass, because I worry about oversights caused by planner changes. I hope to eliminate enable_foo from the test script by making the costing model smarter.

> 2. In the patch, I see that the inheritance testcases have been deleted
> from postgres_fdw.sql, is that intentional? I do not see those being
> replaced anywhere else.

It was an accidental removal; I restored the tests for the inheritance feature.

> 3. We need one test for each join type (or at least for INNER and LEFT
> OUTER) where there are unsafe to push conditions in ON clause along-with
> safe-to-push conditions. For INNER join, the join should get pushed down
> with the safe conditions and for OUTER join it shouldn't be. Same goes for
> WHERE clause, in which case the join will be pushed down but the
> unsafe-to-push conditions will be applied locally.

Currently INNER JOINs with unsafe join conditions are not pushed down, so such a test is not in the suite. As you say, in theory, INNER JOINs can be pushed down even if they have push-down-unsafe join conditions, because such conditions can be evaluated on the local side against rows retrieved without those conditions.

> 4. All the tests have ORDER BY, LIMIT in them, so the setref code is being
> exercised. But, something like aggregates would test the setref code
> better. So, we should add at-least one test like select avg(ft1.c1 +
> ft2.c2) from ft1 join ft2 on (ft1.c1 = ft2.c1).

Added an aggregate case, and also added a UNION case for Append.

> 5. It will be good to add some test which contain join between few foreign
> and few local tables to see whether we are able to push down the largest
> possible foreign join tree to the foreign server.
>
> Code
> -------
> In classifyConditions(), the code is now appending RestrictInfo::clause
> rather than RestrictInfo itself. But the callers of classifyConditions()
> have not changed. Is this change intentional?

Yes, the purpose of the change is to let appendConditions (formerly appendWhereClause) handle a JOIN ON clause, which is a list of Expr.

> The functions which consume the lists produced by this function handle
> expressions as well RestrictInfo, so you may not have noticed it. Because
> of this change, we might be missing some optimizations e.g. in function
> postgresGetForeignPlan()
> 793 if (list_member_ptr(fpinfo->remote_conds, rinfo))
> 794 remote_conds = lappend(remote_conds, rinfo->clause);
> 795 else if (list_member_ptr(fpinfo->local_conds, rinfo))
> 796 local_exprs = lappend(local_exprs, rinfo->clause);
> 797 else if (is_foreign_expr(root, baserel, rinfo->clause))
> 798 remote_conds = lappend(remote_conds, rinfo->clause);
> 799 else
> 800 local_exprs = lappend(local_exprs, rinfo->clause);
> Finding a RestrictInfo in remote_conds avoids another call to
> is_foreign_expr(). So with this change, I think we are doing an extra call
> to is_foreign_expr().
>

Hm, it seems better to revert my change and make appendConditions downcast the given information into RestrictInfo or Expr according to the node tag.

> The function get_jointype_name() returns an empty string for unsupported
> join types. Instead of that it should throw an error, if some code path
> accidentally calls the function with unsupported join type e.g. SEMI_JOIN.

Agreed, fixed.

> While deparsing the SQL with rowmarks, the placement of FOR UPDATE/SHARE
> clause in the original query is not being honored, which means that we will
> end up locking the rows which are not part of the join result even when the
> join is pushed to the foreign server. E.g take the following query (it uses
> the tables created in postgres_fdw.sql tests)
> contrib_regression=# explain verbose select * from ft1 join ft2 on (ft1.c1 = ft2.c1) for update of ft1;
>
>                                   QUERY PLAN
> --------------------------------------------------------------------------------
>  LockRows  (cost=100.00..124.66 rows=822 width=426)
>    Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
>    ->  Foreign Scan  (cost=100.00..116.44 rows=822 width=426)
>          Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
>          Relations: (public.ft1) INNER JOIN (public.ft2)
>          Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, l.a9, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8, r.a9 FROM (SELECT l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17) FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" FOR UPDATE) l) l (a1, a2, a3, a4, a5, a6, a7, a8, a9) INNER JOIN (SELECT r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17, ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17) FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7, a8, a9) ON ((l.a1 = r.a1))
> (6 rows)
> It's expected that only the rows which are part of join result will be
> locked by FOR UPDATE clause. The query sent to the foreign server has
> attached the FOR UPDATE clause to the sub-query for table ft1 ("S 1"."T 1"
> on foreign server). As per the postgresql documentation, "When a locking
> clause appears in a sub-SELECT, the rows locked are those returned to the
> outer query by the sub-query.". So it's going to lock all rows from
> "S 1"."T 1", rather than only the rows which are part of join. This is
> going to increase probability of deadlocks, if the join is between a big
> table and small table where big table is being used in many queries and the
> join is going to have only a single row in the result.
>
> Since there is no is_first argument to appendConditions(), we should remove
> corresponding line from the function prologue.
>

Oops, replaced with the description of prefix.

> The name TO_RELATIVE() doesn't convey the full meaning of the macro. May be
> GET_RELATIVE_ATTNO() or something like that.

Fixed.

>
> In postgresGetForeignJoinPaths(), while separating the conditions into join
> quals and other quals,
> 3014 if (IS_OUTER_JOIN(jointype))
> 3015 {
> 3016 extract_actual_join_clauses(joinclauses, &joinclauses, &otherclauses);
> 3017 }
> 3018 else
> 3019 {
> 3020 joinclauses = extract_actual_clauses(joinclauses, false);
> 3021 otherclauses = NIL;
> 3022 }
> we shouldn't differentiate between outer and inner join. For inner join the
> join quals can be treated as other clauses and they will be returned as
> other clauses, which is fine. Also, the following condition
> 3050 /*
> 3051 * Other condition for the join must be safe to push down.
> 3052 */
> 3053 foreach(lc, otherclauses)
> 3054 {
> 3055 Expr *expr = (Expr *) lfirst(lc);
> 3056
> 3057 if (!is_foreign_expr(root, joinrel, expr))
> 3058 {
> 3059 ereport(DEBUG3, (errmsg("filter contains unsafe conditions")));
> 3060 return;
> 3061 }
> 3062 }
> is unnecessary. I there are filter conditions which are unsafe to push
> down, they can be applied locally after obtaining the join result from the
> foreign server. The join quals are all needed to be safe to push down,
> since they decide which rows will contain NULL inner side in an OUTER join.

I’m not sure that we *shouldn’t* differentiate, but I agree that we *don’t need* to differentiate if we are talking only about the result of filtering.

IMO we *should* differentiate inner and outer joins (or differentiate join conditions and filter conditions), because all conditions of typical INNER JOINs go into otherclauses, since their is_pushed_down flag is on, so such joins look like CROSS JOIN + WHERE filter. In the latest patch EXPLAIN shows the join combinations of a foreign join scan node with the jointype, but your suggestion makes it look like this:

fdw=# explain (verbose) select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid;
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..101.00 rows=50 width=716)
   Output: b.bid, b.bbalance, b.filler, t.tid, t.bid, t.tbalance, t.filler
   Relations: (public.pgbench_branches b) CROSS JOIN (public.pgbench_tellers t)
   Remote SQL: SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM (SELECT l.a9, l.a10, l.a11 FROM (SELECT bid a9, bbalance a10, filler a11 FROM public.pgbench_branches) l) l (a1, a2, a3) CROSS JOIN (SELECT r.a9, r.a10, r.a11, r.a12 FROM (SELECT tid a9, bid a10, tbalance a11, filler a12 FROM public.pgbench_tellers) r) r (a1, a2, a3, a4) WHERE ((l.a1 = r.a2))
(4 rows)

Thoughts?

Regards,
--
Shigeru HANADA
shigeru.hanada@gmail.com
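The reply above proposes that appendConditions accept either RestrictInfo or Expr and choose by node tag. A standalone sketch of that dispatch, under stated assumptions, follows. MiniNodeTag and the two structs are hypothetical reductions of the real node types; the only property borrowed from PostgreSQL is that every node begins with its tag, which is what lets the tag be read through a generic pointer.

```c
#include <stddef.h>

/* Hypothetical node tags; the real NodeTag enum is much larger. */
typedef enum { MINI_T_Expr, MINI_T_RestrictInfo } MiniNodeTag;

/* Every node begins with its tag, mirroring PostgreSQL's node layout. */
typedef struct MiniExpr
{
    MiniNodeTag type;
    int         id;
} MiniExpr;

typedef struct MiniRestrictInfo
{
    MiniNodeTag type;
    MiniExpr   *clause;
} MiniRestrictInfo;

/* Accept either node kind and return the bare expression to deparse. */
const MiniExpr *mini_condition_expr(const void *node)
{
    /* The tag is the first member, so it can be read via any node pointer. */
    switch (*(const MiniNodeTag *) node)
    {
        case MINI_T_RestrictInfo:
            return ((const MiniRestrictInfo *) node)->clause;
        case MINI_T_Expr:
            return (const MiniExpr *) node;
    }
    return NULL;   /* unknown tag */
}
```

With a helper of this shape, lists can keep holding RestrictInfo nodes, so pointer-membership checks against remote_conds keep working, while join ON clauses can still be passed as plain expressions.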
On Wed, Apr 22, 2015 at 10:48 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> +    else if (scan->scanrelid == 0 &&
>> +             (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
>> +        varno = INDEX_VAR;
>>
>> Suppose scan->scanrelid == 0 but the scan type is something else? Is
>> that legal? Is varno == 0 the correct outcome in that case?
>>
> Right now, no other scan type has capability to return a tuples
> with flexible type/attributes more than static definition.
> I think it is a valid restriction that only foreign/custom-scan
> can have scanrelid == 0.

But the code as you've written it doesn't enforce any such
restriction. It just spends CPU cycles testing for a condition which,
to the best of your knowledge, will never happen.

If it's really a can't happen condition, how about checking it via an Assert()?

    else if (scan->scanrelid == 0)
    {
        Assert(IsA(scan, ForeignScan) || IsA(scan, CustomScan));
        varno = INDEX_VAR;
    }

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On Wed, Apr 22, 2015 at 10:48 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >> +    else if (scan->scanrelid == 0 &&
> >> +             (IsA(scan, ForeignScan) || IsA(scan, CustomScan)))
> >> +        varno = INDEX_VAR;
> >>
> >> Suppose scan->scanrelid == 0 but the scan type is something else? Is
> >> that legal? Is varno == 0 the correct outcome in that case?
> >>
> > Right now, no other scan type has capability to return a tuples
> > with flexible type/attributes more than static definition.
> > I think it is a valid restriction that only foreign/custom-scan
> > can have scanrelid == 0.
>
> But the code as you've written it doesn't enforce any such
> restriction. It just spends CPU cycles testing for a condition which,
> to the best of your knowledge, will never happen.
>
> If it's really a can't happen condition, how about checking it via an Assert()?
>
>     else if (scan->scanrelid == 0)
>     {
>         Assert(IsA(scan, ForeignScan) || IsA(scan, CustomScan));
>         varno = INDEX_VAR;
>     }
>
Thanks for your suggestion. I'd like to use this idea in the next patch.

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
The attached patch v13 is a revision based on Robert's suggestions:

- eliminated a useless change in allpaths.c
- eliminated an extra space in the FdwRoutine definition
- prohibited scanrelid == 0 for anything other than ForeignScan or
  CustomScan, using Assert()
- moved the definition of currentRelation in ExecInitForeignScan() and
  ExecInitCustomScan() inside the if-block on scanrelid > 0
- redefined GetForeignJoinPaths() and moved it to add_paths_to_joinrel(),
  like set_join_pathlist_hook

As suggested, an FDW driver can skip adding paths if equivalent paths
have already been added to a given joinrel, by checking fdw_private.
So we can still achieve the purpose we had when we moved the entrypoint
to make_join_rel(): not populating redundant paths for each potential
join combination, even though the remote RDBMS handles them correctly.
It also makes sense if the remote RDBMS joins tables according to the
order in which the relations appear.

Its definition is below:

  void GetForeignJoinPaths(PlannerInfo *root,
                           RelOptInfo *joinrel,
                           RelOptInfo *outerrel,
                           RelOptInfo *innerrel,
                           List *restrictlist,
                           JoinType jointype,
                           SpecialJoinInfo *sjinfo,
                           SemiAntiJoinFactors *semifactors,
                           Relids param_source_rels,
                           Relids extra_lateral_rels);

In addition to the arguments in the previous version, we added some
parameters computed during add_paths_to_joinrel().
Right now, I'm not certain whether we should include mergeclause_list
here, because it depends on enable_mergejoin, even though extra join
logic based on merge join may not want to be controlled by this GUC.

Hanada-san, could you adjust your postgres_fdw patch according to
the above new (previous?) definition?
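The "skip if equivalent paths were already added" idea can be sketched as follows. This is a toy model under stated assumptions: ToyJoinRel, toy_marker and toy_get_foreign_join_paths are hypothetical names, and the real callback receives the full argument list shown in the mail rather than a single struct. It only shows the early-return check on fdw_private.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a join relation seen by an FDW. */
typedef struct ToyJoinRel
{
    void *fdw_private;  /* set once a remote-join path has been added */
    int   npaths;       /* number of paths the FDW has added so far */
} ToyJoinRel;

/* Any non-NULL address works as the "already handled" marker here. */
static int toy_marker;

/*
 * Called once per (outer, inner) combination that forms the same joinrel.
 * Only the first call adds a path; later calls return early.
 */
void
toy_get_foreign_join_paths(ToyJoinRel *joinrel)
{
    if (joinrel->fdw_private != NULL)
        return;                 /* equivalent path already present */

    joinrel->npaths++;          /* stands in for adding a ForeignPath */
    joinrel->fdw_private = &toy_marker;
}
```

Calling toy_get_foreign_join_paths() twice on the same ToyJoinRel leaves npaths at 1, which mirrors the goal of not populating redundant paths for each join order.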
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Hi Ashutosh,

Thanks for the review.

On 2015/04/22 19:28, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> Tests
> -------
> 1. The postgres_fdw test is re/setting enable_mergejoin at various places. The goal of these tests seems to be to test the sanity of foreign plans generated. So, it might be better to reset enable_mergejoin (and may be all of enable_hashjoin, enable_nestloop_join etc.) to false at the beginning of the testcase and set them again at the end. That way, we will also make sure that foreign plans are chosen irrespective of future planner changes.

I have a different, rather opposite opinion about it. I disabled other join types only as far as needed for the tests to pass, because I worry about oversights caused by planner changes. I hope to eliminate enable_foo from the test script by making the costing model smarter.

> 2. In the patch, I see that the inheritance testcases have been deleted from postgres_fdw.sql, is that intentional? I do not see those being replaced anywhere else.

It was an accidental removal; I restored the tests for the inheritance feature.

> 3. We need one test for each join type (or at least for INNER and LEFT OUTER) where there are unsafe to push conditions in ON clause along-with safe-to-push conditions. For INNER join, the join should get pushed down with the safe conditions and for OUTER join it shouldn't be. Same goes for WHERE clause, in which case the join will be pushed down but the unsafe-to-push conditions will be applied locally.

Currently INNER JOINs with unsafe join conditions are not pushed down, so such a test is not in the suite. As you say, in theory, INNER JOINs can be pushed down even if they have push-down-unsafe join conditions, because such conditions can be evaluated on the local side against rows retrieved without those conditions.

> 4. All the tests have ORDER BY, LIMIT in them, so the setref code is being exercised. But, something like aggregates would test the setref code better. So, we should add at-least one test like select avg(ft1.c1 + ft2.c2) from ft1 join ft2 on (ft1.c1 = ft2.c1).

Added an aggregate case, and also added a UNION case for Append.

> 5. It will be good to add some test which contain join between few foreign and few local tables to see whether we are able to push down the largest possible foreign join tree to the foreign server.
>
> Code
> -------
> In classifyConditions(), the code is now appending RestrictInfo::clause rather than RestrictInfo itself. But the callers of classifyConditions() have not changed. Is this change intentional?

Yes, the purpose of the change is to let appendConditions (formerly appendWhereClause) handle a JOIN ON clause, which is a list of Expr.

> The functions which consume the lists produced by this function handle expressions as well RestrictInfo, so you may not have noticed it. Because of this change, we might be missing some optimizations e.g. in function postgresGetForeignPlan()
>     if (list_member_ptr(fpinfo->remote_conds, rinfo))
>         remote_conds = lappend(remote_conds, rinfo->clause);
>     else if (list_member_ptr(fpinfo->local_conds, rinfo))
>         local_exprs = lappend(local_exprs, rinfo->clause);
>     else if (is_foreign_expr(root, baserel, rinfo->clause))
>         remote_conds = lappend(remote_conds, rinfo->clause);
>     else
>         local_exprs = lappend(local_exprs, rinfo->clause);
> Finding a RestrictInfo in remote_conds avoids another call to is_foreign_expr(). So with this change, I think we are doing an extra call to is_foreign_expr().
>

Hm, it seems better to revert my change and make appendConditions downcast the given information into RestrictInfo or Expr according to the node tag.

> The function get_jointype_name() returns an empty string for unsupported join types. Instead of that it should throw an error, if some code path accidentally calls the function with unsupported join type e.g. SEMI_JOIN.

Agreed, fixed.

> While deparsing the SQL with rowmarks, the placement of FOR UPDATE/SHARE clause in the original query is not being honored, which means that we will end up locking the rows which are not part of the join result even when the join is pushed to the foreign server. E.g take the following query (it uses the tables created in postgres_fdw.sql tests)
> contrib_regression=# explain verbose select * from ft1 join ft2 on (ft1.c1 = ft2.c1) for update of ft1;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  LockRows  (cost=100.00..124.66 rows=822 width=426)
>    Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
>    ->  Foreign Scan  (cost=100.00..116.44 rows=822 width=426)
>          Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
>          Relations: (public.ft1) INNER JOIN (public.ft2)
>          Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, l.a9, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8, r.a9 FROM (SELECT l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17) FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" FOR UPDATE) l) l (a1, a2, a3, a4, a5, a6, a7, a8, a9) INNER JOIN (SELECT r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17, ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17) FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7, a8, a9) ON ((l.a1 = r.a1))
> (6 rows)
> It's expected that only the rows which are part of join result will be locked by FOR UPDATE clause. The query sent to the foreign server has attached the FOR UPDATE clause to the sub-query for table ft1 ("S 1"."T 1" on foreign server). As per the postgresql documentation, "When a locking clause appears in a sub-SELECT, the rows locked are those returned to the outer query by the sub-query.". So it's going to lock all rows from "S 1"."T 1", rather than only the rows which are part of join. This is going to increase probability of deadlocks, if the join is between a big table and small table where big table is being used in many queries and the join is going to have only a single row in the result.
>
> Since there is no is_first argument to appendConditions(), we should remove corresponding line from the function prologue.
>

Oops, replaced with the description of prefix.

> The name TO_RELATIVE() doesn't convey the full meaning of the macro. May be GET_RELATIVE_ATTNO() or something like that.

Fixed.

>
> In postgresGetForeignJoinPaths(), while separating the conditions into join quals and other quals,
>     if (IS_OUTER_JOIN(jointype))
>     {
>         extract_actual_join_clauses(joinclauses, &joinclauses, &otherclauses);
>     }
>     else
>     {
>         joinclauses = extract_actual_clauses(joinclauses, false);
>         otherclauses = NIL;
>     }
> we shouldn't differentiate between outer and inner join. For inner join the join quals can be treated as other clauses and they will be returned as other clauses, which is fine. Also, the following condition
>     /*
>      * Other condition for the join must be safe to push down.
>      */
>     foreach(lc, otherclauses)
>     {
>         Expr *expr = (Expr *) lfirst(lc);
>
>         if (!is_foreign_expr(root, joinrel, expr))
>         {
>             ereport(DEBUG3, (errmsg("filter contains unsafe conditions")));
>             return;
>         }
>     }
> is unnecessary. If there are filter conditions which are unsafe to push down, they can be applied locally after obtaining the join result from the foreign server. The join quals are all needed to be safe to push down, since they decide which rows will contain NULL inner side in an OUTER join.

I'm not sure that we *shouldn't* differentiate, but I agree that we *don't need* to differentiate if we are talking about only the result of filtering.

IMO we *should* differentiate inner and outer (or differentiate join conditions and filter conditions), because all conditions of typical INNER JOINs go into otherclauses since their is_pushed_down flag is on, so such joins look like CROSS JOIN + WHERE filter. In the latest patch EXPLAIN shows the join combinations of a foreign join scan node with the join type, but your suggestion makes it look like this:

fdw=# explain (verbose) select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid;
WARNING:  restrictlist: ({RESTRICTINFO :clause {OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 85} {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 2 :location 77}) :location -1} :is_pushed_down true :outerjoin_delayed false :can_join true :pseudoconstant false :clause_relids (b 1 2) :required_relids (b 1 2) :outer_relids (b) :nullable_relids (b) :left_relids (b 1) :right_relids (b 2) :orclause <> :norm_selec 0.2000 :outer_selec -1.0000 :mergeopfamilies (o 1976) :left_em {EQUIVALENCEMEMBER :em_expr {VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 85} :em_relids (b 1) :em_nullable_relids (b) :em_is_const false :em_is_child false :em_datatype 23} :right_em {EQUIVALENCEMEMBER :em_expr {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 2 :location 77} :em_relids (b 2) :em_nullable_relids (b) :em_is_const false :em_is_child false :em_datatype 23} :outer_is_left false :hashjoinoperator 96})
WARNING:  joinclauses: <>
WARNING:  otherclauses: ({OPEXPR :opno 96 :opfuncid 65 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 85} {VAR :varno 2 :varattno 2 :vartype 23 :vartypmod -1 :varcollid 0 :varlevelsup 0 :varnoold 2 :varoattno 2 :location 77}) :location -1})
                                  QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..101.00 rows=50 width=716)
   Output: b.bid, b.bbalance, b.filler, t.tid, t.bid, t.tbalance, t.filler
   Relations: (public.pgbench_branches b) CROSS JOIN (public.pgbench_tellers t)
   Remote SQL: SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM (SELECT l.a9, l.a10, l.a11 FROM (SELECT bid a9, bbalance a10, filler a11 FROM public.pgbench_branches) l) l (a1, a2, a3) CROSS JOIN (SELECT r.a9, r.a10, r.a11, r.a12 FROM (SELECT tid a9, bid a10, tbalance a11, filler a12 FROM public.pgbench_tellers) r) r (a1, a2, a3, a4) WHERE ((l.a1 = r.a2))
(4 rows)

Thoughts?

Regards,
--
Shigeru HANADA
shigeru.hanada@gmail.com
Kaigai-san,

2015-04-27 11:00 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> Hanada-san, could you adjust your postgres_fdw patch according to
> the above new (previous?) definition.

The attached v14 patch is the revised version for your v13 patch. It also contains changes for Ashutosh's comments.

--
Shigeru HANADA
On Fri, Apr 24, 2015 at 3:08 PM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
Hi Ashutosh,
Thanks for the review.
On 2015/04/22 19:28, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> Tests
> -------
> 1.The postgres_fdw test is re/setting enable_mergejoin at various places. The goal of these tests seems to be to test the sanity of foreign plans generated. So, it might be better to reset enable_mergejoin (and may be all of enable_hashjoin, enable_nestloop_join etc.) to false at the beginning of the testcase and set them again at the end. That way, we will also make sure that foreign plans are chosen irrespective of future planner changes.
I have a different, rather opposite opinion about it. I disabled other join types only as far as needed for the tests to pass, because I worry about oversights caused by planner changes. I hope to eliminate enable_foo from the test script by making the costing model smarter.
Ok, if you can do that, that will be excellent.
> 2. In the patch, I see that the inheritance testcases have been deleted from postgres_fdw.sql, is that intentional? I do not see those being replaced anywhere else.
It was an accidental removal; I restored the tests for the inheritance feature.
Thanks.
> 3. We need one test for each join type (or at least for INNER and LEFT OUTER) where there are unsafe to push conditions in ON clause along-with safe-to-push conditions. For INNER join, the join should get pushed down with the safe conditions and for OUTER join it shouldn't be. Same goes for WHERE clause, in which case the join will be pushed down but the unsafe-to-push conditions will be applied locally.
Currently INNER JOINs with unsafe join conditions are not pushed down, so such a test is not in the suite. As you say, in theory, INNER JOINs can be pushed down even if they have push-down-unsafe join conditions, because such conditions can be evaluated on the local side against rows retrieved without those conditions.
> 4. All the tests have ORDER BY, LIMIT in them, so the setref code is being exercised. But, something like aggregates would test the setref code better. So, we should add at-least one test like select avg(ft1.c1 + ft2.c2) from ft1 join ft2 on (ft1.c1 = ft2.c1).
Added an aggregate case, and also added an UNION case for Append.
Thanks.
> 5. It will be good to add some test which contain join between few foreign and few local tables to see whether we are able to push down the largest possible foreign join tree to the foreign server.
>
Are you planning to do anything on this point?
> Code
> -------
> In classifyConditions(), the code is now appending RestrictInfo::clause rather than RestrictInfo itself. But the callers of classifyConditions() have not changed. Is this change intentional?
Yes, the purpose of the change is to let appendConditions (formerly appendWhereClause) handle a JOIN ON clause, which is a list of Expr.
> The functions which consume the lists produced by this function handle expressions as well RestrictInfo, so you may not have noticed it. Because of this change, we might be missing some optimizations e.g. in function postgresGetForeignPlan()
>     if (list_member_ptr(fpinfo->remote_conds, rinfo))
>         remote_conds = lappend(remote_conds, rinfo->clause);
>     else if (list_member_ptr(fpinfo->local_conds, rinfo))
>         local_exprs = lappend(local_exprs, rinfo->clause);
>     else if (is_foreign_expr(root, baserel, rinfo->clause))
>         remote_conds = lappend(remote_conds, rinfo->clause);
>     else
>         local_exprs = lappend(local_exprs, rinfo->clause);
> Finding a RestrictInfo in remote_conds avoids another call to is_foreign_expr(). So with this change, I think we are doing an extra call to is_foreign_expr().
>
Hm, it seems better to revert my change and make appendConditions downcast given information into RestrictInfo or Expr according to the node tag.
Thanks.
> The function get_jointype_name() returns an empty string for unsupported join types. Instead of that it should throw an error, if some code path accidentally calls the function with unsupported join type e.g. SEMI_JOIN.
Agreed, fixed.
Thanks.
> While deparsing the SQL with rowmarks, the placement of FOR UPDATE/SHARE clause in the original query is not being honored, which means that we will end up locking the rows which are not part of the join result even when the join is pushed to the foreign server. E.g take the following query (it uses the tables created in postgres_fdw.sql tests)
> contrib_regression=# explain verbose select * from ft1 join ft2 on (ft1.c1 = ft2.c1) for update of ft1;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  LockRows  (cost=100.00..124.66 rows=822 width=426)
>    Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
>    ->  Foreign Scan  (cost=100.00..116.44 rows=822 width=426)
>          Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
>          Relations: (public.ft1) INNER JOIN (public.ft2)
>          Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8, l.a9, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8, r.a9 FROM (SELECT l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17, ROW(l.a10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17) FROM (SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1" FOR UPDATE) l) l (a1, a2, a3, a4, a5, a6, a7, a8, a9) INNER JOIN (SELECT r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17, ROW(r.a9, r.a10, r.a12, r.a13, r.a14, r.a15, r.a16, r.a17) FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7, a8, a9) ON ((l.a1 = r.a1))
> (6 rows)
> It's expected that only the rows which are part of join result will be locked by FOR UPDATE clause. The query sent to the foreign server has attached the FOR UPDATE clause to the sub-query for table ft1 ("S 1"."T 1" on foreign server). As per the postgresql documentation, "When a locking clause appears in a sub-SELECT, the rows locked are those returned to the outer query by the sub-query.". So it's going to lock all rows from "S 1"."T 1", rather than only the rows which are part of join. This is going to increase probability of deadlocks, if the join is between a big table and small table where big table is being used in many queries and the join is going to have only a single row in the result.
>
Are you planning to do anything about this point?
> Since there is no is_first argument to appendConditions(), we should remove corresponding line from the function prologue.
>
Oops, replaced with the description of prefix.
> The name TO_RELATIVE() doesn't convey the full meaning of the macro. May be GET_RELATIVE_ATTNO() or something like that.
Fixed.
Thanks.
I’m not sure that we *shouldn’t* differentiate, but I agree that we *don’t need* to differentiate if we are talking about only the result of filtering.
>
> In postgresGetForeignJoinPaths(), while separating the conditions into join quals and other quals,
>     if (IS_OUTER_JOIN(jointype))
>     {
>         extract_actual_join_clauses(joinclauses, &joinclauses, &otherclauses);
>     }
>     else
>     {
>         joinclauses = extract_actual_clauses(joinclauses, false);
>         otherclauses = NIL;
>     }
> we shouldn't differentiate between outer and inner join. For inner join the join quals can be treated as other clauses and they will be returned as other clauses, which is fine. Also, the following condition
>     /*
>      * Other condition for the join must be safe to push down.
>      */
>     foreach(lc, otherclauses)
>     {
>         Expr *expr = (Expr *) lfirst(lc);
>
>         if (!is_foreign_expr(root, joinrel, expr))
>         {
>             ereport(DEBUG3, (errmsg("filter contains unsafe conditions")));
>             return;
>         }
>     }
> is unnecessary. If there are filter conditions which are unsafe to push down, they can be applied locally after obtaining the join result from the foreign server. The join quals are all needed to be safe to push down, since they decide which rows will contain NULL inner side in an OUTER join.
IMO we *should* differentiate inner and outer (or differentiate join conditions and filter conditions), because all conditions of typical INNER JOINs go into otherclauses since their is_pushed_down flag is on, so such joins look like CROSS JOIN + WHERE filter. In the latest patch EXPLAIN shows the join combinations of a foreign join scan node with the join type, but your suggestion makes it look like this:
fdw=# explain (verbose) select * from pgbench_branches b join pgbench_tellers t on t.bid = b.bid;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan  (cost=100.00..101.00 rows=50 width=716)
   Output: b.bid, b.bbalance, b.filler, t.tid, t.bid, t.tbalance, t.filler
   Relations: (public.pgbench_branches b) CROSS JOIN (public.pgbench_tellers t)
   Remote SQL: SELECT l.a1, l.a2, l.a3, r.a1, r.a2, r.a3, r.a4 FROM (SELECT l.a9, l.a10, l.a11 FROM (SELECT bid a9, bbalance a10, filler a11 FROM public.pgbench_branches) l) l (a1, a2, a3) CROSS JOIN (SELECT r.a9, r.a10, r.a11, r.a12 FROM (SELECT tid a9, bid a10, tbalance a11, filler a12 FROM public.pgbench_tellers) r) r (a1, a2, a3, a4) WHERE ((l.a1 = r.a2))
(4 rows)
Thoughts?
It does hamper readability a bit. But it explicitly shows how we want to treat the join. We can leave this to the committers though.
Regards,
--
Shigeru HANADA
shigeru.hanada@gmail.com
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Mon, Apr 27, 2015 at 5:05 AM, Shigeru HANADA <shigeru.hanada@gmail.com> wrote:
> Currently INNER JOINs with unsafe join conditions are not pushed down, so such a test is not in the suite.  As you say, in theory, INNER JOINs can be pushed down even if they have push-down-unsafe join conditions, because such conditions can be evaluated on the local side against rows retrieved without those conditions.

I suspect it's worth trying to do the pushdown if there is at least
one safe joinclause.  If there are none, fetching a Cartesian product
figures to be a loser.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
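The rule of thumb above (push an inner join down only when at least one join clause is safe to ship, and evaluate the rest locally) could be sketched like this. It is a toy model: toy_worth_pushing_join is a hypothetical helper, and the per-clause safety flags stand in for whatever is_foreign_expr() would report in postgres_fdw.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether an inner join is worth pushing down.
 *
 * clause_is_safe[i] says whether join clause i can be evaluated remotely.
 * On return, *n_local holds the number of clauses that would have to be
 * applied locally.  The function returns true only if at least one clause
 * is safe; with no safe clause the remote side would have to produce a
 * Cartesian product.
 */
bool
toy_worth_pushing_join(const bool *clause_is_safe, size_t nclauses,
                       size_t *n_local)
{
    size_t n_safe = 0;

    for (size_t i = 0; i < nclauses; i++)
    {
        if (clause_is_safe[i])
            n_safe++;
    }

    *n_local = nclauses - n_safe;
    return n_safe > 0;
}
```

This applies to inner joins only; for outer joins the discussion in this thread is that all join quals must be safe, since they decide which rows are NULL-extended.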
On Sun, Apr 26, 2015 at 10:00 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> The attached patch v13 is revised one according to the suggestion
> by Robert.

Thanks.

The last hunk in foreign.c is a useless whitespace change.

+   /* actually, not shift members */

Change to: "shift of 0 is the same as copying"

But actually, do we really need all of this?  I think you could reduce
the size of this function to three lines of code if you just did this:

x = -1;
while ((x = bms_next_member(inputset, x)) >= 0)
    outputset = bms_add_member(outputset, x + shift);

It might be very slightly slower, but I think it would be worth it to
reduce the amount of code needed.

+    * 5. Consider paths added by FDW, in case when both of outer and
+    * inner relations are managed by the same driver.

Change to: "If both inner and outer relations are managed by the same
FDW, give it a chance to push down joins."

+    * 6. At the last, consider paths added by extension, in addition to the
+    * built-in paths.

Change to: "Finally, give extensions a chance to manipulate the path list."

+        * Fetch relation-id, if this foreign-scan node actuall scans on
+        * a particular real relation. Elsewhere, InvalidOid shall be
+        * informed to the FDW driver.

Change to: "If we're scanning a base relation, look up the OID.  (We
can skip this if scanning a join relation.)"

+        * Sanity check. Pseudo scan tuple-descriptor shall be constructed
+        * based on the fdw_ps_tlist, excluding resjunk=true, so we need to
+        * ensure all valid TLEs have to locate prior to junk ones.

Is the goal here to make attribute numbers match up?  If so, between
where and where?  If not, please explain further.

+               if (splan->scan.scanrelid == 0)
+               {
...
+               }
                splan->scan.scanrelid += rtoffset;

Does this need an "else"?  It seems surprising that you would offset
scanrelid even if it's starting out as zero.

(Note that there are two instances of this pattern.)

+ * 'found' : indicates whether RelOptInfo is actually constructed.
+ *     true, if it was already built and on the cache.

Leftover hunk.  Revert this.

+typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root,

Whitespace is wrong, still.

+ * An optional fdw_ps_tlist is used to map a reference to an attribute of
+ * underlying relation(s) on a pair of INDEX_VAR and alternative varattno.

on -> onto

+ * It looks like a scan on pseudo relation that is usually result of
+ * relations join on remote data source, and FDW driver is responsible to
+ * set expected target list for this.

Change to: "When fdw_ps_tlist is used, this represents a remote join,
and the FDW driver is responsible for setting this field to an
appropriate value."

If FDW returns records as foreign-
+ * table definition, just put NIL here.

I think this is just referring to the non-join case; if so, just drop
it.  Otherwise, I'm confused and need a further explanation.

+ *  Note that since Plan trees can be copied, custom scan providers *must*

Extra space before "Note"

+   Bitmapset  *custom_relids;  /* set of relid (index of range-tables)
+                                * represented by this node */

Maybe "RTIs this node generates"?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
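The short member-shifting loop suggested in this mail can be tried outside the backend with a toy bitmap. The sketch below assumes a 64-bit word in place of a Bitmapset; toy_next_member and toy_shift_members are hypothetical helpers that mimic the iteration style of bms_next_member(), not PostgreSQL functions. The shifted members are accumulated into a separate output set, so the input is never modified while it is being scanned.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the smallest member of 'set' that is greater than 'prev',
 * or -1 if there is none.  Pass prev = -1 to start the iteration.
 */
int
toy_next_member(uint64_t set, int prev)
{
    for (int x = prev + 1; x < 64; x++)
    {
        if (set & (UINT64_C(1) << x))
            return x;
    }
    return -1;
}

/*
 * Build a new set containing x + shift for every member x of 'inputset'.
 * The caller must ensure 0 <= x + shift < 64 for all members.
 */
uint64_t
toy_shift_members(uint64_t inputset, int shift)
{
    uint64_t outputset = 0;
    int      x = -1;

    while ((x = toy_next_member(inputset, x)) >= 0)
        outputset |= UINT64_C(1) << (x + shift);

    return outputset;
}
```

For example, shifting the set {1, 3} by 2 yields {3, 5}, and a shift of 0 returns a copy of the input, which matches the "shift of 0 is the same as copying" comment.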
> On Sun, Apr 26, 2015 at 10:00 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > The attached patch v13 is revised one according to the suggestion > > by Robert. > > Thanks. > > The last hunk in foreign.c is a useless whitespace change. > Sorry, my oversight. > + /* actually, not shift members */ > > Change to: "shift of 0 is the same as copying" > > But actually, do we really need all of this? I think you could reduce > the size of this function to three lines of code if you just did this: > > x = -1; > while ((x = bms_next_member(inputset, x)) >= 0) > outputset = bms_add_member(inputset, x + shift); > > It might be very slightly slower, but I think it would be worth it to > reduce the amount of code needed. > OK, I reverted the bms_shift_members(). It seems to me the code block for T_ForeignScan and T_CustomScan in setrefs.c are a bit large. It may be better to have a separate function like T_IndexOnlyScan. How about your opinion? > + * 5. Consider paths added by FDW, in case when both of outer and > + * inner relations are managed by the same driver. > > Change to: "If both inner and outer relations are managed by the same > FDW, give it a chance to push down joins." > OK, > + * 6. At the last, consider paths added by extension, in addition to the > + * built-in paths. > > Change to: "Finally, give extensions a chance to manipulate the path list." > OK, > + * Fetch relation-id, if this foreign-scan node actuall scans on > + * a particular real relation. Elsewhere, InvalidOid shall be > + * informed to the FDW driver. > > Change to: "If we're scanning a base relation, look up the OID. (We > can skip this if scanning a join relation.)" > OK, > + * Sanity check. Pseudo scan tuple-descriptor shall be constructed > + * based on the fdw_ps_tlist, excluding resjunk=true, so we need to > + * ensure all valid TLEs have to locate prior to junk ones. > > Is the goal here to make attribute numbers match up? If so, between > where and where? 
If not, please explain further. > No, its purpose is to reduce unnecessary projection. The *_ps_tlist is not only used to construct tuple-descriptor of Foreign/CustomScan with scanrelid==0, but also used to resolve var- nodes with varno==INDEX_VAR in EXPLAIN command. For example, SELECT t1.y, t2.b FROM t1, t2 WHERE t1.x = t2.a; If "t1.x = t2.a" is executable on external computing resource (like remote RDBMS or GPU device, etc), both of t1.x and t2.a don't need to appear on the targetlist of joinrel. In this case, the best *_ps_tlist consists of two var-nodes of t1.x and t2.a because it fits tuple-descriptor of result tuple slot, thus it can skip per-tuple projection. On the other hands, we may want to print out expression clause that shall be executed on the external resource; "t1.x = t2.a" in this case. If FDW/CSP keeps this clause in expression form, its var-nodes shall be rewritten to a pair of INDEX_VAR and resno on *_ps_tlist. So, deparse_expression() needs to be capable to find out "t1.x" and "t2.a" on the *_ps_tlist. However, it does not make sense to include these variables on the scan tuple-descriptor. ExecInitForeignScan() and ExecInitCustomScan() makes its scan tuple- descriptor using ExecCleanTypeFromTL(), not ExecTypeFromTL(), to omit these unreferenced variables on the *_ps_tlist. All the var-nodes with INDEX_VAR shall be identified by offset from head of the list, we cannot allow any target-entry with resjunk=false after ones with resjunk=true, to keep the expected varattno. This sanity checks ensures no target-entry with resjunk=false after the resjunk=true. It helps to distinct attributes to be included in the result tuple from the ones for just reference in EXPLAIN. Did my explain above introduced the reason of this sanity check well? > + if (splan->scan.scanrelid == 0) > + { > ... > + } > splan->scan.scanrelid += rtoffset; > > Does this need an "else"? It seems surprising that you would offset > scanrelid even if it's starting out as zero. 
> > (Note that there are two instances of this pattern.) > 'break' was put on the tail of if-block, however, it may lead potential bugs in the future. I'll use if-else manner as usual. > + * 'found' : indicates whether RelOptInfo is actually constructed. > + * true, if it was already built and on the cache. > > Leftover hunk. Revert this. > Fixed, > +typedef void (*GetForeignJoinPaths_function ) (PlannerInfo *root, > > Whitespace is wrong, still. > Fixed, > + * An optional fdw_ps_tlist is used to map a reference to an attribute of > + * underlying relation(s) on a pair of INDEX_VAR and alternative varattno. > > on -> onto > OK, > + * It looks like a scan on pseudo relation that is usually result of > + * relations join on remote data source, and FDW driver is responsible to > + * set expected target list for this. > > Change to: "When fdw_ps_tlist is used, this represents a remote join, > and the FDW driver is responsible for setting this field to an > appropriate value." > OK, > If FDW returns records as foreign- > + * table definition, just put NIL here. > > I think this is just referring to the non-join case; if so, just drop > it. Otherwise, I'm confused and need a further explanation. > OK, it is just saying put NIL if non-join case. > + * Note that since Plan trees can be copied, custom scan providers *must* > > Extra space before "Note" > OK, > + Bitmapset *custom_relids; /* set of relid (index of range-tables) > + * > represented by this node */ > > Maybe "RTIs this node generates"? > OK, Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
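As a rough illustration of the three-line loop suggested above in place of bms_shift_members(), here is a standalone sketch. It assumes a toy 64-bit bitmap rather than PostgreSQL's Bitmapset; toy_next_member(), toy_add_member() and toy_shift_members() are hypothetical stand-ins that only mimic the calling convention of bms_next_member() and bms_add_member(). The shifted members are accumulated into the output set, and the input set is left untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: members are limited to 0..63 (assumption). */
typedef uint64_t ToySet;

/*
 * Return the smallest member greater than prev, or -1 if there is none.
 * Passing prev = -1 starts the iteration from the first member.
 */
static int
toy_next_member(ToySet set, int prev)
{
    for (int x = prev + 1; x < 64; x++)
    {
        if (set & (UINT64_C(1) << x))
            return x;
    }
    return -1;
}

/* Return a copy of set with member x added. */
static ToySet
toy_add_member(ToySet set, int x)
{
    assert(x >= 0 && x < 64);
    return set | (UINT64_C(1) << x);
}

/* Build a new set holding (member + shift) for every member of inputset. */
static ToySet
toy_shift_members(ToySet inputset, int shift)
{
    ToySet      outputset = 0;
    int         x = -1;

    while ((x = toy_next_member(inputset, x)) >= 0)
        outputset = toy_add_member(outputset, x + shift);

    return outputset;
}
```

With a shift of 0 the result is simply a copy of the input, which is the case the reviewed comment was describing.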
On Thu, Apr 30, 2015 at 9:16 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > It seems to me the code block for T_ForeignScan and T_CustomScan in > setrefs.c are a bit large. It may be better to have a separate > function like T_IndexOnlyScan. > How about your opinion? Either way is OK with me. Please do as you think best. >> + * Sanity check. Pseudo scan tuple-descriptor shall be constructed >> + * based on the fdw_ps_tlist, excluding resjunk=true, so we need to >> + * ensure all valid TLEs have to locate prior to junk ones. >> >> Is the goal here to make attribute numbers match up? If so, between >> where and where? If not, please explain further. >> > No, its purpose is to reduce unnecessary projection. > > The *_ps_tlist is not only used to construct tuple-descriptor of > Foreign/CustomScan with scanrelid==0, but also used to resolve var- > nodes with varno==INDEX_VAR in EXPLAIN command. > > For example, > SELECT t1.y, t2.b FROM t1, t2 WHERE t1.x = t2.a; > > If "t1.x = t2.a" is executable on external computing resource (like > remote RDBMS or GPU device, etc), both of t1.x and t2.a don't need > to appear on the targetlist of joinrel. > In this case, the best *_ps_tlist consists of two var-nodes of t1.x > and t2.a because it fits tuple-descriptor of result tuple slot, thus > it can skip per-tuple projection. > > On the other hands, we may want to print out expression clause that > shall be executed on the external resource; "t1.x = t2.a" in this > case. If FDW/CSP keeps this clause in expression form, its var-nodes > shall be rewritten to a pair of INDEX_VAR and resno on *_ps_tlist. > So, deparse_expression() needs to be capable to find out "t1.x" and > "t2.a" on the *_ps_tlist. However, it does not make sense to include > these variables on the scan tuple-descriptor. > > ExecInitForeignScan() and ExecInitCustomScan() makes its scan tuple- > descriptor using ExecCleanTypeFromTL(), not ExecTypeFromTL(), to omit > these unreferenced variables on the *_ps_tlist. 
All the var-nodes with > INDEX_VAR shall be identified by offset from head of the list, we cannot > allow any target-entry with resjunk=false after ones with resjunk=true, > to keep the expected varattno. > > This sanity checks ensures no target-entry with resjunk=false after > the resjunk=true. It helps to distinct attributes to be included in > the result tuple from the ones for just reference in EXPLAIN. > > Did my explain above introduced the reason of this sanity check well? Yeah, I think so. So what we want to do in this comment is summarize all of that briefly. Maybe something like this: "Sanity check. There may be resjunk entries in fdw_ps_tlist that are included only to help EXPLAIN deparse plans properly. We require that these are at the end, so that when the executor builds the scan descriptor based on the non-junk entries, it gets the attribute numbers correct." >> + if (splan->scan.scanrelid == 0) >> + { >> ... >> + } >> splan->scan.scanrelid += rtoffset; >> >> Does this need an "else"? It seems surprising that you would offset >> scanrelid even if it's starting out as zero. >> >> (Note that there are two instances of this pattern.) >> > 'break' was put on the tail of if-block, however, it may lead potential > bugs in the future. I'll use if-else manner as usual. Ah, OK, I missed that. Yeah, that's probably a good change. I assume you realize you did not attach an updated patch? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
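The ordering rule summarized in the proposed comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in the patch: ToyTargetEntry is a hypothetical, cut-down stand-in for a target entry, and both function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified target entry: only what this sketch needs. */
typedef struct ToyTargetEntry
{
    int         resno;      /* 1-based position in the target list */
    bool        resjunk;    /* true: kept only so EXPLAIN can deparse it */
} ToyTargetEntry;

/* Return true if no non-junk entry appears after a junk entry. */
static bool
toy_junk_entries_are_last(const ToyTargetEntry *tlist, size_t n)
{
    bool        seen_junk = false;

    for (size_t i = 0; i < n; i++)
    {
        if (tlist[i].resjunk)
            seen_junk = true;
        else if (seen_junk)
            return false;   /* a real column follows a junk one */
    }
    return true;
}

/*
 * Count the attributes of a scan descriptor built from the non-junk
 * entries only.  When junk entries are all at the end, the i-th non-junk
 * entry keeps attribute number i, so list position and attribute number
 * agree for every column that is actually returned.
 */
static size_t
toy_scan_natts(const ToyTargetEntry *tlist, size_t n)
{
    size_t      natts = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (!tlist[i].resjunk)
            natts++;
    }
    return natts;
}
```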
> On Thu, Apr 30, 2015 at 9:16 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > It seems to me the code block for T_ForeignScan and T_CustomScan in > > setrefs.c are a bit large. It may be better to have a separate > > function like T_IndexOnlyScan. > > How about your opinion? > > Either way is OK with me. Please do as you think best. > OK, in setrefs.c, I moved the code block for T_ForeignScan and T_CustomScan into set_foreignscan_references() and set_customscan_references() for each. Its nest-level is a bit deep to keep all the stuff within 80-characters row. It also uses bms_add_member(), instead of bms_shift_members() reverted. > >> + * Sanity check. Pseudo scan tuple-descriptor shall be constructed > >> + * based on the fdw_ps_tlist, excluding resjunk=true, so we need to > >> + * ensure all valid TLEs have to locate prior to junk ones. > >> > >> Is the goal here to make attribute numbers match up? If so, between > >> where and where? If not, please explain further. > >> > > No, its purpose is to reduce unnecessary projection. > > > > The *_ps_tlist is not only used to construct tuple-descriptor of > > Foreign/CustomScan with scanrelid==0, but also used to resolve var- > > nodes with varno==INDEX_VAR in EXPLAIN command. > > > > For example, > > SELECT t1.y, t2.b FROM t1, t2 WHERE t1.x = t2.a; > > > > If "t1.x = t2.a" is executable on external computing resource (like > > remote RDBMS or GPU device, etc), both of t1.x and t2.a don't need > > to appear on the targetlist of joinrel. > > In this case, the best *_ps_tlist consists of two var-nodes of t1.x > > and t2.a because it fits tuple-descriptor of result tuple slot, thus > > it can skip per-tuple projection. > > > > On the other hands, we may want to print out expression clause that > > shall be executed on the external resource; "t1.x = t2.a" in this > > case. If FDW/CSP keeps this clause in expression form, its var-nodes > > shall be rewritten to a pair of INDEX_VAR and resno on *_ps_tlist. 
> > So, deparse_expression() needs to be capable to find out "t1.x" and > > "t2.a" on the *_ps_tlist. However, it does not make sense to include > > these variables on the scan tuple-descriptor. > > > > ExecInitForeignScan() and ExecInitCustomScan() makes its scan tuple- > > descriptor using ExecCleanTypeFromTL(), not ExecTypeFromTL(), to omit > > these unreferenced variables on the *_ps_tlist. All the var-nodes with > > INDEX_VAR shall be identified by offset from head of the list, we cannot > > allow any target-entry with resjunk=false after ones with resjunk=true, > > to keep the expected varattno. > > > > This sanity checks ensures no target-entry with resjunk=false after > > the resjunk=true. It helps to distinct attributes to be included in > > the result tuple from the ones for just reference in EXPLAIN. > > > > Did my explain above introduced the reason of this sanity check well? > > Yeah, I think so. So what we want to do in this comment is summarize > all of that briefly. Maybe something like this: > > "Sanity check. There may be resjunk entries in fdw_ps_tlist that are > included only to help EXPLAIN deparse plans properly. We require that > these are at the end, so that when the executor builds the scan > descriptor based on the non-junk entries, it gets the attribute > numbers correct." > Thanks, I used this sentence as is. > >> + if (splan->scan.scanrelid == 0) > >> + { > >> ... > >> + } > >> splan->scan.scanrelid += rtoffset; > >> > >> Does this need an "else"? It seems surprising that you would offset > >> scanrelid even if it's starting out as zero. > >> > >> (Note that there are two instances of this pattern.) > >> > > 'break' was put on the tail of if-block, however, it may lead potential > > bugs in the future. I'll use if-else manner as usual. > > Ah, OK, I missed that. Yeah, that's probably a good change. 
> set_foreignscan_references() and set_customscan_references() are each split into two portions in the manner above: one code block for scanrelid==0 and another for all other cases. > I assume you realize you did not attach an updated patch? > I wanted to submit the v14 after the above items get clarified. The attached patch (v14) includes all what you suggested in the previous message. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Thu, Apr 30, 2015 at 5:21 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > I wanted to submit the v14 after the above items get clarified. > The attached patch (v14) includes all what you suggested in the previous > message. Committed, after heavily working over the documentation, and with some more revisions to the comments as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Apr 30, 2015 at 5:21 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: >> I wanted to submit the v14 after the above items get clarified. >> The attached patch (v14) includes all what you suggested in the previous >> message. > Committed, after heavily working over the documentation, and with some > more revisions to the comments as well. I've been trying to code-review this patch, because the documentation seemed several bricks shy of a load, and I find myself entirely confused by the fdw_ps_tlist and custom_ps_tlist fields. The names, along with some of the comments, imply that these are just targetlists for the join nodes; but if that is the case then we don't need them, because surely scan.targetlist would serve the purpose. There is some other, utterly uncommented, code in setrefs.c and ruleutils.c that suggests these fields are supposed to serve a purpose more like IndexOnlyScan.indextlist; but if that's what they are the comments are woefully inadequate/misleading, and I'm really unsure that the associated code actually works. Also, if that is what they're for (ie, to allow the FDW to redefine the scan tuple contents) it would likely be better to decouple that feature from whether the plan node is for a simple scan or a join. The business about resjunk columns in that list also seems a bit half baked, or at least underdocumented. I do not think that this should have gotten committed without an attendant proof-of-concept patch to postgres_fdw, so that the logic could be tested. regards, tom lane
> Robert Haas <robertmhaas@gmail.com> writes: > > On Thu, Apr 30, 2015 at 5:21 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >> I wanted to submit the v14 after the above items get clarified. > >> The attached patch (v14) includes all what you suggested in the previous > >> message. > > > Committed, after heavily working over the documentation, and with some > > more revisions to the comments as well. > > I've been trying to code-review this patch, because the documentation > seemed several bricks shy of a load, and I find myself entirely confused > by the fdw_ps_tlist and custom_ps_tlist fields. The names, along with > some of the comments, imply that these are just targetlists for the join > nodes; but if that is the case then we don't need them, because surely > scan.targetlist would serve the purpose. There is some other, utterly > uncommented, code in setrefs.c and ruleutils.c that suggests these fields > are supposed to serve a purpose more like IndexOnlyScan.indextlist; but > if that's what they are the comments are woefully inadequate/misleading, > and I'm really unsure that the associated code actually works. > Main-point of your concern is lack of documentation/comments to introduce how does the pseudo-scan targetlist works here, isn't it?? > Also, > if that is what they're for (ie, to allow the FDW to redefine the scan > tuple contents) it would likely be better to decouple that feature from > whether the plan node is for a simple scan or a join. > In this version, we don't intend FDW/CSP to redefine the contents of scan tuples, even though I want off-loads heavy targetlist calculation workloads to external computing resources in *the future version*. > The business about > resjunk columns in that list also seems a bit half baked, or at least > underdocumented. > I'll add source code comments to introduce how does it works any when does it have resjunk=true. It will be a bit too deep to be introduced in the SGML file. 
> I do not think that this should have gotten committed without an attendant > proof-of-concept patch to postgres_fdw, so that the logic could be tested. > Hanada-san is now working according to the comments from Robert. The overall design was already discussed upthread, and the latest implementation follows the consensus reached there. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: >> I've been trying to code-review this patch, because the documentation >> seemed several bricks shy of a load, and I find myself entirely confused >> by the fdw_ps_tlist and custom_ps_tlist fields. > Main-point of your concern is lack of documentation/comments to introduce > how does the pseudo-scan targetlist works here, isn't it?? Well, there's a bunch of omissions and outright errors in the docs and comments, but this is the main issue that I was uncertain how to fix from looking at the patch. >> Also, >> if that is what they're for (ie, to allow the FDW to redefine the scan >> tuple contents) it would likely be better to decouple that feature from >> whether the plan node is for a simple scan or a join. > In this version, we don't intend FDW/CSP to redefine the contents of > scan tuples, even though I want off-loads heavy targetlist calculation > workloads to external computing resources in *the future version*. I do not think it's a good idea to introduce such a field now and then redefine how it works and what it's for in a future version. We should not be moving the FDW APIs around more than we absolutely have to, especially not in ways that wouldn't throw an obvious compile error for un-updated code. Also, the longer we wait to make a change that we know we want, the more pain we inflict on FDW authors (simply because there will be more of them a year from now than there are today). >> The business about >> resjunk columns in that list also seems a bit half baked, or at least >> underdocumented. > I'll add source code comments to introduce how does it works any when > does it have resjunk=true. It will be a bit too deep to be introduced > in the SGML file. I don't actually see a reason for resjunk marking in that list at all, if what it's for is to define the contents of the scan tuple. 
I think we should just s/ExecCleanTypeFromTL/ExecTypeFromTL/ in nodeForeignscan and nodeCustom, and get rid of the "sanity check" in create_foreignscan_plan (which is pretty pointless anyway, considering the number of other ways you could screw up that tlist without it being detected). I'm also inclined to rename the fields to fdw_scan_tlist/custom_scan_tlist, which would better reflect what they do, and to change the API of make_foreignscan() to add a parameter corresponding to the scan tlist. It's utterly bizarre and error-prone that this patch has added a field that the FDW is supposed to set and not changed make_foreignscan to match. >> I do not think that this should have gotten committed without an attendant >> proof-of-concept patch to postgres_fdw, so that the logic could be tested. > Hanada-san is now working according to the comments from Robert. That's nice, but 9.5 feature freeze is only a week away. I don't have a lot of confidence that this stuff is actually in a state where we won't regret shipping it in 9.5. regards, tom lane
On Fri, May 8, 2015 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > That's nice, but 9.5 feature freeze is only a week away. I don't have a > lot of confidence that this stuff is actually in a state where we won't > regret shipping it in 9.5. Yeah. The POC you were asking for upthread certainly exists and has for a while, or I would not have committed this. But I do not think it likely that the postgres_fdw support will be ready for 9.5. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, May 8, 2015 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> That's nice, but 9.5 feature freeze is only a week away. I don't have a >> lot of confidence that this stuff is actually in a state where we won't >> regret shipping it in 9.5. > Yeah. The POC you were asking for upthread certainly exists and has > for a while, or I would not have committed this. But I do not think > it likely that the postgres_fdw support will be ready for 9.5. Well, we have two alternatives. I can keep hacking on this and get it to a state where it seems credible to me, but we won't have any proof that it actually works (though perhaps we could treat any problems as bugs that should hopefully get found before 9.5 ships, if a postgres_fdw patch shows up in the next few months). Or we could revert the whole thing and bounce it to the 9.6 cycle. I don't really like doing the latter, but I'm pretty uncomfortable with committing to published FDW APIs that are (a) as messy as this and (b) practically untested. The odds that something slipped through the cracks are high. Aside from the other gripes I raised, I'm exceedingly unhappy with the ad-hoc APIs proposed for GetForeignJoinPaths and set_join_pathlist_hook. It's okay for internal calls in joinpath.c to look like that, but exporting that set of parameters seems like pure folly. We've changed those parameter lists repeatedly (for instance in 9.2 and again in 9.3); the odds that they'll need to change again in future approach 100%. One way we could reduce the risk of code breakage there is to stuff all or most of those parameters into a struct. This might result in a small slowdown for the internal calls, or then again maybe not --- there probably aren't many architectures that can pass 10 parameters in registers anyway. regards, tom lane
... btw, I just noticed something that had escaped me because it seems so obviously wrong that I had not even stopped to consider the possibility that the code was doing what it's doing. To wit, that the planner supposes that two foreign tables are potentially remote-joinable if they share the same underlying FDW handler function. Not the same server, and not even the same pg_foreign_data_wrapper entry, but the pg_proc entry for the handler function. I think this is fundamentally bogus. Under what circumstances are we not just laying off the need to check same server origin onto the FDW? How is it that the urgent need for the FDW to check for that isn't even mentioned in the documentation? I think that we'd really be better off insisting on same server (as in same pg_foreign_server OID), hence automatically same FDW, and what's even more important, same user mapping for any possible query execution context. The possibility that there are some corner cases where some FDWs could optimize other scenarios seems to me to be poor return for the bugs and security holes that will arise any time typical FDWs forget to check this. regards, tom lane
2015-05-09 6:48 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>: > ... btw, I just noticed something that had escaped me because it seems so > obviously wrong that I had not even stopped to consider the possibility > that the code was doing what it's doing. To wit, that the planner > supposes that two foreign tables are potentially remote-joinable if they > share the same underlying FDW handler function. Not the same server, and > not even the same pg_foreign_data_wrapper entry, but the pg_proc entry for > the handler function. I think this is fundamentally bogus. Under what > circumstances are we not just laying off the need to check same server > origin onto the FDW? How is it that the urgent need for the FDW to check > for that isn't even mentioned in the documentation? > Indeed. Comparing fdw_handler may cause unexpected behavior. I agree it needs to be fixed. > I think that we'd really be better off insisting on same server (as in > same pg_foreign_server OID), hence automatically same FDW, and what's > even more important, same user mapping for any possible query execution > context. The possibility that there are some corner cases where some FDWs > could optimize other scenarios seems to me to be poor return for the bugs > and security holes that will arise any time typical FDWs forget to check > this. > An earlier version of the foreign/custom-join patch did check for joinable relations using the FDW server id; however, it was changed to the current form because that may allow additional optimization, for example when multiple foreign servers share the same remote host, access credentials and so on. I also understand your concern about security holes introduced by oversight. It is a trade-off, but it seems to me that the benefit of the potential optimization does not outweigh the security-hole risk. So, I'll make a patch to change the logic that checks for joinable foreign tables. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
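The stricter rule discussed above (same server and same user mapping, rather than merely the same handler function) could be sketched like this. It is only an illustration: ToyForeignRel and its fields are hypothetical, and the real planner data structures are not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int ToyOid;

#define TOY_INVALID_OID ((ToyOid) 0)

/* Hypothetical per-relation info; field names are assumptions. */
typedef struct ToyForeignRel
{
    ToyOid      serverid;   /* stand-in for a pg_foreign_server OID,
                             * TOY_INVALID_OID for a plain table */
    ToyOid      umid;       /* stand-in for the user mapping in effect */
} ToyForeignRel;

/*
 * Offer a join to the FDW only when both sides live on the same foreign
 * server (hence the same FDW) and would be accessed through the same
 * user mapping.  Sharing a handler function is not enough.
 */
static bool
toy_join_is_pushdown_candidate(const ToyForeignRel *outer,
                               const ToyForeignRel *inner)
{
    if (outer->serverid == TOY_INVALID_OID ||
        inner->serverid == TOY_INVALID_OID)
        return false;       /* at least one side is not a foreign table */

    if (outer->serverid != inner->serverid)
        return false;       /* different servers */

    return outer->umid == inner->umid;
}
```

Doing the test in core means an FDW that forgets to check server origin cannot accidentally join tables across servers or credentials.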
2015-05-09 2:46 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>: > Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: >>> I've been trying to code-review this patch, because the documentation >>> seemed several bricks shy of a load, and I find myself entirely confused >>> by the fdw_ps_tlist and custom_ps_tlist fields. > >> Main-point of your concern is lack of documentation/comments to introduce >> how does the pseudo-scan targetlist works here, isn't it?? > > Well, there's a bunch of omissions and outright errors in the docs and > comments, but this is the main issue that I was uncertain how to fix > from looking at the patch. > >>> Also, >>> if that is what they're for (ie, to allow the FDW to redefine the scan >>> tuple contents) it would likely be better to decouple that feature from >>> whether the plan node is for a simple scan or a join. > >> In this version, we don't intend FDW/CSP to redefine the contents of >> scan tuples, even though I want off-loads heavy targetlist calculation >> workloads to external computing resources in *the future version*. > > I do not think it's a good idea to introduce such a field now and then > redefine how it works and what it's for in a future version. We should > not be moving the FDW APIs around more than we absolutely have to, > especially not in ways that wouldn't throw an obvious compile error > for un-updated code. Also, the longer we wait to make a change that > we know we want, the more pain we inflict on FDW authors (simply because > there will be more of them a year from now than there are today). > Ah, above my sentence don't intend to reuse the existing field for different works in the future version. It's just what I want to support in the future version. Yep, I see. It is not a good idea to redefine the existing field for different purpose silently. It's not my plan. >>> The business about >>> resjunk columns in that list also seems a bit half baked, or at least >>> underdocumented. 
> >> I'll add source code comments to introduce how does it works any when >> does it have resjunk=true. It will be a bit too deep to be introduced >> in the SGML file. > > I don't actually see a reason for resjunk marking in that list at all, > if what it's for is to define the contents of the scan tuple. I think we > should just s/ExecCleanTypeFromTL/ExecTypeFromTL/ in nodeForeignscan and > nodeCustom, and get rid of the "sanity check" in create_foreignscan_plan > (which is pretty pointless anyway, considering the number of other ways > you could screw up that tlist without it being detected). > http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010D7E24@BPXM15GP.gisp.nec.co.jp Does the explanation in the above post make sense? The *_ps_tlist is used not only as the basis of the scan-tuple descriptor, but also to resolve var-nodes with varno==INDEX_VAR in the EXPLAIN command. On the other hand, the presence of junk entries (which are referenced only by the external computing resource) may cause unnecessary projection. So, I want to distinguish the target entries that form the basis of the scan-tuple descriptor from the ones that exist just for the EXPLAIN command. > I'm also inclined to rename the fields to > fdw_scan_tlist/custom_scan_tlist, which would better reflect what they do, > and to change the API of make_foreignscan() to add a parameter > corresponding to the scan tlist. It's utterly bizarre and error-prone > that this patch has added a field that the FDW is supposed to set and > not changed make_foreignscan to match. > OK, I'll make both changes. The name ps_tlist is short for "pseudo-scan target-list", so fdw_scan_tlist/custom_scan_tlist is close to what I intended. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
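To make the INDEX_VAR mapping described above more concrete, here is a small sketch of rewriting a variable reference into a (marker, position) pair by looking it up in the pseudo-scan target list. ToyVar, TOY_INDEX_VAR and toy_map_to_index_var() are hypothetical names for this example only, and the marker value is arbitrary; the actual setrefs.c logic is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary marker chosen for this sketch (assumption). */
#define TOY_INDEX_VAR 65002

/* Hypothetical, simplified variable reference. */
typedef struct ToyVar
{
    int         varno;      /* range-table index, or TOY_INDEX_VAR */
    int         varattno;   /* attribute number, 1-based */
} ToyVar;

/*
 * Look up var in scan_tlist.  On a match, rewrite it in place so that it
 * points at the matching list position and return true; otherwise leave
 * it unchanged and return false.
 */
static bool
toy_map_to_index_var(ToyVar *var, const ToyVar *scan_tlist, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (scan_tlist[i].varno == var->varno &&
            scan_tlist[i].varattno == var->varattno)
        {
            var->varno = TOY_INDEX_VAR;
            var->varattno = (int) i + 1;    /* list offset becomes attno */
            return true;
        }
    }
    return false;
}
```

Because the new attribute number is just the offset from the head of the list, EXPLAIN can map it back to the original column by indexing into the same list.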
2015-05-09 3:51 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, May 8, 2015 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> That's nice, but 9.5 feature freeze is only a week away. I don't have a >>> lot of confidence that this stuff is actually in a state where we won't >>> regret shipping it in 9.5. > >> Yeah. The POC you were asking for upthread certainly exists and has >> for a while, or I would not have committed this. But I do not think >> it likely that the postgres_fdw support will be ready for 9.5. > > Well, we have two alternatives. I can keep hacking on this and get it > to a state where it seems credible to me, but we won't have any proof > that it actually works (though perhaps we could treat any problems > as bugs that should hopefully get found before 9.5 ships, if a > postgres_fdw patch shows up in the next few months). Or we could > revert the whole thing and bounce it to the 9.6 cycle. I don't really > like doing the latter, but I'm pretty uncomfortable with committing to > published FDW APIs that are (a) as messy as this and (b) practically > untested. The odds that something slipped through the cracks are high. > > Aside from the other gripes I raised, I'm exceedingly unhappy with the > ad-hoc APIs proposed for GetForeignJoinPaths and set_join_pathlist_hook. > It's okay for internal calls in joinpath.c to look like that, but > exporting that set of parameters seems like pure folly. We've changed > those parameter lists repeatedly (for instance in 9.2 and again in 9.3); > the odds that they'll need to change again in future approach 100%. > > One way we could reduce the risk of code breakage there is to stuff all > or most of those parameters into a struct. This might result in a small > slowdown for the internal calls, or then again maybe not --- there > probably aren't many architectures that can pass 10 parameters in > registers anyway. > Is it like a following structure definition? 
typedef struct
{
    PlannerInfo *root;
    RelOptInfo *joinrel;
    RelOptInfo *outerrel;
    RelOptInfo *innerrel;
    List *restrictlist;
    JoinType jointype;
    SpecialJoinInfo *sjinfo;
    SemiAntiJoinFactors *semifactors;
    Relids param_source_rels;
    Relids extra_lateral_rels;
} SetJoinPathListArgs;

I agree with the idea. It also helps a CSP driver implementation that calls the next driver already chained at the time of its installation.

    if (set_join_pathlist_next)
        set_join_pathlist_next(args);

is more stable than

    if (set_join_pathlist_next)
        set_join_pathlist_next(root, joinrel, outerrel, innerrel,
                               restrictlist, jointype, sjinfo, semifactors,
                               param_source_rels, extra_lateral_rels);

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
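The chaining pattern mentioned above can be tried out in isolation with a sketch like the following. Everything here is hypothetical: ToyJoinPathArgs is a cut-down stand-in for the proposed argument struct, and the hook variable and provider functions are invented for the example. The point it shows is that a provider forwarding a single struct pointer does not need to change when a field is added to the struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down argument struct; new fields can be appended. */
typedef struct ToyJoinPathArgs
{
    int         outer_relid;
    int         inner_relid;
    int         jointype;
    int         npaths_added;   /* stands in for calls that add a path */
} ToyJoinPathArgs;

typedef void (*toy_join_pathlist_hook_type) (ToyJoinPathArgs *args);

/* The exported hook, plus the previous value saved by the second provider. */
static toy_join_pathlist_hook_type toy_join_pathlist_hook = NULL;
static toy_join_pathlist_hook_type toy_join_pathlist_next = NULL;

/* First provider: just contributes one path. */
static void
toy_provider_a(ToyJoinPathArgs *args)
{
    args->npaths_added++;
}

/* Second provider: calls whatever was installed before it, then adds its own. */
static void
toy_provider_b(ToyJoinPathArgs *args)
{
    if (toy_join_pathlist_next)
        toy_join_pathlist_next(args);
    args->npaths_added++;
}

/* Install both providers in load order, chaining b in front of a. */
static void
toy_install_hooks(void)
{
    toy_join_pathlist_hook = toy_provider_a;

    toy_join_pathlist_next = toy_join_pathlist_hook;
    toy_join_pathlist_hook = toy_provider_b;
}
```

After toy_install_hooks(), invoking toy_join_pathlist_hook(&args) runs both providers with the same argument struct.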
On Fri, May 8, 2015 at 2:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Well, we have two alternatives. I can keep hacking on this and get it > to a state where it seems credible to me, but we won't have any proof > that it actually works (though perhaps we could treat any problems > as bugs that should hopefully get found before 9.5 ships, if a > postgres_fdw patch shows up in the next few months). Or we could > revert the whole thing and bounce it to the 9.6 cycle. I don't really > like doing the latter, but I'm pretty uncomfortable with committing to > published FDW APIs that are (a) as messy as this and (b) practically > untested. The odds that something slipped through the cracks are high. A lot of work went into this patch. I think it would be a shame to revert it. I'd even rather ship something imperfect or somewhat unstable and change it later than give up and roll it all back. > Aside from the other gripes I raised, I'm exceedingly unhappy with the > ad-hoc APIs proposed for GetForeignJoinPaths and set_join_pathlist_hook. > It's okay for internal calls in joinpath.c to look like that, but > exporting that set of parameters seems like pure folly. We've changed > those parameter lists repeatedly (for instance in 9.2 and again in 9.3); > the odds that they'll need to change again in future approach 100%. > > One way we could reduce the risk of code breakage there is to stuff all > or most of those parameters into a struct. This might result in a small > slowdown for the internal calls, or then again maybe not --- there > probably aren't many architectures that can pass 10 parameters in > registers anyway. Putting it into a structure certainly seems fine. I think it's pretty silly to assume that the FDW APIs are frozen or we're never going to change them. 
There was much discussion of the merits of exposing that information or not, and I was (and am) convinced that the FDWs need access to most if not all of that stuff, and that removing access to it will cripple the facility and result in mountains of duplicated and inefficient code. If in the future we compute more or different stuff there, I expect there's a good chance that FDWs will need to be updated to look at that stuff too. Of course, I don't object to maximizing our chances of not needing an API break, but I will be neither surprised nor disappointed if such efforts fail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, May 8, 2015 at 5:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > ... btw, I just noticed something that had escaped me because it seems so > obviously wrong that I had not even stopped to consider the possibility > that the code was doing what it's doing. To wit, that the planner > supposes that two foreign tables are potentially remote-joinable if they > share the same underlying FDW handler function. Not the same server, and > not even the same pg_foreign_data_wrapper entry, but the pg_proc entry for > the handler function. I think this is fundamentally bogus. Under what > circumstances are we not just laying off the need to check same server > origin onto the FDW? How is it that the urgent need for the FDW to check > for that isn't even mentioned in the documentation? > > I think that we'd really be better off insisting on same server (as in > same pg_foreign_server OID), hence automatically same FDW, and what's > even more important, same user mapping for any possible query execution > context. The possibility that there are some corner cases where some FDWs > could optimize other scenarios seems to me to be poor return for the bugs > and security holes that will arise any time typical FDWs forget to check > this. I originally wanted to go quite the other way with this and check for join pushdown via handler X any time at least one of the two relations involved used handler X, even if the other one used some other handler or was a plain table. In particular, it seems to me quite plausible to want to teach an FDW that a certain local table is replicated on a remote node, allowing a join between a foreign table and a plain table to be pushed down. This infrastructure can't be used that way anyhow, so maybe there's no harm in tightening it up, but I'm wary of circumscribing what FDW authors can do. 
I think it's better to be rather expansive in terms of when we call them and let them return without doing anything some of the time than to define the situations in which we call them too narrowly and end up ruling out interesting use cases. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2015-05-09 11:21 GMT+09:00 Robert Haas <robertmhaas@gmail.com>: > On Fri, May 8, 2015 at 5:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... btw, I just noticed something that had escaped me because it seems so >> obviously wrong that I had not even stopped to consider the possibility >> that the code was doing what it's doing. To wit, that the planner >> supposes that two foreign tables are potentially remote-joinable if they >> share the same underlying FDW handler function. Not the same server, and >> not even the same pg_foreign_data_wrapper entry, but the pg_proc entry for >> the handler function. I think this is fundamentally bogus. Under what >> circumstances are we not just laying off the need to check same server >> origin onto the FDW? How is it that the urgent need for the FDW to check >> for that isn't even mentioned in the documentation? >> >> I think that we'd really be better off insisting on same server (as in >> same pg_foreign_server OID), hence automatically same FDW, and what's >> even more important, same user mapping for any possible query execution >> context. The possibility that there are some corner cases where some FDWs >> could optimize other scenarios seems to me to be poor return for the bugs >> and security holes that will arise any time typical FDWs forget to check >> this. > > I originally wanted to go quite the other way with this and check for > join pushdown via handler X any time at least one of the two relations > involved used handler X, even if the other one used some other handler > or was a plain table. In particular, it seems to me quite plausible > to want to teach an FDW that a certain local table is replicated on a > remote node, allowing a join between a foreign table and a plain table > to be pushed down. This infrastructure can't be used that way anyhow, > so maybe there's no harm in tightening it up, but I'm wary of > circumscribing what FDW authors can do. 
I think it's better to be > rather expansive in terms of when we call them and let them return > without doing anything some of the time than to define the situations > in which we call them too narrowly and end up ruling out interesting > use cases. >

Joining a foreign table with a local relation that is replicated on the remote side is probably a relatively minor case. Even if the rough check on the foreign server ID means GetForeignJoinPaths is not invoked, an FDW driver can still implement arbitrary logic through set_join_pathlist_hook at its own risk, can't it?

The attached patch changes the logic that checks whether two foreign relations are joinable. As discussed upthread, it compares the foreign server ID instead of the handler function. build_join_rel() sets fdw_server on the join's RelOptInfo when the inner and outer foreign relations have the same value, which in turn lets add_paths_to_joinrel() call GetForeignJoinPaths.

Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
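For illustration, the same-server rule described in this message can be pictured roughly as follows. This is a minimal sketch with stand-in types, not the attached patch: `MockRel`, `join_serverid`, `join_is_pushdown_candidate`, and the local `Oid`/`InvalidOid` definitions are assumptions made so the example compiles on its own. Only the idea (a join relation carries a server ID when both inputs agree on it) comes from the thread.

```c
#include <stdbool.h>

/* Local stand-ins so the sketch compiles without PostgreSQL headers. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical stand-in for the one RelOptInfo field that matters here. */
typedef struct MockRel
{
    Oid serverid;   /* foreign server OID, or InvalidOid for a non-foreign rel */
} MockRel;

/*
 * Decide which server ID a join relation should carry: the common server
 * of both inputs, or InvalidOid when they differ or either is not foreign.
 */
Oid
join_serverid(const MockRel *outer, const MockRel *inner)
{
    if (outer->serverid != InvalidOid &&
        outer->serverid == inner->serverid)
        return outer->serverid;
    return InvalidOid;
}

/* Only joins that carry a valid server ID would be offered to the FDW. */
bool
join_is_pushdown_candidate(const MockRel *outer, const MockRel *inner)
{
    return join_serverid(outer, inner) != InvalidOid;
}
```

Under this rule a join between a foreign table and a plain table is never offered to the FDW, which is the restriction debated in the surrounding messages.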
2015-05-09 8:32 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>: > 2015-05-09 3:51 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Fri, May 8, 2015 at 1:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> That's nice, but 9.5 feature freeze is only a week away. I don't have a >>>> lot of confidence that this stuff is actually in a state where we won't >>>> regret shipping it in 9.5. >> >>> Yeah. The POC you were asking for upthread certainly exists and has >>> for a while, or I would not have committed this. But I do not think >>> it likely that the postgres_fdw support will be ready for 9.5. >> >> Well, we have two alternatives. I can keep hacking on this and get it >> to a state where it seems credible to me, but we won't have any proof >> that it actually works (though perhaps we could treat any problems >> as bugs that should hopefully get found before 9.5 ships, if a >> postgres_fdw patch shows up in the next few months). Or we could >> revert the whole thing and bounce it to the 9.6 cycle. I don't really >> like doing the latter, but I'm pretty uncomfortable with committing to >> published FDW APIs that are (a) as messy as this and (b) practically >> untested. The odds that something slipped through the cracks are high. >> >> Aside from the other gripes I raised, I'm exceedingly unhappy with the >> ad-hoc APIs proposed for GetForeignJoinPaths and set_join_pathlist_hook. >> It's okay for internal calls in joinpath.c to look like that, but >> exporting that set of parameters seems like pure folly. We've changed >> those parameter lists repeatedly (for instance in 9.2 and again in 9.3); >> the odds that they'll need to change again in future approach 100%. >> >> One way we could reduce the risk of code breakage there is to stuff all >> or most of those parameters into a struct. 
This might result in a small
>> slowdown for the internal calls, or then again maybe not --- there
>> probably aren't many architectures that can pass 10 parameters in
>> registers anyway.
>>
> Is it like the following structure definition?
>
>   typedef struct
>   {
>       PlannerInfo *root;
>       RelOptInfo *joinrel;
>       RelOptInfo *outerrel;
>       RelOptInfo *innerrel;
>       List *restrictlist;
>       JoinType jointype;
>       SpecialJoinInfo *sjinfo;
>       SemiAntiJoinFactors *semifactors;
>       Relids param_source_rels;
>       Relids extra_lateral_rels;
>   } SetJoinPathListArgs;
>
> I agree with the idea. It also helps a CSP driver implementation when it
> calls the next driver that was already chained at its installation.
>
>   if (set_join_pathlist_next)
>       set_join_pathlist_next(args);
>
> is a more stable manner than
>
>   if (set_join_pathlist_next)
>       set_join_pathlist_next(root,
>                              joinrel,
>                              outerrel,
>                              innerrel,
>                              restrictlist,
>                              jointype,
>                              sjinfo,
>                              semifactors,
>                              param_source_rels,
>                              extra_lateral_rels);
>

The attached patch defines a new ExtraJoinPathArgs struct that packs all the information passed to GetForeignJoinPaths and set_join_pathlist_hook. Its definition is below:

  typedef struct
  {
      PlannerInfo *root;
      RelOptInfo *joinrel;
      RelOptInfo *outerrel;
      RelOptInfo *innerrel;
      List *restrictlist;
      JoinType jointype;
      SpecialJoinInfo *sjinfo;
      SemiAntiJoinFactors *semifactors;
      Relids param_source_rels;
      Relids extra_lateral_rels;
  } ExtraJoinPathArgs;

With that, the hook invocation becomes much simpler:

  /*
   * 6. Finally, give extensions a chance to manipulate the path list.
   */
  if (set_join_pathlist_hook)
      set_join_pathlist_hook(&jargs);

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
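For illustration, the chaining benefit mentioned in this message might look like the sketch below. Everything here is a stand-in: `MockJoinPathArgs`, `mock_join_pathlist_hook`, and the two `ext_*` extensions are hypothetical names, and the struct holds placeholder fields rather than the real planner types. The point is only that each chained hook forwards a single pointer, so adding a field to the struct would not change any hook signature.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the proposed argument struct. */
typedef struct MockJoinPathArgs
{
    int joinrel_id;     /* placeholder for RelOptInfo *joinrel */
    int jointype;       /* placeholder for JoinType */
    int paths_added;    /* counter used only for this illustration */
} MockJoinPathArgs;

typedef void (*mock_join_pathlist_hook_type) (MockJoinPathArgs *args);

/* Stand-in for the hook variable the core backend would expose. */
mock_join_pathlist_hook_type mock_join_pathlist_hook = NULL;

/* Extension A: remembers the previous hook and calls it first. */
static mock_join_pathlist_hook_type ext_a_prev = NULL;

void
ext_a_hook(MockJoinPathArgs *args)
{
    if (ext_a_prev)
        ext_a_prev(args);
    args->paths_added++;    /* pretend we added one alternative path */
}

void
ext_a_init(void)
{
    ext_a_prev = mock_join_pathlist_hook;
    mock_join_pathlist_hook = ext_a_hook;
}

/* Extension B: installed later, chains to whatever was there before. */
static mock_join_pathlist_hook_type ext_b_prev = NULL;

void
ext_b_hook(MockJoinPathArgs *args)
{
    if (ext_b_prev)
        ext_b_prev(args);
    args->paths_added++;
}

void
ext_b_init(void)
{
    ext_b_prev = mock_join_pathlist_hook;
    mock_join_pathlist_hook = ext_b_hook;
}

/* Stand-in for the call site in the join planner. */
void
mock_add_paths_to_joinrel(MockJoinPathArgs *args)
{
    if (mock_join_pathlist_hook)
        mock_join_pathlist_hook(args);
}
```

After `ext_a_init()` and `ext_b_init()` have both run, one call to `mock_add_paths_to_joinrel()` reaches both extensions through the chain.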
2015-05-09 8:18 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>: > 2015-05-09 2:46 GMT+09:00 Tom Lane <tgl@sss.pgh.pa.us>: >> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: >>>> I've been trying to code-review this patch, because the documentation >>>> seemed several bricks shy of a load, and I find myself entirely confused >>>> by the fdw_ps_tlist and custom_ps_tlist fields. >> >>> Main-point of your concern is lack of documentation/comments to introduce >>> how does the pseudo-scan targetlist works here, isn't it?? >> >> Well, there's a bunch of omissions and outright errors in the docs and >> comments, but this is the main issue that I was uncertain how to fix >> from looking at the patch. >> >>>> Also, >>>> if that is what they're for (ie, to allow the FDW to redefine the scan >>>> tuple contents) it would likely be better to decouple that feature from >>>> whether the plan node is for a simple scan or a join. >> >>> In this version, we don't intend FDW/CSP to redefine the contents of >>> scan tuples, even though I want off-loads heavy targetlist calculation >>> workloads to external computing resources in *the future version*. >> >> I do not think it's a good idea to introduce such a field now and then >> redefine how it works and what it's for in a future version. We should >> not be moving the FDW APIs around more than we absolutely have to, >> especially not in ways that wouldn't throw an obvious compile error >> for un-updated code. Also, the longer we wait to make a change that >> we know we want, the more pain we inflict on FDW authors (simply because >> there will be more of them a year from now than there are today). >> > Ah, above my sentence don't intend to reuse the existing field for > different works in the future version. It's just what I want to support > in the future version. > Yep, I see. It is not a good idea to redefine the existing field for > different purpose silently. It's not my plan. 
> >>>> The business about >>>> resjunk columns in that list also seems a bit half baked, or at least >>>> underdocumented. >> >>> I'll add source code comments to introduce how does it works any when >>> does it have resjunk=true. It will be a bit too deep to be introduced >>> in the SGML file. >> >> I don't actually see a reason for resjunk marking in that list at all, >> if what it's for is to define the contents of the scan tuple. I think we >> should just s/ExecCleanTypeFromTL/ExecTypeFromTL/ in nodeForeignscan and >> nodeCustom, and get rid of the "sanity check" in create_foreignscan_plan >> (which is pretty pointless anyway, considering the number of other ways >> you could screw up that tlist without it being detected). >> > http://www.postgresql.org/message-id/9A28C8860F777E439AA12E8AEA7694F8010D7E24@BPXM15GP.gisp.nec.co.jp > > Does the introduction in above post make sense? > The *_ps_tlist is not only used for a basic of scan-tuple descriptor, but > also used to solve var-node if varno==INDEX_VAR in EXPLAIN command. > On the other hands, existence of the junk entries (which are referenced in > external computing resources only) may cause unnecessary projection. > So, I want to discriminate target-entries for basis of scan-tuple descriptor > from other ones just for EXPLAIN command. > >> I'm also inclined to rename the fields to >> fdw_scan_tlist/custom_scan_tlist, which would better reflect what they do, >> and to change the API of make_foreignscan() to add a parameter >> corresponding to the scan tlist. It's utterly bizarre and error-prone >> that this patch has added a field that the FDW is supposed to set and >> not changed make_foreignscan to match. >> > OK, I'll do the both of changes. The name of ps_tlist is a shorten of > "pseudo-scan target-list". So, fdw_scan_tlist/custom_scan_tlist are > almost intentional. > The attached patch renamed *_ps_tlist by *_scan_tlist according to the suggestion. 
It also adds a few detailed source code comments around this alternative scan_tlist. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, May 8, 2015 at 5:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think that we'd really be better off insisting on same server (as in >> same pg_foreign_server OID), hence automatically same FDW, and what's >> even more important, same user mapping for any possible query execution >> context. The possibility that there are some corner cases where some FDWs >> could optimize other scenarios seems to me to be poor return for the bugs >> and security holes that will arise any time typical FDWs forget to check >> this. > I originally wanted to go quite the other way with this and check for > join pushdown via handler X any time at least one of the two relations > involved used handler X, even if the other one used some other handler > or was a plain table. In particular, it seems to me quite plausible > to want to teach an FDW that a certain local table is replicated on a > remote node, allowing a join between a foreign table and a plain table > to be pushed down. If we did do something like that, I think a saner implementation would involve substituting a foreign table for the local one earlier, via view expansion. So by the time we are doing join planning, there would be no need to consider cross-server joins anyway. > This infrastructure can't be used that way anyhow, > so maybe there's no harm in tightening it up, but I'm wary of > circumscribing what FDW authors can do. Somebody who's really intent on shooting themselves in the foot can always use the set_join_pathlist_hook to inject paths for arbitrary joins. The FDW mechanism should support reasonable use cases without undue complication, and I doubt that what we've got now is adding anything except complication and risk of bugs. For the archives' sake, let me lay out a couple of reasons why an FDW that tried to allow cross-server joins would almost certainly be broken, and broken in security-relevant ways. 
Suppose for instance that postgres_fdw tried to be smart and drill down into foreign tables' server IDs to allow joining of any two tables that have the same effective host name, port, database name, user name, and anything else you think would be relevant to its choice of connections. The trouble with that is that the user mapping is context dependent, in particular one local userID might map to the same remote user name for two different server OIDs, while another might map to different user names. So if we plan a query under the first userID we might decide it's okay to do the join remotely. Then, if we re-use that plan while executing as another userID (which is entirely possible) what will probably happen is that the remote join query will get sent off under one or the other of the remote usernames associated with the second local userID. This could lead to either a permission failure, or a remote table access that should not be allowed to the current local userID. Admittedly, such cases might be rare in practice, but it's still a security hole. Also, even if the FDW is defensive enough to recheck full matching of the tables' connection properties at execution time, there's not much it could do about the situation except fail; it couldn't cause a re-plan to occur. For another case, we do not have any mechanism whereby operations like ALTER SERVER OPTIONS could invalidate existing plans. Thus, even if the two tables' connection properties matched at plan time, there's no guarantee that they still match at execution. This is probably not a security hole (at least not if you assume foreign-server owners are trusted users), but it's still a bug that exists only if someone tries to allow cross-server joins. For these reasons, I think that if an FDW tried to be laxer than "tables must be on the same pg_foreign_server entry to be joined", the odds approach unity that it would be broken, and probably dangerously broken. So we should just make that check for the FDWs. 
Anybody who thinks they're smarter than the average bear can use set_join_pathlist_hook, but they are probably wrong. regards, tom lane
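For illustration, the user-mapping hazard described in this message can be reproduced with a toy model. This is a sketch under stated assumptions: the mapping table, the user names `alice` and `bob`, the server numbers, and the functions `remote_user_for`, `lax_join_ok`, and `strict_join_ok` are all hypothetical and do not correspond to any PostgreSQL API. It shows that a "same effective remote user" test gives different answers for different local users, while a "same server entry" test does not depend on who runs the plan.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical user mapping: (local user, server) -> remote user. */
typedef struct MockUserMapping
{
    const char *local_user;
    int         server;         /* hypothetical server number */
    const char *remote_user;
} MockUserMapping;

static const MockUserMapping mappings[] = {
    {"alice", 1, "app"},
    {"alice", 2, "app"},        /* alice: same remote user on both servers */
    {"bob",   1, "app"},
    {"bob",   2, "admin"},      /* bob: different remote users */
};

/* Look up the remote user for a local user on a server, or NULL if none. */
const char *
remote_user_for(const char *local_user, int server)
{
    for (size_t i = 0; i < sizeof(mappings) / sizeof(mappings[0]); i++)
    {
        if (mappings[i].server == server &&
            strcmp(mappings[i].local_user, local_user) == 0)
            return mappings[i].remote_user;
    }
    return NULL;
}

/* Lax test: joinable if the current user maps to the same remote user. */
bool
lax_join_ok(const char *local_user, int s1, int s2)
{
    const char *u1 = remote_user_for(local_user, s1);
    const char *u2 = remote_user_for(local_user, s2);

    return u1 != NULL && u2 != NULL && strcmp(u1, u2) == 0;
}

/* Strict test: joinable only if both tables use the same server entry. */
bool
strict_join_ok(int s1, int s2)
{
    return s1 == s2;
}
```

In this model a plan built while `lax_join_ok("alice", 1, 2)` holds would be wrong if reused by `bob`, for whom the same test fails; `strict_join_ok` takes no user argument, so its answer cannot change between planning and execution.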
On Sat, May 9, 2015 at 1:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I originally wanted to go quite the other way with this and check for >> join pushdown via handler X any time at least one of the two relations >> involved used handler X, even if the other one used some other handler >> or was a plain table. In particular, it seems to me quite plausible >> to want to teach an FDW that a certain local table is replicated on a >> remote node, allowing a join between a foreign table and a plain table >> to be pushed down. > > If we did do something like that, I think a saner implementation would > involve substituting a foreign table for the local one earlier, via view > expansion. So by the time we are doing join planning, there would be no > need to consider cross-server joins anyway. Huh? You can't do this at rewrite time; it's very fundamentally a planning problem. Suppose the user wants to join A, B, and C, where A is a plain table, B is a plain table that is replicated on a foreign server, and C is a foreign table on that same foreign server. If we decide to join B to C first, we probably want to push down the join, although maybe not, if we estimate that B JOIN C will have more rows than C. If we decide to join A to B first, we want to use the local copy of B. >> This infrastructure can't be used that way anyhow, >> so maybe there's no harm in tightening it up, but I'm wary of >> circumscribing what FDW authors can do. > > Somebody who's really intent on shooting themselves in the foot can always > use the set_join_pathlist_hook to inject paths for arbitrary joins. > The FDW mechanism should support reasonable use cases without undue > complication, and I doubt that what we've got now is adding anything > except complication and risk of bugs. > > For the archives' sake, let me lay out a couple of reasons why an FDW > that tried to allow cross-server joins would almost certainly be broken, > and broken in security-relevant ways. 
Suppose for instance that > postgres_fdw tried to be smart and drill down into foreign tables' server > IDs to allow joining of any two tables that have the same effective host > name, port, database name, user name, and anything else you think would be > relevant to its choice of connections. The trouble with that is that the > user mapping is context dependent, in particular one local userID might > map to the same remote user name for two different server OIDs, while > another might map to different user names. So if we plan a query under > the first userID we might decide it's okay to do the join remotely. > Then, if we re-use that plan while executing as another userID (which is > entirely possible) what will probably happen is that the remote join > query will get sent off under one or the other of the remote usernames > associated with the second local userID. This could lead to either a > permission failure, or a remote table access that should not be allowed > to the current local userID. Admittedly, such cases might be rare in > practice, but it's still a security hole. Also, even if the FDW is > defensive enough to recheck full matching of the tables' connection > properties at execution time, there's not much it could do about the > situation except fail; it couldn't cause a re-plan to occur. > > For another case, we do not have any mechanism whereby operations like > ALTER SERVER OPTIONS could invalidate existing plans. Thus, even if > the two tables' connection properties matched at plan time, there's no > guarantee that they still match at execution. This is probably not a > security hole (at least not if you assume foreign-server owners are > trusted users), but it's still a bug that exists only if someone tries > to allow cross-server joins. 
> > For these reasons, I think that if an FDW tried to be laxer than "tables > must be on the same pg_foreign_server entry to be joined", the odds > approach unity that it would be broken, and probably dangerously broken. > So we should just make that check for the FDWs. Anybody who thinks > they're smarter than the average bear can use set_join_pathlist_hook, > but they are probably wrong. Drilling down into postgres_fdw's connection properties seems pretty silly; the user isn't likely to create two SERVER objects that are identical and then choose which one to use at random, and if they do, they deserve what they get. The universe of FDWs, however, is potentially bigger than that. What does a server even mean for file_fdw, for example? I can't think of any reason why somebody would want to implement joins inside file_fdw, but if they did, all the things being joined would be local files, so the server ID doesn't really matter. Now you may say that's a silly use case, but it's less obviously silly if the files contain structured data, as with cstore_fdw, yet the server ID could still be not especially relevant. Maybe you've got servers representing filesystem directories; that shouldn't preclude cross "server" joins. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sat, May 9, 2015 at 1:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> For these reasons, I think that if an FDW tried to be laxer than "tables >> must be on the same pg_foreign_server entry to be joined", the odds >> approach unity that it would be broken, and probably dangerously broken. >> So we should just make that check for the FDWs. Anybody who thinks >> they're smarter than the average bear can use set_join_pathlist_hook, >> but they are probably wrong. > Drilling down into postgres_fdw's connection properties seems pretty > silly; the user isn't likely to create two SERVER objects that are > identical and then choose which one to use at random, and if they do, > they deserve what they get. The universe of FDWs, however, is > potentially bigger than that. What does a server even mean for > file_fdw, for example? Nothing, which is why you'd only ever create one per database, and so the issue doesn't arise anyway. It would only be sane to create multiple servers per FDW if there were a meaningful difference between them. In any case, since the existing code doesn't support "remote" joins involving a local table unless you use the join-path hook, this argument seems pretty academic. If we tighten the test to be same-server, we will benefit all but very weirdly designed FDWs. Anybody who's not happy with that can still use the hook (and I continue to maintain that they will probably have subtle bugs, but whatever). Another point worth making is that the coding I have in mind doesn't really do anything with RelOptInfo.serverid except compare it for equality. So an FDW that wants to consider some servers interchangeable for joining purposes could override the value at GetForeignPaths time (ie replace "serverid" with the OID of a preferred server), and then it would get GetForeignJoinPaths calls as desired. regards, tom lane
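For illustration, the override suggested in the last paragraph could be as simple as mapping interchangeable servers to one preferred OID before the equality comparison happens. This is a sketch only: the alias table, the OID values, and `canonical_serverid` are hypothetical, and how an FDW would actually decide that two servers are interchangeable is left open here, as it is in the message.

```c
#include <stddef.h>

/* Local stand-in so the sketch compiles without PostgreSQL headers. */
typedef unsigned int Oid;

/* Hypothetical table: servers the FDW treats as aliases of another one. */
typedef struct MockServerAlias
{
    Oid serverid;
    Oid preferred;
} MockServerAlias;

static const MockServerAlias aliases[] = {
    {16401, 16400},
    {16402, 16400},
};

/*
 * Return the OID the FDW would store in place of the relation's real
 * server ID, so that aliased servers compare equal at join time.
 */
Oid
canonical_serverid(Oid serverid)
{
    for (size_t i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
    {
        if (aliases[i].serverid == serverid)
            return aliases[i].preferred;
    }
    return serverid;    /* not aliased: keep the real server ID */
}
```

Because the core check only compares the stored values for equality, two relations on servers 16401 and 16402 would then look like same-server relations, with all the caveats about user mappings raised earlier in the thread still applying.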
I've committed a cleanup patch along the lines discussed. One thought is that we could test the nondefault-scan-tuple logic without a lot of work by modifying the way postgres_fdw deals with columns it decides don't need to be fetched. Right now it injects NULL columns so that the remote query results still match the foreign table's rowtype, but that's not especially desirable or efficient. What we could do instead is generate an fdw_scan_tlist that just lists the columns we intend to fetch. I don't have time to pursue this idea right now, but I think it would be a good change to squeeze into 9.5, just so that we could have some test coverage on those parts of this patch. regards, tom lane
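For illustration, the idea of listing only the fetched columns, instead of padding the remote result with NULLs, amounts to building a compact column list plus a mapping from table attribute numbers to scan-tuple positions. The sketch below is an assumption-laden model, not postgres_fdw code: `build_scan_columns` and its array-based interface are hypothetical, and a real implementation would build a target list of Var nodes rather than integer arrays.

```c
#include <stdbool.h>

/*
 * needed[i] says whether attribute i+1 of the foreign table is required.
 *
 * On return, fetch_attnos[0..n-1] holds the 1-based attribute numbers to
 * fetch, in order, and scan_pos[i] holds the 1-based position of attribute
 * i+1 in the compact scan tuple, or 0 if it is not fetched at all.
 * Returns n, the number of fetched columns.
 */
int
build_scan_columns(const bool *needed, int natts,
                   int *fetch_attnos, int *scan_pos)
{
    int nfetch = 0;

    for (int i = 0; i < natts; i++)
    {
        if (needed[i])
        {
            fetch_attnos[nfetch] = i + 1;
            scan_pos[i] = ++nfetch;
        }
        else
            scan_pos[i] = 0;
    }
    return nfetch;
}
```

With five attributes of which only the second and fifth are needed, the compact scan tuple has two columns, and references to those attributes are redirected to positions 1 and 2.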
Tom, I briefly checked your updates. Although it is not mentioned in the commit log, I noticed a problematic change: this commit turns create_plan_recurse() back into a static function. That means an extension cannot have child nodes, even if it wants to add custom join logic.

Consider a simple case: SELECT * FROM t0, t1 WHERE t0.a = t1.x;

An extension adds a custom join path, and its PlanCustomPath method is called back to create a plan node once the planner chooses that path. The role of PlanCustomPath is to construct its own plan node and also the plan nodes of the source relations. If create_plan_recurse() is static, we have no way to initialize the plan nodes for the t0 and t1 scans, even if the join logic itself is more powerful than the built-in ones. This was already discussed upthread and was the consensus there. Please make create_plan_recurse() non-static again, as in the initial commit.

Also, regarding the *_scan_tlist, > I don't have time to pursue this idea right now, but I think it would be > a good change to squeeze into 9.5, just so that we could have some test > coverage on those parts of this patch. > Do you just want a test-purpose module and regression test cases? Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
Kohei KaiGai <kaigai@kaigai.gr.jp> writes: > I briefly checked your updates. > Even though it is not described in the commit-log, I noticed a > problematic change. > This commit reverts create_plan_recurse() as static function. Yes. I am not convinced that external callers should be calling that, and would prefer not to enlarge createplan.c's API footprint without a demonstration that this is right and useful. (This is one of many ways in which this patch is suffering from having gotten committed without submitted use-cases.) > It means extension > cannot have child node, even if it wants to add a custom-join logic. > Please assume a simple case below: > SELECT * FROM t0, t1 WHERE t0.a = t1.x; > An extension adds a custom join path, then its PlanCustomPath method will be > called back to create a plan node once it gets chosen by planner. > The role of PlanCustomPath is to construct a plan-node of itself, and plan-nodes > of the source relations also. > If create_plan_recurse() is static, we have no way to initialize > plan-node for t0 > and t1 scan even if join-logic itself is powerful than built-in ones. I find this argument quite unconvincing, because even granting that this is an appropriate way to create child nodes of a CustomScan, there is a lot of core code besides createplan.c that would not know about those child nodes either. As a counterexample, suppose that your cool-new-join-method is capable of joining three inputs at once. You could stick two of the children into lefttree and righttree perhaps, but where are you gonna put the other? This comes back to the fact that trying to wedge join behavior into scan node types was a pretty bad idea (as evidenced by the entirely bogus decision that now scanrelid can be zero, which I rather doubt you've found all the places that that broke). Really there should have been a new CustomJoin node or something like that. 
If we'd done that, it would be possible to design that node type more like Append, with any number of child nodes. And we could have set things up so that createplan.c knows about those child nodes and takes care of the recursion for you; it would still not be a good idea to expose create_plan_recurse and hope that callers of that would know how to use it correctly. I am still more than half tempted to say we should revert this entire patch series and hope for a better design to be submitted for 9.6. In the meantime, though, poking random holes in the modularity of core code is a poor substitute for having designed a well-thought-out API. A possible compromise that we could perhaps still wedge into 9.5 is to extend CustomPath with a List of child Paths, and CustomScan with a List of child Plans, which createplan.c would know to build from the Paths, and other modules would then also be aware of these children. I find that uglier than a separate join node type, but it would be tolerable I guess. regards, tom lane
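For illustration, the compromise described in the last paragraph (a path node carrying a list of child paths, with the plan builder doing the recursion itself) might be modeled as below. This is a sketch with mock types only: `MockPath`, `MockPlan`, `mock_create_plan`, and the fixed-size child arrays are assumptions, and nothing here reflects the real CustomPath or CustomScan layout.

```c
#include <stdlib.h>

#define MAX_CHILDREN 8

/* Hypothetical path node that can carry any number of child paths. */
typedef struct MockPath
{
    const char      *label;
    int              nchildren;
    struct MockPath *children[MAX_CHILDREN];
} MockPath;

/* Hypothetical plan node mirroring the path's children. */
typedef struct MockPlan
{
    const char      *label;
    int              nchildren;
    struct MockPlan *children[MAX_CHILDREN];
} MockPlan;

/*
 * Build a plan tree from a path tree.  The recursion over children lives
 * here, in the "core" builder, so a provider never has to call a
 * recursive plan-creation function itself.
 */
MockPlan *
mock_create_plan(const MockPath *path)
{
    MockPlan *plan = calloc(1, sizeof(MockPlan));

    if (plan == NULL)
        abort();
    plan->label = path->label;
    plan->nchildren = path->nchildren;
    for (int i = 0; i < path->nchildren; i++)
        plan->children[i] = mock_create_plan(path->children[i]);
    return plan;
}

/* Count the nodes in a plan tree, including the root. */
int
mock_count_nodes(const MockPlan *plan)
{
    int count = 1;

    for (int i = 0; i < plan->nchildren; i++)
        count += mock_count_nodes(plan->children[i]);
    return count;
}

/* Release a plan tree built by mock_create_plan(). */
void
mock_free_plan(MockPlan *plan)
{
    for (int i = 0; i < plan->nchildren; i++)
        mock_free_plan(plan->children[i]);
    free(plan);
}
```

A three-input join, the counterexample raised above, fits this shape without a dummy intermediate node: the join path simply lists three children.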
> Kohei KaiGai <kaigai@kaigai.gr.jp> writes:
> > I briefly checked your updates.
> > Even though it is not described in the commit-log, I noticed a
> > problematic change.
>
> > This commit reverts create_plan_recurse() as static function.
>
> Yes. I am not convinced that external callers should be calling that,
> and would prefer not to enlarge createplan.c's API footprint without a
> demonstration that this is right and useful. (This is one of many
> ways in which this patch is suffering from having gotten committed
> without submitted use-cases.)
>
Hmm. I understand that it is an intentional change.

> > It means extension
> > cannot have child node, even if it wants to add a custom-join logic.
> > Please assume a simple case below:
> > SELECT * FROM t0, t1 WHERE t0.a = t1.x;
>
> > An extension adds a custom join path, then its PlanCustomPath method will be
> > called back to create a plan node once it gets chosen by planner.
> > The role of PlanCustomPath is to construct a plan-node of itself, and plan-nodes
> > of the source relations also.
> > If create_plan_recurse() is static, we have no way to initialize
> > plan-node for t0
> > and t1 scan even if join-logic itself is powerful than built-in ones.
>
> I find this argument quite unconvincing, because even granting that this
> is an appropriate way to create child nodes of a CustomScan, there is a
> lot of core code besides createplan.c that would not know about those
> child nodes either.
>
> As a counterexample, suppose that your cool-new-join-method is capable of
> joining three inputs at once. You could stick two of the children into
> lefttree and righttree perhaps, but where are you gonna put the other?
>
> This comes back to the fact that trying to wedge join behavior into scan
> node types was a pretty bad idea (as evidenced by the entirely bogus
> decision that now scanrelid can be zero, which I rather doubt you've found
> all the places that that broke). Really there should have been a new
> CustomJoin node or something like that. If we'd done that, it would be
> possible to design that node type more like Append, with any number of
> child nodes. And we could have set things up so that createplan.c knows
> about those child nodes and takes care of the recursion for you; it would
> still not be a good idea to expose create_plan_recurse and hope that
> callers of that would know how to use it correctly.
>
> I am still more than half tempted to say we should revert this entire
> patch series and hope for a better design to be submitted for 9.6.
> In the meantime, though, poking random holes in the modularity of core
> code is a poor substitute for having designed a well-thought-out API.
>
> A possible compromise that we could perhaps still wedge into 9.5 is to
> extend CustomPath with a List of child Paths, and CustomScan with a List
> of child Plans, which createplan.c would know to build from the Paths,
> and other modules would then also be aware of these children. I find that
> uglier than a separate join node type, but it would be tolerable I guess.
>
At the moment, my custom-join logic adds a dummy node to hold two child nodes when it tries to join more than 3 relations. Yes, if a CustomPath node (ForeignPath as well?) can carry a list of child path nodes and the core backend handles their initialization, that will be more comfortable for extensions. I do more than agree with this idea; I prefer it.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Sun, May 10, 2015 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Kohei KaiGai <kaigai@kaigai.gr.jp> writes: >> I briefly checked your updates. >> Even though it is not described in the commit-log, I noticed a >> problematic change. > >> This commit reverts create_plan_recurse() as static function. > > Yes. I am not convinced that external callers should be calling that, > and would prefer not to enlarge createplan.c's API footprint without a > demonstration that this is right and useful. (This is one of many > ways in which this patch is suffering from having gotten committed > without submitted use-cases.) I really think that reverting somebody else's committed change without discussion is inappropriate. If I don't like the fact that you reverted this change, can I go revert it back? Your unwillingness to make functions global or to stick PGDLLIMPORT markings on variables that people want access to is hugely handicapping extension authors. Many people have complained about that on multiple occasions. Frankly, I find it obstructionist and petty. If you want to improve the design of this so that it does the same things more elegantly, fine: I'll get out of the way. If you just want to make things impossible that the patch previously made possible, I strongly object to that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Your unwillingness to make functions global or to stick PGDLLIMPORT > markings on variables that people want access to is hugely > handicapping extension authors. Many people have complained about > that on multiple occasions. Frankly, I find it obstructionist and > petty. Sure, we could export every last static function in the core code, and extension authors would rejoice ... while development on the core code basically stops for fear of breaking extensions. It's important not to export things that we don't have to, especially when doing so is really just a quick-n-dirty substitute for doing things properly. regards, tom lane
On Sun, May 10, 2015 at 9:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Your unwillingness to make functions global or to stick PGDLLIMPORT
>> markings on variables that people want access to is hugely
>> handicapping extension authors. Many people have complained about
>> that on multiple occasions. Frankly, I find it obstructionist and
>> petty.
>
> Sure, we could export every last static function in the core code,
> and extension authors would rejoice ... while development on the core
> code basically stops for fear of breaking extensions. It's important
> not to export things that we don't have to, especially when doing so
> is really just a quick-n-dirty substitute for doing things properly.

Please name EVEN ONE instance in which core development has been prevented for fear of changing a function API. Sure, we take those things into consideration, like trying to ensure that there will be type conflicts until people update their code, but I cannot recall a single instance in six and a half years of working on this project where that's been a real problem. I think this concern is entirely hypothetical. Besides, no one has ever proposed making every static function public. It's been proposed a handful of times for limited classes of functions - in this case ONE - and you've fought it every time despite clear consensus on the other side. I find that highly regrettable and I'm very sure I'm not the only one.

I notice that you carefully didn't answer the other part of my question: what gives you the right to revert my commits without discussion or consensus, and do I have an equal right to change it back?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015-05-10 21:26:26 -0400, Robert Haas wrote:
> On Sun, May 10, 2015 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > This commit reverts create_plan_recurse() as static function.
> > Yes. I am not convinced that external callers should be calling that,
> > and would prefer not to enlarge createplan.c's API footprint without a
> > demonstration that this is right and useful. (This is one of many
> > ways in which this patch is suffering from having gotten committed
> > without submitted use-cases.)

Wasn't there a submitted use case? IIRC Kaigai had referenced some pg-strom (?) code using it?

I'm failing to see how create_plan_recurse() being exposed externally is related to "having gotten committed without submitted use-cases". Even if the submitted code, presumably kept as simple as possible, doesn't use it, that's no proof that less simple code does not need it.

> Your unwillingness to make functions global or to stick PGDLLIMPORT
> markings on variables that people want access to is hugely
> handicapping extension authors. Many people have complained about
> that on multiple occasions. Frankly, I find it obstructionist and
> petty.

While I don't find the tone of the characterization super helpful, I do tend to agree that we're *far* too conservative on that end. I've now seen a significant number of extensions that copied large swathes of code just to cope with individual functions not being available. And even cases where that led to minor forks with such details changed.

I know that I'm "fearful" of asking for functions to be made public, because it'll invariably get into a discussion of merits that's completely out of proportion with the size of the change. And if I, who have been on the list for a while now, am "afraid" in that way, you can be sure that others won't even dare to ask, let alone argue their way through.

I think the problem is that during development the default often is to create functions as static if they're used only in one file. Which is fine. But it really doesn't work if it's a larger battle to change single instances. Besides the pain of having to wait for the next major release...

Greetings, Andres Freund
On 2015-05-10 21:53:45 -0400, Robert Haas wrote:
> Please name EVEN ONE instance in which core development has been
> prevented for fear of changing a function API.

Even *moving* function declarations to a different file has been loudly and repeatedly complained about... And there are definitely some things around that pretty much only still exist because changing them would break too much stuff.

But.

I don't think that's a reason to not expose more functions externally. Because the usual consequence of not exposing them is that either ugly workarounds will be found, or code will just be copy-pasted around. That's not in any way better, and much more likely to be worse.

I'm not saying that we shouldn't use judgement, but I do think that the current approach ridicules our vaunted extensibility in many cases.

Greetings, Andres Freund
On Sun, May 10, 2015 at 10:37 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-05-10 21:53:45 -0400, Robert Haas wrote:
>> Please name EVEN ONE instance in which core development has been
>> prevented for fear of changing a function API.
>
> Even *moving* function declarations to a different file has been loudly
> and repeatedly complained about...

Moving declarations is a lot more likely to break compiles than adding declarations. But even the 9.3 header file reorganizations, which broke enough compiles to be annoying, were only annoying, not a serious problem for anyone. I doubted whether that stuff was worth changing, but that's just because I don't really get excited about partial recompiles.

> And there's definitely some things
> around that pretty much only still exist because changing them would
> break too much stuff.

Such as what?

> But.
>
> I don't think that's a reason to not expose more functions
> externally. Because the usual consequence of not exposing them is that
> either ugly workarounds will be found, or code will just be copy-pasted
> around. That's not in any way better, and much more likely to be worse.

Yes.

> I'm not saying that we shouldn't use judgement, but I do think that the
> current approach ridicules our vaunted extensibility in many cases.

Double yes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015-05-10 22:51:33 -0400, Robert Haas wrote:
> > And there's definitely some things
> > around that pretty much only still exist because changing them would
> > break too much stuff.
>
> Such as what?

Without even thinking about it:
* linitial vs lfirst vs lnext. That thing still induces an impedance mismatch when reading code for me, and I believe for a good number of other people.
* Two 'string buffer' APIs with essentially only minor differences.
* A whole bunch of libpq APIs. Admittedly that's a bit more exposed than lots of backend only things.
* The whole V0 calling convention that makes it so much easier to get odd crashes.

Admittedly that's all I could come up with without having to think. But I do vaguely remember a lot of things we did not do because of bwcompat concerns.
> On 2015-05-10 21:26:26 -0400, Robert Haas wrote:
> > On Sun, May 10, 2015 at 8:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > > This commit reverts create_plan_recurse() as static function.
> > > Yes. I am not convinced that external callers should be calling that,
> > > and would prefer not to enlarge createplan.c's API footprint without a
> > > demonstration that this is right and useful. (This is one of many
> > > ways in which this patch is suffering from having gotten committed
> > > without submitted use-cases.)
>
> Wasn't there a submitted use case? IIRC Kaigai had referenced some
> pg-strom (?) code using it?
>
> I'm failing to see how create_plan_recurse() being exposed externally is
> related to "having gotten committed without submitted use-cases". Even
> if submitted, presumably as simple as possible code, doesn't use it,
> that's not a proof that less simple code does not need it.
>
Yes, the PG-Strom code uses create_plan_recurse() to construct the child plan nodes of its GPU-accelerated custom-join logic once that path has been chosen. There is nothing special here: it calls create_plan_recurse() on the underlying inner/outer paths, just as the built-in join paths do.
It would not be difficult to submit it as a working example; however, its total code size (excluding GPU code) is 25K lines at this moment.

I'm not certain whether that counts as a simple example.

> > Your unwillingness to make functions global or to stick PGDLLIMPORT
> > markings on variables that people want access to is hugely
> > handicapping extension authors. Many people have complained about
> > that on multiple occasions. Frankly, I find it obstructionist and
> > petty.
>
> While I don't find the tone of the characterization super helpful, I do
> tend to agree that we're *far* too conservative on that end. I've now
> seen a significant number of extensions that copied large swathes of code
> just to cope with individual functions not being available. And even
> cases where that led to minor forks with such details changed.
>
I may have to join their ranks?

> I know that I'm "fearful" of asking for functions being made
> public. Because it'll invariably get into a discussion of merits that's
> completely out of proportion with the size of the change. And if I, who
> has been on the list for a while now, am "afraid" in that way, you can
> be sure that others won't even dare to ask, lest argue their way
> through.
>
> I think the problem is that during development the default often is to
> create function as static if they're used only in one file. Which is
> fine. But it really doesn't work if it's a larger battle to change
> single incidences. Besides the pain of having to wait for the next
> major release...
>
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Sun, May 10, 2015 at 11:07 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-05-10 22:51:33 -0400, Robert Haas wrote:
>> > And there's definitely some things
>> > around that pretty much only still exist because changing them would
>> > break too much stuff.
>>
>> Such as what?
>
> Without even thinking about it:
> * linitial vs lfirst vs lnext. That thing still induces an impedance
>   mismatch when reading code for me, and I believe a good number of
>   other people.
> * Two 'string buffer' APIs with essentially only minor differences.
> * A whole bunch of libpq APIs. Admittedly that's a bit more exposed than
>   lots of backend only things.
> * The whole V0 calling convention that makes it so much easier to get
>   odd crashes.
>
> Admittedly that's all I could come up without having to think. But I do
> vaguely remember a lot of things we did not do because of bwcompat
> concerns.

I see your point, but I don't think it really detracts from mine. The fact that we have a few inconsistently-named list functions is not preventing any core development project that would otherwise get completed to instead not get completed. Nor is any of that other stuff, except maybe the libpq API, but that's a lot (not just a bit) more exposed.

Also, I'd actually be in favor of looking for a way to identify the StringInfo and PQexpBuffer stuff - and of partially deprecating the V0 calling convention.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

Let me go back to the original concern and lay out the options we can take here. As it stands, the latest master cannot be used to implement custom-join logic unless one of these options is adopted.

option-1)
Revert create_plan_recurse() to a non-static function so that extensions can call it.
It is the simplest solution; however, it remains a gray zone which functions should be public, and whether we treat non-static functions as a stable API or not.
IMO, we shouldn't treat non-static functions as stable APIs, even when they can be called by extensions and not only by code in other source files. In fact, we regularly change the definitions of non-static functions even though we know extensions use them. It is the extension's role to keep up with such changes across major versions.

option-2)
Tom's suggestion. Add a new list field of Path nodes to CustomPath; create_customscan_plan() will then call the static create_plan_recurse() function to construct the child plan nodes.
The attached patch is probably a rough image of this enhancement, though of course it is not tested yet. Once we adopt this approach, I'll adjust my PG-Strom code to the new interface within 2 weeks and report problems, if any.

option-3)
Force authors of custom-scan providers to copy and paste createplan.c.
I really don't want this option, and nobody will be happy with it.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Kaigai Kouhei(海外 浩平)
> Sent: Monday, May 11, 2015 12:48 PM
> To: 'Andres Freund'; Robert Haas
> Cc: Tom Lane; Kohei KaiGai; Thom Brown; Shigeru Hanada;
> pgsql-hackers@postgreSQL.org
> Subject: RE: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
Attachment
Hello,

I tried to make patches for the three approaches. Please don't take option-3 as a serious proposal; unfortunately, however, it is the only solution available at this moment.

In my understanding, we don't guarantee interface compatibility across major version upgrades, and that includes the definitions of non-static functions. It is the extension author's role to follow each new major version (and to raise a problem report during the development cycle if a feature update causes problems for which there is no alternative).
In fact, people regularly submit patches, some of which change the definitions of non-static functions, and nobody can guarantee that no extension uses those functions, so such patches may well break compatibility. That is circumstantial evidence that we do not regard non-static functions as a stable interface for extensions, so stability should not be a reason to keep a function private when it is needed.

On the other hand, I understand that this is not only an issue around createplan.c, but also a (philosophical) issue of the criteria and human judgment that decide which functions should be static or non-static. So it usually takes time to reach overall consensus. If we keep create_plan_recurse() static, option-2 is a solution that balances both opinions.

Anyway, I really dislike option-3 and want to have a solution.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Kaigai Kouhei(海外 浩平)
> Sent: Tuesday, May 12, 2015 10:24 AM
> To: 'Andres Freund'; 'Robert Haas'
> Cc: 'Tom Lane'; 'Kohei KaiGai'; 'Thom Brown'; 'Shigeru Hanada';
> 'pgsql-hackers@postgreSQL.org'
> Subject: RE: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)
Attachment
2015-05-12 10:24 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> option-2)
> Tom's suggestion. Add a new list field of Path nodes on CustomPath,
> then create_customscan_plan() will call static create_plan_recurse()
> function to construct child plan nodes.
> Probably, the attached patch will be an image of this enhancement,
> but not tested yet, of course. Once we adopt this approach, I'll
> adjust my PG-Strom code towards the new interface within 2 weeks
> and report problems if any.

+1. This design provides the functionality that Kaigai-san's custom-join use case requires, namely instantiating the sub-plans of a CustomScan plan recursively, without exposing create_plan_recurse. A CSP can use the index number within the list to distinguish the entries, as FDWs do with fdw_private.

IMO it isn't necessary to apply the change to ForeignPath/ForeignScan. An FDW already has a way to keep and use sub-paths as private information. In fact, I originally wrote postgres_fdw to use create_plan_recurse to generate the SQL statements of the inner/outer relations to be joined, but as of now SQL construction for join push-down is accomplished by calling the private function deparseSelectSql recursively from the top of a join tree. Some FDW might want to use sub-plan generation, but we don't have any concrete use case as of now.

--
Shigeru HANADA
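The "index number" convention mentioned above can be pictured with a small stand-alone sketch. It assumes nothing about the real planner structures: `ToyChildSlot`, `ToyChildList`, `toy_child_set` and `toy_child_get` are hypothetical names invented for illustration, and a plain array of strings stands in for a list of child nodes. The only point is that a provider can agree with itself on fixed positions in a list and read the entries back by position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical slot numbers that one provider fixes for its own child list.
 * Nothing here is a PostgreSQL API; it only models "position in the list
 * tells you what the entry means".
 */
enum ToyChildSlot
{
    TOY_CHILD_OUTER = 0,
    TOY_CHILD_INNER = 1,
    TOY_CHILD_COUNT
};

/* Stand-in for a list of child nodes; each entry is just a label here. */
typedef struct ToyChildList
{
    const char *items[TOY_CHILD_COUNT];
} ToyChildList;

/* Store an entry at its agreed position. */
void
toy_child_set(ToyChildList *list, enum ToyChildSlot slot, const char *item)
{
    assert((int) slot >= 0 && (int) slot < TOY_CHILD_COUNT);
    list->items[slot] = item;
}

/* Fetch an entry by its agreed position. */
const char *
toy_child_get(const ToyChildList *list, enum ToyChildSlot slot)
{
    assert((int) slot >= 0 && (int) slot < TOY_CHILD_COUNT);
    return list->items[slot];
}
```

Because the slot numbers are private to the provider, core code never needs to know which entry is the "outer" or "inner" side; it only has to carry the list along.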
> A possible compromise that we could perhaps still wedge into 9.5 is to
> extend CustomPath with a List of child Paths, and CustomScan with a List
> of child Plans, which createplan.c would know to build from the Paths,
> and other modules would then also be aware of these children. I find that
> uglier than a separate join node type, but it would be tolerable I guess.
>
The attached patch implements what you suggested as is.
It allows custom-scan providers to have child Paths without exporting create_plan_recurse(), and makes it possible to represent an N-way join naturally.
Please adopt some solution, even if we don't reach consensus on how create_plan_recurse() (and other useful static functions) should be made visible to extensions.

Patch detail:
It adds a List field (List *custom_children) to the CustomPath structure to tell the planner about its child Path nodes, which the planner then transforms into Plan nodes.
CustomScan also gets a new List field to hold its child Plan nodes, which are processed by setrefs.c and subselect.c.
The PlanCustomPath callback was extended to receive the list of Plan nodes constructed by create_customscan_plan() in core; it is the custom-scan provider's job to attach these Plan nodes to lefttree, righttree or the custom_children list.
CustomScanState also has an array that holds the PlanState nodes of the children. It is used so that the EXPLAIN command knows the child nodes.

Regarding FDW, as Hanada-san mentioned, I'm uncertain whether a similar feature is also needed, because its join-pushdown feature scans the result set of remotely joined relations and thus has no need for local child Path nodes.
So, I put this custom_children list on the CustomXXXX structures only.

It may need an additional section in the documentation.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
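The division of labour described in the patch detail above can be sketched as a toy model. This is not the patch and not the PostgreSQL structures: `ToyPath`, `ToyPlan`, `ToyPlanCallback`, `toy_create_plan`, `toy_provider_plan` and `toy_free_plan` are hypothetical stand-ins, and for brevity every node goes through the provider callback, whereas in the real planner only custom paths would. What it is meant to show is that the recursive builder can stay `static` while the provider still receives fully built child plans.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy model only: ToyPath/ToyPlan are hypothetical stand-ins, not the real
 * CustomPath/CustomScan structures.
 */
#define TOY_MAX_CHILDREN 4

typedef struct ToyPath
{
    const char *label;
    int         nchildren;
    struct ToyPath *children[TOY_MAX_CHILDREN]; /* plays the role of custom_children */
} ToyPath;

typedef struct ToyPlan
{
    const char *label;
    int         nchildren;
    struct ToyPlan *children[TOY_MAX_CHILDREN];
} ToyPlan;

/* Provider callback: receives child plans that "core" has already built. */
typedef ToyPlan *(*ToyPlanCallback) (const ToyPath *path,
                                     ToyPlan **child_plans, int nchildren);

/* "Core" side: the recursive builder stays static, private to this file. */
static ToyPlan *
toy_create_plan_recurse(const ToyPath *path, ToyPlanCallback provider)
{
    ToyPlan    *child_plans[TOY_MAX_CHILDREN];
    int         i;

    assert(path->nchildren <= TOY_MAX_CHILDREN);

    /* Build every child first, then hand the results to the provider. */
    for (i = 0; i < path->nchildren; i++)
        child_plans[i] = toy_create_plan_recurse(path->children[i], provider);

    return provider(path, child_plans, path->nchildren);
}

/* The only entry point other files would see. */
ToyPlan *
toy_create_plan(const ToyPath *root, ToyPlanCallback provider)
{
    return toy_create_plan_recurse(root, provider);
}

/* "Extension" side: attaches the children it was given; it never recurses. */
ToyPlan *
toy_provider_plan(const ToyPath *path, ToyPlan **child_plans, int nchildren)
{
    ToyPlan    *plan = calloc(1, sizeof(ToyPlan));
    int         i;

    assert(plan != NULL);
    plan->label = path->label;
    plan->nchildren = nchildren;
    for (i = 0; i < nchildren; i++)
        plan->children[i] = child_plans[i];
    return plan;
}

/* Release a plan tree built by toy_create_plan(). */
void
toy_free_plan(ToyPlan *plan)
{
    int         i;

    for (i = 0; i < plan->nchildren; i++)
        toy_free_plan(plan->children[i]);
    free(plan);
}
```

A caller would build a `ToyPath` for the join with its two inputs as children, pass it to `toy_create_plan()` together with `toy_provider_plan`, and get back a plan whose children are already in place. The trade-off is that the provider cannot influence how its children are built, only where it attaches them.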
2015-05-15 8:43 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> Regarding of FDW, as Hanada-san mentioned, I'm uncertain whether
> similar feature is also needed because its join-pushdown feature
> scan on the result-set of remotely joined relations, thus no need
> to have local child Path nodes.
> So, I put this custom_children list on CustomXXXX structure only.

AFAIS most FDWs won't need child paths to process their external data. The most plausible case is an FDW that consumes the output of a ForeignScan plan node which is handled by the same FDW, but such work should be done by another CSP (or at least via the CSP I/F).

--
Shigeru HANADA
Let me raise this problem again so that it is not forgotten on the way to v9.5.

Commit 1a8a4e5cde2b7755e11bde2ea7897bd650622d3e disallowed extensions from calling create_plan_recurse(); however, that function is required by a custom-scan node that implements its own join logic and takes child nodes, in order to construct Plan nodes from Path nodes.

So, at this moment, we cannot make good use of the new feature unless the extension copies and pastes createplan.c into its own source tree (ugly!).

Tom suggested an alternative infrastructure that tells the planner which Path nodes are to be passed to create_plan_recurse() in createplan.c.

> > A possible compromise that we could perhaps still wedge into 9.5 is to
> > extend CustomPath with a List of child Paths, and CustomScan with a List
> > of child Plans, which createplan.c would know to build from the Paths,
> > and other modules would then also be aware of these children. I find that
> > uglier than a separate join node type, but it would be tolerable I guess.
> >
> The attached patch implements what you suggested as is.
> It allows custom-scan providers to have child Paths without exporting
> create_plan_recurse(), and enables to represent N-way join naturally.
> Please add any solution, even if we don't reach the consensus of how
> create_plan_recurse (and other useful static functions) are visible to
> extensions.
>
Then I made a patch (attached to my last message) according to the suggestion. I think it is one possible option.

The other option is to revert create_plan_recurse() to a non-static function, as the infrastructure was originally designed.

I expect people will agree that forcing extensions to copy and paste a core file with small adjustments is not a graceful design. So one of these options, or another one if there is any, needs to be merged to solve the problem.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Friday, May 15, 2015 8:44 AM
> To: Tom Lane; Kohei KaiGai
> Cc: Robert Haas; Thom Brown; Shigeru Hanada; pgsql-hackers@postgreSQL.org
> Subject: Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)