Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API) - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F8010D40FD@BPXM15GP.gisp.nec.co.jp
Whole thread Raw
In response to Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)  (Shigeru HANADA <shigeru.hanada@gmail.com>)
Responses Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
List pgsql-hackers
Hanada-san,

> I reviewed the Custom/Foreign join API patch again after writing a patch of join
> push-down support for postgres_fdw.
>
Thanks for your dedicated jobs, my comments are inline below.

> Here, please let me summarize the changes in the patch as the result of my review.
> 
> * Add set_join_pathlist_hook_type in add_paths_to_joinrel
> This hook is intended to provide a chance to add one or more CustomPaths for an
> actual join combination.  If the join is reversible, the hook is called for both
> A * B and B * A.  This is different from FDW API but it seems fine because FDWs
> should have chances to process the join in more abstract level than CSPs.
> 
> Parameters are same as hash_inner_and_outer, so they would be enough for hash-like
> or nestloop-like methods.  I’m not sure whether mergeclause_list is necessary
> as a parameter or not.  It’s information for merge join which is generated when
> enable_mergejoin is on and the join is not FULL OUTER.  Does some CSP need it
> for processing a join in its own way?  Then it must be in parameter list because
> select_mergejoin_clauses is static so it’s not accessible from external modules.
>
I think, a preferable way is to reproduce the mergeclause_list by extension itself,
rather than pass it as a hook argument, because it is uncertain whether CSP should
follow "enable_mergejoin" parameter even if it implements a logic like merge-join.
Of course, it needs to expose select_mergejoin_clauses. It seems to me a straight-
forward way.

> The timing of the hooking, after considering all built-in path types, seems fine
> because some of CSPs might want to use built-in paths as a template or a source.
> 
> One concern is in the document of the hook function.  "Implementing Custom Paths”
> says:
> 
> > A custom scan provider will be also able to add paths by setting the following
> hook, to replace built-in join paths by custom-scan that performs as if a scan
> on preliminary joined relations, which us called after the core code has generated
> what it believes to be the complete and correct set of access paths for the join.
> 
> I think “replace” would mis-lead readers that CSP can remove or edit existing
> built-in paths listed in RelOptInfo#pathlist or linked from cheapest_foo.  IIUC
> CSP can just add paths for the join relation, and planner choose it if it’s the
> cheapest.
>
I adjusted the documentation stuff as follows:

   A custom scan provider will be also able to add paths by setting the
   following hook, to add <literal>CustomPath</> nodes that perform as
   if built-in join logic doing. It is typically expected to take two
   input relations then generate a joined output stream, or just scans
   preliminaty joined relations like materialized-view. This hook is
   called next to the consideration of core join logics, then planner
   will choose the best path to run the relations join in the built-in
   and custom ones.

Probably, it can introduce what this hook works correctly.
v12 patch updated only this portion.

> * Add new FDW API GetForeignJoinPaths in make_join_rel
> This FDW API is intended to provide a chance to add ForeignPaths for a join relation.
> This is called only once for a join relation, so FDW should consider reversed
> combination if it’s meaningful in their own mechanisms.
> 
> Note that this is called only when the join relation was *NOT* found in the
> PlannerInfo, to avoid redundant calls.
>
Yep, it is designed according to the discussion upthreads.
It can produce N-way remote join paths even if intermediate join relation is
more expensive than local join + two foreign scan.

> Parameters seems enough for postgres_fdw to process N-way join on remote side
> with pushing down join conditions and remote filters.
>
You ensured it clearly.

> * Treat scanrelid == 0 as pseudo scan
> A foreign/custom join is represented by a scan against a pseudo relation, i.e.
> result of a join.  Usually Scan has valid scanrelid, oid of a relation being
> scanned, and many functions assume that it’s always valid.  The patch adds another
> code paths for scanrelid == 0 as custom/foreign join scans.
>
Right,

> * Pseudo scan target list support
> CustomScan and ForeignScan have csp_ps_tlist and fdw_ps_tlist respectively, for
> column reference tracking.  A scan generated for custom/foreign join would have
> column from multiple relations in its target list, i.e. output columns.  Ordinary
> scans have all valid columns of the relation as output, so references to them
> can be resolved easily, but we need an additional mechanism to determine where
> a reference in a target list of custom/foreign scan come from.  This is very
> similar to what IndexOnlyScan does, so we reuse INDEX_VAR as mark of an indirect
> reference to another relation’s var.
>
Right, FDW/CSP driver is responsible to set *_ps_tlist to inform the core planner
which columns of relations are referenced, and which attribute represents what
columns/relations. It is an interface contract when foreign/custom-scan is chosen
instead of the built-in join logic.

> For this mechanism, set_plan_refs is changed to fix Vars in ps_tlist of CustomScan
> and ForeignScan.  For this change, new BitmapSet function bms_shift_members is
> added.
> 
> set_deparse_planstate is also changed to pass ps_tlist as namespace for
> deparsing.
>
Yep, it is same as IndexOnlyScan.

> These chanes seems reasonable, so I mark this patch as “ready for committers”
> to hear committers' thoughts.
>
Thanks!
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


Attachment

pgsql-hackers by date:

Previous
From: Jan de Visser
Date:
Subject: Re: Idea: closing the loop for "pg_ctl reload"
Next
From: "keenan@thebrocks.net"
Date:
Subject: Authenticating from SSL certificates