Thread: Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From

Kouhei Kaigai

Date:

17 November 2015, 05:03:42

> On Wed, Nov 11, 2015 at 11:13 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > I agree with we have no reason why only custom-scan is allowed to have
> > serialize/deserialize capability. I can implement an equivalent stuff
> > for foreign-scan also, and it is helpful for extension authors, especially,
> > who tries to implement remote join feature because FDW driver has to
> > keep larger number of private fields than scan.
> >
> > Also, what is the reason why we allow extensions to define a larger
> > structure which contains CustomPath or CustomScanState? It seems to
> > me that these types are not (fully) supported by the current copyfuncs,
> > outfuncs and readfuncs, aren't it?
> > (Although outfuncs.c supports path-nodes, thus CustomPathMethods has
> > TextOut callback but no copy/read handler at this moment.)
> 
> I feel like we're sort of plunging blindly down a path of trying to
> make certain parts of the system extensible and pluggable without
> really having a clear vision of where we want to end up.  Somehow I
> feel like we ought to start by thinking about a framework for *node*
> extensibility; then, within that, we can talk about what that means
> for particular types of nodes.
> 
> For example, suppose do something like this:
> 
> typedef struct
> {
>    NodeTag tag;
>    char *extnodename;
> } ExtensibleNode;
> 
> typedef stuct
> {
>     char *extnodename;
>     char *library_name;
>     char *function_name;
>     Size node_size;
>     ExtensibleNode *(*nodeCopy)(ExtensibleNode *);
>     bool (*nodeEqual)(ExtensibleNode *, ExtensibleNode *);
>     void (*nodeOut)(StringInfo str, ExtensibleNode *);
>     ExtensibleNode (*nodeRead)(void);
> } ExtensibleNodeMethods;
> 
> extern void RegisterExtensibleNodeMethods(ExtensibleNodeMethods *);
> extern ExtensibleNodeMethods *GetExtensibleNodeMethods(char *extnodename);
> 
> This provides a generic infrastructure so that we don't have to keep
> inventing the same stuff over and over, rather than coming up with one
> way to do it for ForeignScans and another way for CustomScan and
> another way for CustomScanState.
>
Wow! I love the idea.

I guess we will put a pointer to static ExtensibleNodeMethods structure
on ForeignScan, CustomScan, CustomScanState and etc...

Like: typedef struct ForeignScan {     Scan        scan;     Oid         fs_server;      /* OID of foreign server */
List       *fdw_exprs;      /* expressions that FDW may evaluate */     List       *fdw_private;    /* private data for
FDW*/     List       *fdw_scan_tlist; /* optional tlist describing scan tuple */     List       *fdw_recheck_quals;  /*
originalquals not in scan.plan.qual */     Bitmapset  *fs_relids;      /* RTIs generated by this scan */     bool
fsSystemCol;    /* true if any "system column" is needed */

+     const ExtensibleNodeMethods *methods; } ForeignScan;

We may be able to put ExtensibleNodeMethods on the head of CustomScanMethods
structure in case of CustomScan.

Let me write down the steps for deserialization. Please correct me if it is
not what you are thinking.
First, stringToNode picks up a token that shall have node label, like
"SEQSCAN", "CUSTOMSCAN" and others. Once we can identify the following
tokens belong to CustomScan node, it can make advance the pg_strtok pointer
until "methods" field. The method field is consists of a pair of library_name
and symbol_name, then it tries to load the library and resolve the symbol.
Even if library was not loaded on the worker side, this step allows to ensure
the library gets loaded, thus, PG_init() can RegisterExtensibleNodeMethods.
Next, it looks up the table of extensible nodes by a pair of extnodename,
library_name and symbol_name, to get ExtensibleNodeMethods table in this
process address space.
Once it can get nodeRead() callback, we can continue deserialization of
private fields.

I have two concerns on the above proposition.
1) 'node_size' field implies user defined structure to have fixed length.  Extension may want to use variable length
structure.The example below  shows GpuJoinState has multiple innerState structure according to the  number of inner
relationsjoined at once.  https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L240

2) It may be valuable if 'nodeRead(void)' can take an argument of the  knows/common portion of ForeignScan and etc...
Itallows extension  to adjust number of private fields according to the known part.  For example, I can expect flexible
lengthprivate fields according  to the length of custom_subplans list.

How about your thought?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From

Robert Haas

Date:

18 November 2015, 20:07:49

On Mon, Nov 16, 2015 at 9:03 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> I guess we will put a pointer to static ExtensibleNodeMethods structure
> on ForeignScan, CustomScan, CustomScanState and etc...

I think that makes it confusing.  What I'd prefer to do is have only
the name stored in the node, and then people who need methods can look
up those methods based on the name.

> Let me write down the steps for deserialization. Please correct me if it is
> not what you are thinking.
> First, stringToNode picks up a token that shall have node label, like
> "SEQSCAN", "CUSTOMSCAN" and others. Once we can identify the following
> tokens belong to CustomScan node, it can make advance the pg_strtok pointer
> until "methods" field. The method field is consists of a pair of library_name
> and symbol_name, then it tries to load the library and resolve the symbol.

No.  It reads the "extnodename" field (or whatever we decide to call
it) and calls GetExtensibleNodeMethods() on that name.  That name
returns the appropriate structure.  C libraries that get loaded into
the server can register their extensible node types at _PG_init()
time.

> Even if library was not loaded on the worker side, this step allows to ensure
> the library gets loaded, thus, PG_init() can RegisterExtensibleNodeMethods.

A parallel worker will load the same loadable modules which the master
has got loaded.  If you use some other kind of background worker, of
course, you're on your own.

> I have two concerns on the above proposition.
> 1) 'node_size' field implies user defined structure to have fixed length.
>    Extension may want to use variable length structure. The example below
>    shows GpuJoinState has multiple innerState structure according to the
>    number of inner relations joined at once.
>    https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L240

OK, we can replace the node_size field with an allocator callback:
Node (*nodeAlloc)(void) or something like that.

> 2) It may be valuable if 'nodeRead(void)' can take an argument of the
>    knows/common portion of ForeignScan and etc... It allows extension
>    to adjust number of private fields according to the known part.
>    For example, I can expect flexible length private fields according
>    to the length of custom_subplans list.

I don't understand this bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company