Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c) - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F80117205C@BPXM15GP.gisp.nec.co.jp
Whole thread Raw
Responses Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> On Wed, Nov 11, 2015 at 11:13 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > I agree with we have no reason why only custom-scan is allowed to have
> > serialize/deserialize capability. I can implement an equivalent stuff
> > for foreign-scan also, and it is helpful for extension authors, especially,
> > who tries to implement remote join feature because FDW driver has to
> > keep larger number of private fields than scan.
> >
> > Also, what is the reason why we allow extensions to define a larger
> > structure which contains CustomPath or CustomScanState? It seems to
> > me that these types are not (fully) supported by the current copyfuncs,
> > outfuncs and readfuncs, aren't it?
> > (Although outfuncs.c supports path-nodes, thus CustomPathMethods has
> > TextOut callback but no copy/read handler at this moment.)
> 
> I feel like we're sort of plunging blindly down a path of trying to
> make certain parts of the system extensible and pluggable without
> really having a clear vision of where we want to end up.  Somehow I
> feel like we ought to start by thinking about a framework for *node*
> extensibility; then, within that, we can talk about what that means
> for particular types of nodes.
> 
> For example, suppose do something like this:
> 
> typedef struct
> {
>    NodeTag tag;
>    char *extnodename;
> } ExtensibleNode;
> 
> typedef stuct
> {
>     char *extnodename;
>     char *library_name;
>     char *function_name;
>     Size node_size;
>     ExtensibleNode *(*nodeCopy)(ExtensibleNode *);
>     bool (*nodeEqual)(ExtensibleNode *, ExtensibleNode *);
>     void (*nodeOut)(StringInfo str, ExtensibleNode *);
>     ExtensibleNode (*nodeRead)(void);
> } ExtensibleNodeMethods;
> 
> extern void RegisterExtensibleNodeMethods(ExtensibleNodeMethods *);
> extern ExtensibleNodeMethods *GetExtensibleNodeMethods(char *extnodename);
> 
> This provides a generic infrastructure so that we don't have to keep
> inventing the same stuff over and over, rather than coming up with one
> way to do it for ForeignScans and another way for CustomScan and
> another way for CustomScanState.
>
Wow! I love the idea.

I guess we will put a pointer to static ExtensibleNodeMethods structure
on ForeignScan, CustomScan, CustomScanState and etc...

Like: typedef struct ForeignScan {     Scan        scan;     Oid         fs_server;      /* OID of foreign server */
List       *fdw_exprs;      /* expressions that FDW may evaluate */     List       *fdw_private;    /* private data for
FDW*/     List       *fdw_scan_tlist; /* optional tlist describing scan tuple */     List       *fdw_recheck_quals;  /*
originalquals not in scan.plan.qual */     Bitmapset  *fs_relids;      /* RTIs generated by this scan */     bool
fsSystemCol;    /* true if any "system column" is needed */
 
+     const ExtensibleNodeMethods *methods; } ForeignScan;

We may be able to put ExtensibleNodeMethods on the head of CustomScanMethods
structure in case of CustomScan.

Let me write down the steps for deserialization. Please correct me if it is
not what you are thinking.
First, stringToNode picks up a token that shall have node label, like
"SEQSCAN", "CUSTOMSCAN" and others. Once we can identify the following
tokens belong to CustomScan node, it can make advance the pg_strtok pointer
until "methods" field. The method field is consists of a pair of library_name
and symbol_name, then it tries to load the library and resolve the symbol.
Even if library was not loaded on the worker side, this step allows to ensure
the library gets loaded, thus, PG_init() can RegisterExtensibleNodeMethods.
Next, it looks up the table of extensible nodes by a pair of extnodename,
library_name and symbol_name, to get ExtensibleNodeMethods table in this
process address space.
Once it can get nodeRead() callback, we can continue deserialization of
private fields.

I have two concerns on the above proposition.
1) 'node_size' field implies user defined structure to have fixed length.  Extension may want to use variable length
structure.The example below  shows GpuJoinState has multiple innerState structure according to the  number of inner
relationsjoined at once.  https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L240
 

2) It may be valuable if 'nodeRead(void)' can take an argument of the  knows/common portion of ForeignScan and etc...
Itallows extension  to adjust number of private fields according to the known part.  For example, I can expect flexible
lengthprivate fields according  to the length of custom_subplans list.
 

How about your thought?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Freeze avoidance of very large table.
Next
From: Amit Kapila
Date:
Subject: Re: Parallel Seq Scan