Thread: Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)
Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)
From
Kouhei Kaigai
Date:
> On Wed, Nov 11, 2015 at 11:13 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > I agree with we have no reason why only custom-scan is allowed to have > > serialize/deserialize capability. I can implement an equivalent stuff > > for foreign-scan also, and it is helpful for extension authors, especially, > > who tries to implement remote join feature because FDW driver has to > > keep larger number of private fields than scan. > > > > Also, what is the reason why we allow extensions to define a larger > > structure which contains CustomPath or CustomScanState? It seems to > > me that these types are not (fully) supported by the current copyfuncs, > > outfuncs and readfuncs, aren't it? > > (Although outfuncs.c supports path-nodes, thus CustomPathMethods has > > TextOut callback but no copy/read handler at this moment.) > > I feel like we're sort of plunging blindly down a path of trying to > make certain parts of the system extensible and pluggable without > really having a clear vision of where we want to end up. Somehow I > feel like we ought to start by thinking about a framework for *node* > extensibility; then, within that, we can talk about what that means > for particular types of nodes. > > For example, suppose do something like this: > > typedef struct > { > NodeTag tag; > char *extnodename; > } ExtensibleNode; > > typedef stuct > { > char *extnodename; > char *library_name; > char *function_name; > Size node_size; > ExtensibleNode *(*nodeCopy)(ExtensibleNode *); > bool (*nodeEqual)(ExtensibleNode *, ExtensibleNode *); > void (*nodeOut)(StringInfo str, ExtensibleNode *); > ExtensibleNode (*nodeRead)(void); > } ExtensibleNodeMethods; > > extern void RegisterExtensibleNodeMethods(ExtensibleNodeMethods *); > extern ExtensibleNodeMethods *GetExtensibleNodeMethods(char *extnodename); > > This provides a generic infrastructure so that we don't have to keep > inventing the same stuff over and over, rather than coming up with one > way to do it for ForeignScans and another way for CustomScan and > another way for CustomScanState. > Wow! I love the idea. I guess we will put a pointer to static ExtensibleNodeMethods structure on ForeignScan, CustomScan, CustomScanState and etc... Like: typedef struct ForeignScan { Scan scan; Oid fs_server; /* OID of foreign server */ List *fdw_exprs; /* expressions that FDW may evaluate */ List *fdw_private; /* private data for FDW*/ List *fdw_scan_tlist; /* optional tlist describing scan tuple */ List *fdw_recheck_quals; /* originalquals not in scan.plan.qual */ Bitmapset *fs_relids; /* RTIs generated by this scan */ bool fsSystemCol; /* true if any "system column" is needed */ + const ExtensibleNodeMethods *methods; } ForeignScan; We may be able to put ExtensibleNodeMethods on the head of CustomScanMethods structure in case of CustomScan. Let me write down the steps for deserialization. Please correct me if it is not what you are thinking. First, stringToNode picks up a token that shall have node label, like "SEQSCAN", "CUSTOMSCAN" and others. Once we can identify the following tokens belong to CustomScan node, it can make advance the pg_strtok pointer until "methods" field. The method field is consists of a pair of library_name and symbol_name, then it tries to load the library and resolve the symbol. Even if library was not loaded on the worker side, this step allows to ensure the library gets loaded, thus, PG_init() can RegisterExtensibleNodeMethods. Next, it looks up the table of extensible nodes by a pair of extnodename, library_name and symbol_name, to get ExtensibleNodeMethods table in this process address space. Once it can get nodeRead() callback, we can continue deserialization of private fields. I have two concerns on the above proposition. 1) 'node_size' field implies user defined structure to have fixed length. Extension may want to use variable length structure.The example below shows GpuJoinState has multiple innerState structure according to the number of inner relationsjoined at once. https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L240 2) It may be valuable if 'nodeRead(void)' can take an argument of the knows/common portion of ForeignScan and etc... Itallows extension to adjust number of private fields according to the known part. For example, I can expect flexible lengthprivate fields according to the length of custom_subplans list. How about your thought? Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Mon, Nov 16, 2015 at 9:03 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > I guess we will put a pointer to static ExtensibleNodeMethods structure > on ForeignScan, CustomScan, CustomScanState and etc... I think that makes it confusing. What I'd prefer to do is have only the name stored in the node, and then people who need methods can look up those methods based on the name. > Let me write down the steps for deserialization. Please correct me if it is > not what you are thinking. > First, stringToNode picks up a token that shall have node label, like > "SEQSCAN", "CUSTOMSCAN" and others. Once we can identify the following > tokens belong to CustomScan node, it can make advance the pg_strtok pointer > until "methods" field. The method field is consists of a pair of library_name > and symbol_name, then it tries to load the library and resolve the symbol. No. It reads the "extnodename" field (or whatever we decide to call it) and calls GetExtensibleNodeMethods() on that name. That name returns the appropriate structure. C libraries that get loaded into the server can register their extensible node types at _PG_init() time. > Even if library was not loaded on the worker side, this step allows to ensure > the library gets loaded, thus, PG_init() can RegisterExtensibleNodeMethods. A parallel worker will load the same loadable modules which the master has got loaded. If you use some other kind of background worker, of course, you're on your own. > I have two concerns on the above proposition. > 1) 'node_size' field implies user defined structure to have fixed length. > Extension may want to use variable length structure. The example below > shows GpuJoinState has multiple innerState structure according to the > number of inner relations joined at once. > https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L240 OK, we can replace the node_size field with an allocator callback: Node (*nodeAlloc)(void) or something like that. > 2) It may be valuable if 'nodeRead(void)' can take an argument of the > knows/common portion of ForeignScan and etc... It allows extension > to adjust number of private fields according to the known part. > For example, I can expect flexible length private fields according > to the length of custom_subplans list. I don't understand this bit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company