Thread: Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From
Kouhei Kaigai
Date:
> On Mon, Nov 16, 2015 at 9:03 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > I guess we will put a pointer to static ExtensibleNodeMethods structure
> > on ForeignScan, CustomScan, CustomScanState and etc...
> 
> I think that makes it confusing.  What I'd prefer to do is have only
> the name stored in the node, and then people who need methods can look
> up those methods based on the name.
>
OK. It is equivalent.

> > Let me write down the steps for deserialization. Please correct me if it is
> > not what you are thinking.
> > First, stringToNode picks up a token that shall have node label, like
> > "SEQSCAN", "CUSTOMSCAN" and others. Once we can identify the following
> > tokens belong to CustomScan node, it can make advance the pg_strtok pointer
> > until "methods" field. The method field is consists of a pair of library_name
> > and symbol_name, then it tries to load the library and resolve the symbol.
> 
> No.  It reads the "extnodename" field (or whatever we decide to call
> it) and calls GetExtensibleNodeMethods() on that name.  That name
> returns the appropriate structure.  C libraries that get loaded into
> the server can register their extensible node types at _PG_init()
> time.
>
OK, I got.

> > Even if library was not loaded on the worker side, this step allows to ensure
> > the library gets loaded, thus, PG_init() can RegisterExtensibleNodeMethods.
> 
> A parallel worker will load the same loadable modules which the master
> has got loaded.  If you use some other kind of background worker, of
> course, you're on your own.
>
Sorry, I oversight it. And SerializeLibraryState() and RestoreLibraryState()
indeed allow background workers even if out of parallel context to restore
the library state easily.

> > I have two concerns on the above proposition.
> > 1) 'node_size' field implies user defined structure to have fixed length.
> >    Extension may want to use variable length structure. The example below
> >    shows GpuJoinState has multiple innerState structure according to the
> >    number of inner relations joined at once.
> >    https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L240
> 
> OK, we can replace the node_size field with an allocator callback:
> Node (*nodeAlloc)(void) or something like that.
> 
> > 2) It may be valuable if 'nodeRead(void)' can take an argument of the
> >    knows/common portion of ForeignScan and etc... It allows extension
> >    to adjust number of private fields according to the known part.
> >    For example, I can expect flexible length private fields according
> >    to the length of custom_subplans list.
> 
> I don't understand this bit.
>
The above two points are for the case if and when extension want to use
variable length fields for its private fields.
So, nodeAlloc() callback is not a perfect answer for the use case because
length of the variable length fields shall be (usually) determined by the
value of another fields (like num_inner_rels, num_gpu_devices, ...) thus
we cannot determine the correct length before read.

Let's assume an custom-scan extension that wants to have:
 typedef struct {     CustomScan    cscan;     int           priv_value_1;     long          priv_value_2;
extra_info_t extra_subplan_info[FLEXIBLE_ARRAY_MEMBER];     /* its length equal to number of sub-plans */ }
ExampleScan;

The "extnodename" within CustomScan allows to pull callback functions
to handle read node from the serialized format.
However, nodeAlloc() callback cannot determine the correct length
(= number of sub-plans in this example) prior to read 'cscan' part.

So, I'd like to suggest _readCustomScan (and other extendable nodes
also) read tokens on local CustomScan variable once, then call Node *(nodeRead)(Node *local_node)
to allocate entire ExampleScan node and read other private fields.

The local_node is already filled up by the core backend prior to
the callback invocation, so extension can know how many sub-plans
are expected thus how many private tokens shall appear.
It also means extension can know exact length of the ExampleScan
node, so it can allocate the node as expected then fill up
remaining private tokens.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From
Robert Haas
Date:
On Thu, Nov 19, 2015 at 10:11 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> The above two points are for the case if and when extension want to use
> variable length fields for its private fields.
> So, nodeAlloc() callback is not a perfect answer for the use case because
> length of the variable length fields shall be (usually) determined by the
> value of another fields (like num_inner_rels, num_gpu_devices, ...) thus
> we cannot determine the correct length before read.
>
> Let's assume an custom-scan extension that wants to have:
>
>   typedef struct {
>       CustomScan    cscan;
>       int           priv_value_1;
>       long          priv_value_2;
>       extra_info_t  extra_subplan_info[FLEXIBLE_ARRAY_MEMBER];
>       /* its length equal to number of sub-plans */
>   } ExampleScan;
>
> The "extnodename" within CustomScan allows to pull callback functions
> to handle read node from the serialized format.
> However, nodeAlloc() callback cannot determine the correct length
> (= number of sub-plans in this example) prior to read 'cscan' part.
>
> So, I'd like to suggest _readCustomScan (and other extendable nodes
> also) read tokens on local CustomScan variable once, then call
>   Node *(nodeRead)(Node *local_node)
> to allocate entire ExampleScan node and read other private fields.
>
> The local_node is already filled up by the core backend prior to
> the callback invocation, so extension can know how many sub-plans
> are expected thus how many private tokens shall appear.
> It also means extension can know exact length of the ExampleScan
> node, so it can allocate the node as expected then fill up
> remaining private tokens.

On second thought, I think we should insist that nodes have to be
fixed-size.  This sounds like a mess.  If the node needs a variable
amount of storage for something, it can store a pointer to a
separately-allocated array someplace inside of it.  That's what
existing nodes do, and it works fine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company