Thread: Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From

Kouhei Kaigai

Date:

21 November 2015, 02:25:14

> On Thu, Nov 19, 2015 at 10:11 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > The above two points are for the case if and when extension want to use
> > variable length fields for its private fields.
> > So, nodeAlloc() callback is not a perfect answer for the use case because
> > length of the variable length fields shall be (usually) determined by the
> > value of another fields (like num_inner_rels, num_gpu_devices, ...) thus
> > we cannot determine the correct length before read.
> >
> > Let's assume an custom-scan extension that wants to have:
> >
> >   typedef struct {
> >       CustomScan    cscan;
> >       int           priv_value_1;
> >       long          priv_value_2;
> >       extra_info_t  extra_subplan_info[FLEXIBLE_ARRAY_MEMBER];
> >       /* its length equal to number of sub-plans */
> >   } ExampleScan;
> >
> > The "extnodename" within CustomScan allows to pull callback functions
> > to handle read node from the serialized format.
> > However, nodeAlloc() callback cannot determine the correct length
> > (= number of sub-plans in this example) prior to read 'cscan' part.
> >
> > So, I'd like to suggest _readCustomScan (and other extendable nodes
> > also) read tokens on local CustomScan variable once, then call
> >   Node *(nodeRead)(Node *local_node)
> > to allocate entire ExampleScan node and read other private fields.
> >
> > The local_node is already filled up by the core backend prior to
> > the callback invocation, so extension can know how many sub-plans
> > are expected thus how many private tokens shall appear.
> > It also means extension can know exact length of the ExampleScan
> > node, so it can allocate the node as expected then fill up
> > remaining private tokens.
> 
> On second thought, I think we should insist that nodes have to be
> fixed-size.  This sounds like a mess.  If the node needs a variable
> amount of storage for something, it can store a pointer to a
> separately-allocated array someplace inside of it.  That's what
> existing nodes do, and it works fine.
>
OK, let's have "node_size" instead of nodeAlloc callback and other
stuffs.

Please wait for a patch.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From

Kouhei Kaigai

Date:

26 November 2015, 13:28:27

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Saturday, November 21, 2015 8:05 AM
> To: Robert Haas
> Cc: Andres Freund; Amit Kapila; pgsql-hackers
> Subject: Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support
> on readfuncs.c)
> 
> > On Thu, Nov 19, 2015 at 10:11 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > > The above two points are for the case if and when extension want to use
> > > variable length fields for its private fields.
> > > So, nodeAlloc() callback is not a perfect answer for the use case because
> > > length of the variable length fields shall be (usually) determined by the
> > > value of another fields (like num_inner_rels, num_gpu_devices, ...) thus
> > > we cannot determine the correct length before read.
> > >
> > > Let's assume an custom-scan extension that wants to have:
> > >
> > >   typedef struct {
> > >       CustomScan    cscan;
> > >       int           priv_value_1;
> > >       long          priv_value_2;
> > >       extra_info_t  extra_subplan_info[FLEXIBLE_ARRAY_MEMBER];
> > >       /* its length equal to number of sub-plans */
> > >   } ExampleScan;
> > >
> > > The "extnodename" within CustomScan allows to pull callback functions
> > > to handle read node from the serialized format.
> > > However, nodeAlloc() callback cannot determine the correct length
> > > (= number of sub-plans in this example) prior to read 'cscan' part.
> > >
> > > So, I'd like to suggest _readCustomScan (and other extendable nodes
> > > also) read tokens on local CustomScan variable once, then call
> > >   Node *(nodeRead)(Node *local_node)
> > > to allocate entire ExampleScan node and read other private fields.
> > >
> > > The local_node is already filled up by the core backend prior to
> > > the callback invocation, so extension can know how many sub-plans
> > > are expected thus how many private tokens shall appear.
> > > It also means extension can know exact length of the ExampleScan
> > > node, so it can allocate the node as expected then fill up
> > > remaining private tokens.
> >
> > On second thought, I think we should insist that nodes have to be
> > fixed-size.  This sounds like a mess.  If the node needs a variable
> > amount of storage for something, it can store a pointer to a
> > separately-allocated array someplace inside of it.  That's what
> > existing nodes do, and it works fine.
> >
> OK, let's have "node_size" instead of nodeAlloc callback and other
> stuffs.
> 
> Please wait for a patch.
>
I'm now implementing. The above design perfectly works on ForeignScan.
On the other hands, I'd like to have deeper consideration for CustomScan.

My recent patch adds LibraryName and SymbolName on CustomScanMethods
to lookup the method table even if library is not loaded yet.
However, this ExtensibleNodeMethods relies custom scan provider shall
be loaded, by parallel infrastructure, prior to the deserialization.
It means extension has a chance to register itself as well.

My idea is, redefine CustomScanMethod as follows:

typedef struct ExtensibleNodeMethods
{   const char *extnodename;   Size        node_size;   Node     *(*nodeCopy)(const Node *from);   bool
(*nodeEqual)(constNode *a, const Node *b);   void      (*nodeOut)(struct StringInfoData *str, const Node *node);   void
    (*nodeRead)(Node *node);
 
} ExtensibleNodeMethods;

typedef struct CustomScanMethods
{   union {       const char *CustomName;       ExtensibleNodeMethods  xnode;   };   /* Create execution state
(CustomScanState)from a CustomScan plan node */   Node       *(*CreateCustomScanState) (struct CustomScan *cscan);
 
} CustomScanMethods;

It allows to pull CustomScanMethods using GetExtensibleNodeMethods
by CustomName; that is alias of extnodename in ExtensibleNodeMethods.
Thus, we don't need to care about LibraryName and SymbolName.

This kind of trick is not needed for ForeignScan because all the pointer
stuff are reproducible by catalog, however, method table of CustomScan
is not.

How about your opinion?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From

Robert Haas

Date:

01 December 2015, 02:47:16

On Thu, Nov 26, 2015 at 5:27 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> I'm now implementing. The above design perfectly works on ForeignScan.
> On the other hands, I'd like to have deeper consideration for CustomScan.
>
> My recent patch adds LibraryName and SymbolName on CustomScanMethods
> to lookup the method table even if library is not loaded yet.
> However, this ExtensibleNodeMethods relies custom scan provider shall
> be loaded, by parallel infrastructure, prior to the deserialization.
> It means extension has a chance to register itself as well.
>
> My idea is, redefine CustomScanMethod as follows:
>
> typedef struct ExtensibleNodeMethods
> {
>     const char *extnodename;
>     Size        node_size;
>     Node     *(*nodeCopy)(const Node *from);
>     bool      (*nodeEqual)(const Node *a, const Node *b);
>     void      (*nodeOut)(struct StringInfoData *str, const Node *node);
>     void      (*nodeRead)(Node *node);
> } ExtensibleNodeMethods;
>
> typedef struct CustomScanMethods
> {
>     union {
>         const char *CustomName;
>         ExtensibleNodeMethods  xnode;
>     };
>     /* Create execution state (CustomScanState) from a CustomScan plan node */
>     Node       *(*CreateCustomScanState) (struct CustomScan *cscan);
> } CustomScanMethods;
>
> It allows to pull CustomScanMethods using GetExtensibleNodeMethods
> by CustomName; that is alias of extnodename in ExtensibleNodeMethods.
> Thus, we don't need to care about LibraryName and SymbolName.
>
> This kind of trick is not needed for ForeignScan because all the pointer
> stuff are reproducible by catalog, however, method table of CustomScan
> is not.
>
> How about your opinion?

Anonymous unions are not C89, so this is a no-go.

I think we should just drop CustomName and replace it with
ExtensibleNodeMethods.  That will break things for third-party code
that is using the existing CustomScan stuff, but there's a good chance
that nobody other than you has written any such code.  And even if
someone has, updating it for this change will not be very difficult.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company