Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c) - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F80119F58D@BPXM15GP.gisp.nec.co.jp
Whole thread Raw
Responses Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Sorry for my late response. I've been unavailable to have enough
time to touch code for the last 1.5 month.

The attached patch is a revised one to handle private data of
foregn/custom scan node more gracefully.

The overall consensus upthread were:
- A new ExtensibleNodeMethods structure defines a unique name
  and a set of callbacks to handle node copy, serialization,
  deserialization and equality checks.
- (Foreign|Custom)(Path|Scan|ScanState) are first host of the
  ExtensibleNodeMethods, to allow extension to define larger
  structure to store its private fields.
- ExtensibleNodeMethods does not support variable length
  structure (like a structure with an array on the tail, use
  separately allocated array).
- ExtensibleNodeMethods shall be registered on _PG_init() of
  extensions.

The 'pgsql-v9.6-custom-private.v3.patch' is the main part of
this feature. As I pointed out before, it uses dynhash instead
of the self invented hash table.

Interfaces are defined as follows (not changed from v2):

  typedef struct ExtensibleNodeMethods
  {
     const char *extnodename;
     Size        node_size;
     void      (*nodeCopy)(Node *newnode, const Node *oldnode);
     bool      (*nodeEqual)(const Node *a, const Node *b);
     void      (*nodeOut)(struct StringInfoData *str, const Node *node);
     void      (*nodeRead)(Node *node);
  } ExtensibleNodeMethods;
  
  extern void
  RegisterExtensibleNodeMethods(const ExtensibleNodeMethods *methods);
  
  extern const ExtensibleNodeMethods *
  GetExtensibleNodeMethods(const char *extnodename, bool missing_ok);


Also, 'extensible-node-example-on-pgstrom.patch' is a working
example on its "GpuScan" node.
The code below uses all of copy, serialization and deserialization.

    gscan = (GpuScan *)stringToNode(nodeToString(copyObject(cscan)));
    elog(INFO, "GpuScan: %s", nodeToString(gscan));

Then, I could confirm private fields are reproduced correctly.

In addition to this, I'd like to suggest two small improvement.

On nodeOut callback, extension will need _outToken() and _outBitmap(),
however, these two functions are static. Entrypoint for extensions
are needed. (Of course, extension can copy & paste these small functions...)

ExtensibleNodeMethods may be registered with a unique pair of its
name and node-tag which is associated with. The current code requires
the name is unique to others, however, it may make a bit inconvenience.
In case of CustomScan, extension need to define three nodes: CustomPath,
CustomScan and CustomScanState, thus, ExtensibleNodeMethods which is
associated with these node must have individually unique name, like
"GpuScanPath", "GpuScan" and "GpuScanState".
If extnodename would be unique within a particular node type, we can
apply same name for all of the three above.

How about your thought?

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Kaigai Kouhei(海外 浩平)
> Sent: Wednesday, December 02, 2015 5:52 PM
> To: 'Robert Haas'
> Cc: Andres Freund; Amit Kapila; pgsql-hackers
> Subject: Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support
> on readfuncs.c)
> 
> > On Thu, Nov 26, 2015 at 5:27 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > > I'm now implementing. The above design perfectly works on ForeignScan.
> > > On the other hands, I'd like to have deeper consideration for CustomScan.
> > >
> > > My recent patch adds LibraryName and SymbolName on CustomScanMethods
> > > to lookup the method table even if library is not loaded yet.
> > > However, this ExtensibleNodeMethods relies custom scan provider shall
> > > be loaded, by parallel infrastructure, prior to the deserialization.
> > > It means extension has a chance to register itself as well.
> > >
> > > My idea is, redefine CustomScanMethod as follows:
> > >
> > > typedef struct ExtensibleNodeMethods
> > > {
> > >     const char *extnodename;
> > >     Size        node_size;
> > >     Node     *(*nodeCopy)(const Node *from);
> > >     bool      (*nodeEqual)(const Node *a, const Node *b);
> > >     void      (*nodeOut)(struct StringInfoData *str, const Node *node);
> > >     void      (*nodeRead)(Node *node);
> > > } ExtensibleNodeMethods;
> > >
> > > typedef struct CustomScanMethods
> > > {
> > >     union {
> > >         const char *CustomName;
> > >         ExtensibleNodeMethods  xnode;
> > >     };
> > >     /* Create execution state (CustomScanState) from a CustomScan plan node
> > */
> > >     Node       *(*CreateCustomScanState) (struct CustomScan *cscan);
> > > } CustomScanMethods;
> > >
> > > It allows to pull CustomScanMethods using GetExtensibleNodeMethods
> > > by CustomName; that is alias of extnodename in ExtensibleNodeMethods.
> > > Thus, we don't need to care about LibraryName and SymbolName.
> > >
> > > This kind of trick is not needed for ForeignScan because all the pointer
> > > stuff are reproducible by catalog, however, method table of CustomScan
> > > is not.
> > >
> > > How about your opinion?
> >
> > Anonymous unions are not C89, so this is a no-go.
> >
> > I think we should just drop CustomName and replace it with
> > ExtensibleNodeMethods.  That will break things for third-party code
> > that is using the existing CustomScan stuff, but there's a good chance
> > that nobody other than you has written any such code.  And even if
> > someone has, updating it for this change will not be very difficult.
> >
> Thanks, I have same opinion.
> At this moment, CustomName is not utilized so much except for EXPLAIN
> and debug output, so it is not a hard stuff to replace this field by
> extnodename, even if custom scan provider does not have own structure
> thus no callbacks of node operations are defined.
> 
> The attached patch adds 'extnodename' fields on ForeignPath and
> ForeignScan, also embeds ExtensibleNodeMethods in CustomPathMethods,
> CustomScanMethods and CustomExecMethods.
> 
> Its handlers in copyfuncs.c were enhanced to utilize the callback
> to allocate a larger structure and copy private fields.
> Handlers in outfuncs.c and readfuncs.c have to be symmetric. The
> core backend writes out 'extnodename' prior to any private fields,
> then we can identify the callback to process rest of tokens for
> private fields.
> 
> RegisterExtensibleNodeMethods() tracks a pair of 'extnodename'
> and ExtensibleNodeMethods itself. It saves the pointer of the
> method table, but not duplicate, because _readCustomScan()
> cast the method pulled by 'extnodename' thus registered table
> has to be a static variable on extension; that means extension
> never update ExtensibleNodeMethods once registered.
> 
> The one other patch is my test using PG-Strom, however, I didn't
> have a test on FDW extensions yet.
> 
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: why pg_size_pretty is volatile?
Next
From: Peter Eisentraut
Date:
Subject: Re: Releasing in September