Thread: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From
Kouhei Kaigai
Date:
It is a relevant topic of readfuncs support for custom-scan.

Unlike CustomPath and CustomScanState, we don't allow custom-scan
provider to define own and larger structure that embeds CustomScan
at head of the structure, to support copyObject().
Thus, custom-scan provider needs to have own logic to pack and
unpack private fields, like:
  https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L112

At present, only create_customscan_plan() and _copyCustomScan()
can produce CustomScan node.
The create_customscan_plan() invokes PlanCustomPath callback,
thus, CSP can return a pointer of CustomScan field within
a larger structure. So, we don't adjust this interface.
On the other hands, _copyCustomScan() allocates a new node using
makeNode(CustomScan), thus, here is no space for the larger
structure if CSP wants to use to keep private values without
packing and unpacking.
Only CSP knows exact size of the larger structure, so all we can
do is ask CSP to allocate a new node and copy its private fields.
This patch newly adds NodeCopyCustomScan callback to support
copyObject.

Also, v9.6 will support nodeToString/stringToNode on plan nodes.
So, we need to re-define the role of TextOutCustomScan callback
that is also defined at v9.5.
If CSP extends the CustomScan to save private values on the extra
area, only CSP knows what values and which data types are stored,
thus, all we can do is ask CSP to serialize and deserialize its
private fields without inconsistency.
Even though TextOutCustomScan callback shall be once ripped out
to support readfuncs.c, a pair of TextOut and TextRead callback
will re-define its responsibility again; when CSP defines
a larger structure that embeds CustomScan, a pair of these
callbacks are used to serialize/deserialize the entire custom
defined objects.

The attached patches are for v9.5 and v9.6. The v9.6 patch
assumes the patch for readfuncs support is already applied.
The v9.5 patch assumes the latest master.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> Sent: Tuesday, November 10, 2015 11:47 AM
> To: Robert Haas
> Cc: Amit Kapila; Andres Freund; pgsql-hackers
> Subject: Re: [HACKERS] CustomScan support on readfuncs.c
> 
> > On Sat, Nov 7, 2015 at 6:38 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > > Are you suggesting to pass the object name on software build time?
> >
> > Yes.  That's what test_shm_mq and worker_spi already do:
> >
> >         sprintf(worker.bgw_library_name, "test_shm_mq");
> >
> OK, I ripped out the portion around dfmgr.c.
> 
> > > If so, it may load incorrect libraries when DBA renamed its filename.
> > > At least, PostgreSQL cannot prevent DBA to rename library filenames
> > > even if they try to deploy "pg_strom.so", "pg_strom.20151106.so" and
> > > "pg_strom.20151030_bug.so" on same directory.
> >
> > I agree.  But that's not a new problem that needs to be solved by this
> > patch.  It already exists - see above.
> >
> It still may be a potential problem, even though a corner case.
> I'll try to implement an internal API to know "what is my name".
> 
> The attached main patch (custom-scan-on-readfuncs.v3.patch) once
> removes TextOutCustomScan, thus all the serialized tokens become
> known to the core backend, and add _readCustomScan() at readfuncs.c.
> It deserializes CustomScan nodes from cstring tokens, especially,
> reloads the pointer of CustomScanMethods tables using a pair of
> library name and symbol name.
> Thus, it also requires custom scan providers to keep the methods
> table visible from external objects.
> 
> Thanks,
> --
> NEC Business Creation Division / PG-Strom Project
> KaiGai Kohei <kaigai@ak.jp.nec.com>


Attachment

Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From
Robert Haas
Date:
On Mon, Nov 9, 2015 at 10:26 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> It is a relevant topic of readfuncs support for custom-scan.
>
> Unlike CustomPath and CustomScanState, we don't allow custom-scan
> provider to define own and larger structure that embeds CustomScan
> at head of the structure, to support copyObject().
> Thus, custom-scan provider needs to have own logic to pack and
> unpack private fields, like:
>   https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L112
>
> At present, only create_customscan_plan() and _copyCustomScan()
> can produce CustomScan node.
> The create_customscan_plan() invokes PlanCustomPath callback,
> thus, CSP can return a pointer of CustomScan field within
> a larger structure. So, we don't adjust this interface.
> On the other hands, _copyCustomScan() allocates a new node using
> makeNode(CustomScan), thus, here is no space for the larger
> structure if CSP wants to use to keep private values without
> packing and unpacking.
> Only CSP knows exact size of the larger structure, so all we can
> do is ask CSP to allocate a new node and copy its private fields.
> This patch newly adds NodeCopyCustomScan callback to support
> copyObject.
>
> Also, v9.6 will support nodeToString/stringToNode on plan nodes.
> So, we need to re-define the role of TextOutCustomScan callback
> that is also defined at v9.5.
> If CSP extends the CustomScan to save private values on the extra
> area, only CSP knows what values and which data types are stored,
> thus, all we can do is ask CSP to serialize and deserialize its
> private fields without inconsistency.
> Even though TextOutCustomScan callback shall be once ripped out
> to support readfuncs.c, a pair of TextOut and TextRead callback
> will re-define its responsibility again; when CSP defines
> a larger structure that embeds CustomScan, a pair of these
> callbacks are used to serialize/deserialize the entire custom
> defined objects.

I don't see this as being a particularly good idea.  The same issue
exists for FDWs, and we're just living with it in that case.  There
was talk previously of improving it, and maybe some day somebody will,
but I can't get excited about it.  Writing serialization and
deserialization code is annoying but completely insufficient, in my
mind, to justify the hassle of creating a system for this that will be
used by very, very little code.

If we do want to improve it, I'm not sure this is the way to go,
either.  I think there could be other designs where we focus on making
the serialization and deserialization options better, rather than
letting people tack stuff onto the struct.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From
Andres Freund
Date:
On 2015-11-11 14:59:33 -0500, Robert Haas wrote:
> I don't see this as being a particularly good idea.  The same issue
> exists for FDWs, and we're just living with it in that case.

It's absolutely horrible there. I don't see why that's a justification
for much.  To deal with the lack of extensible copy/out/readfuncs I've
just had to copy the entirety of readfuncs.c into an extension. Or you
build replacements for those (as e.g. postgres_fdw essentially has
done).

> If we do want to improve it, I'm not sure this is the way to go,
> either.  I think there could be other designs where we focus on making
> the serialization and deserialization options better, rather than
> letting people tack stuff onto the struct.

Just better serialization doesn't actually help all that much. Being
able to conveniently access data directly, i.e. as fields in a struct,
makes code rather more readable. And in many cases more
efficient. Having to serialize internal datastructures unconditionally,
just so copyfuncs.c works if actually used, makes for a fair amount of
inefficiency (forced deserialization, even when not copying) and uglier
code.

Andres



Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From
Robert Haas
Date:
On Wed, Nov 11, 2015 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-11-11 14:59:33 -0500, Robert Haas wrote:
>> I don't see this as being a particularly good idea.  The same issue
>> exists for FDWs, and we're just living with it in that case.
>
> It's absolutely horrible there. I don't see why that's a justification
> for much.  To deal with the lack of extensible copy/out/readfuncs I've
> just had to copy the entirety of readfuncs.c into an extension. Or you
> build replacements for those (as e.g. postgres_fdw essentially has
> done).
>
>> If we do want to improve it, I'm not sure this is the way to go,
>> either.  I think there could be other designs where we focus on making
>> the serialization and deserialization options better, rather than
>> letting people tack stuff onto the struct.
>
> Just better serialization doesn't actually help all that much. Being
> able to conveniently access data directly, i.e. as fields in a struct,
> makes code rather more readable. And in many cases more
> efficient. Having to serialize internal datastructures unconditionally,
> just so copyfuncs.c works if actually used, makes for a fair amount of
> inefficiency (forced deserialization, even when not copying) and uglier
> code.

Fair enough, but I'm still not very interested in having something for
CustomScan only.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Nov 11, 2015 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
>> Just better serialization doesn't actually help all that much. Being
>> able to conveniently access data directly, i.e. as fields in a struct,
>> makes code rather more readable. And in many cases more
>> efficient. Having to serialize internal datastructures unconditionally,
>> just so copyfuncs.c works if actually used, makes for a fair amount of
>> inefficiency (forced deserialization, even when not copying) and uglier
>> code.

> Fair enough, but I'm still not very interested in having something for
> CustomScan only.

I'm with Robert here.  The fact is that postgres_fdw and other FDWs are
living just fine with the restrictions we put in to allow copyfuncs.c to
copy ForeignScan, and there's been no sufficient justification offered
as to why CustomScan can't play by those rules.

Yes, it would be nice to be able to use a more-tailored struct definition,
but it's not *necessary*.  We should not contort the core system
enormously in order to provide a bit more notational cleanliness to
extensions.

I'd be fine with fixing this if a nice solution were to present itself,
but so far nothing's been offered that wasn't horrid.

BTW, the "inefficiency" argument is junk, unless you provide some hard
evidence of a case where it hurts a lot.  If we were forcing people to
use serialized representations at execution time, it would be meaningful,
but we're not; you can convert as needed at executor setup time.
        regards, tom lane



Re: CustomScan in a larger structure (RE: CustomScan support on readfuncs.c)

From
Kouhei Kaigai
Date:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Nov 11, 2015 at 3:29 PM, Andres Freund <andres@anarazel.de> wrote:
> >> Just better serialization doesn't actually help all that much. Being
> >> able to conveniently access data directly, i.e. as fields in a struct,
> >> makes code rather more readable. And in many cases more
> >> efficient. Having to serialize internal datastructures unconditionally,
> >> just so copyfuncs.c works if actually used, makes for a fair amount of
> >> inefficiency (forced deserialization, even when not copying) and uglier
> >> code.
>
> > Fair enough, but I'm still not very interested in having something for
> > CustomScan only.
>
> I'm with Robert here.  The fact is that postgres_fdw and other FDWs are
> living just fine with the restrictions we put in to allow copyfuncs.c to
> copy ForeignScan, and there's been no sufficient justification offered
> as to why CustomScan can't play by those rules.
>
So, I'd like to propose copy/out/read callbacks for both of ForeignScan
and CustomScan.

> Yes, it would be nice to be able to use a more-tailored struct definition,
> but it's not *necessary*.  We should not contort the core system
> enormously in order to provide a bit more notational cleanliness to
> extensions.
>
> I'd be fine with fixing this if a nice solution were to present itself,
> but so far nothing's been offered that wasn't horrid.
>
> BTW, the "inefficiency" argument is junk, unless you provide some hard
> evidence of a case where it hurts a lot.  If we were forcing people to
> use serialized representations at execution time, it would be meaningful,
> but we're not; you can convert as needed at executor setup time.
>
Indeed, it is a "nice to have" feature. We can survive without luxury
goods but more expensive tax.

Once an extension tries to implement own join logic, it shall have
much larger number of private fields than usual scan logic.
Andres got a shock when he looked at GpuJoin code of PG-Strom before
because of its pack/unpack routine at: https://github.com/pg-strom/devel/blob/master/src/gpujoin.c#L112
The custom_private/fdw_private have to be safe to copyObject(), thus,
we have to pack/unpack private variables to/from a List structure.
When we design serialization/deserialization of Plan nodes, it means
ForeignScan and CustomScan has to pay double cost for this.

I never say it is a "necessary" feature, but "very nice to have".

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>