Re: Custom Scans and private data - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Custom Scans and private data
Date
Msg-id CA+TgmoYhMb-DFwu-26p81hkRhrg-dsB0kP1z29+-2D4SoC_hFw@mail.gmail.com
Whole thread Raw
In response to Re: Custom Scans and private data  (Andres Freund <andres@anarazel.de>)
Responses Re: Custom Scans and private data  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Custom Scans and private data  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Aug 26, 2015 at 11:23 PM, Andres Freund <andres@anarazel.de> wrote:
>> > Looking at
>> > postgres_fdw and the pg-strom example linked upthread imo pretty clearly
>> > shows how ugly this is.  There's also the rather noticeable difference
>> > that we already have callbacks in the node for custom scans (used by
>> > outfuncs), making this rather trivial to add.
>>
>> I will manfully refrain from taking that bait.
>
> While that comment made me laugh, I'm not really sure why my examples
> are bait. One is the probably most used fdw, the other the only user of
> the custom scan interface I know of.

I suspect what Tom is getting at is that he thinks the custom scan
interface is a giant piece of dung which I ought not to have committed
unless somebody was holding a gun to my head at time time, and that
putting function pointers in the structure at all was a particularly
poor design decision on par with the Roman decision to make their
water pipes out of a metal which is toxic if ingested.  He and I are
going to have to agree to disagree on that.  I think it's great that
you are experimenting with the feature, and I think we ought to try to
make it better, even if that requires incompatible changes to the API.

When we originally discussed this topic at a previous PGCon developer
meeting, it was suggested that we might want to have a facility for
custom nodes, and I at least had the impression that this might be
separate from particular facilities we might have for custom paths or
custom plans or custom plan-states.  For example, suppose you can do
this:

void
InitCustomNodeAndres(CustomNodeTemplate *tmpl)
{   tmpl->outfunc = _outAndres;   tmpl->copyfunc = _copyAndres;   tmpl->equalfunc = _equalAndres;   tmpl->readfunc =
_readAndres;
}

void
_PG_init()
{  RegisterCustomNodeType("Andres", "andres", "InitCustomNodeAndres");
}

...
   Andres *andres = makeCustomNode(Andres);

If we had something like that, then you could address your use case by
making the structures you want to pass around be nodes rather than
having to invent a way to smash whatever you have into a binary blob.
That would be pretty cool.  Maybe, with a bit of work, we could even
find a way to get rid of the idea of custom-path, custom-plan, and
custom scan-state as node types in their own right and instead have
some trick where they can just be general custom nodes that have one
or two extra methods defined for the stuff that is specific to the
planner and executor.  Not sure exactly how to set that up off-hand,
but maybe there's a good way.

In general, I think we've made extensibility very hard in areas like
this.  There are a whole bunch of thorny problems here that are
interconnected.  For example, suppose you want to add a new kind of
SQL-visible object to the system, something EnterpriseDB has needed to
do repeatedly over the years.  You have to modify like 40 core source
files.  You need new parser rules: but the parser is not extensible.
You need new keywords: but the keyword table is not extensible.  To
represent the results of parsing, you need new parse nodes: but the
node system is not extensible.  You need to classify your statement as
DML for logging purposes, you need to hook it into the event trigger
system, you need to hook it into objectaddress.c, you need a new
command tag, you need to add your new command to the appropriate place
in ProcessUtility.  You can sort of get control in ProcessUtility from
a hook, but it's awkward, and the others can't be handled at all.  So
you just have to hack core.

It would be nice to fully solve this problem so that some of the EDB
stuff could exist as an extension rather than a modification of the
core code, but given the magnitude of the problem I don't expect to
get there any time soon.  Still, I think making core facilities more
accessible to extensions has a ton of merit. FDWs, background workers,
dynamic shared memory, logical decoding plugins, DDL deparsing, and,
yeah, even custom scans are all relatively recent examples of new
facilities we've been careful to make usable outside core, and there
is, I think, a good deal of evidence that every one of those
facilities has gotten use by developers other than the ones who
installed the core support, which I personally find really satisfying.
I predict that if we take steps to make the node system more
extensible, we will find that that capability gets plenty of use,
including some uses we can't now predict.

And even more broadly: I think the time has come to get really
skeptical about every place in our code where there's a big giant
switch.  All of those represent places where non-core code need not
apply.  And all of them represent places where, instead of each module
registering itself with all the correct subsystems, each subsystem
knows about all of the things that plug into it.  Whoever designed the
shared-memory initialization mechanism was smart: it knows very little
itself, and relies on its clients to tell it what they need.  There
are some things I don't like about that mechanism, but at least we got
that part right, and so extensions can use it.  A lot of other places,
unfortunately, took the simpler approach of just hard-coding
everything.  There are some places where performance or other concerns
justify that approach, but there are a lot of places where I think
another approach would be better.  EDB is far from the only company
that would benefit from making more of these subsystems extensible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: 9.5 feature count
Next
From: Tom Lane
Date:
Subject: Re: Custom Scans and private data