Re: Custom Scans and private data - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Custom Scans and private data
Date
Msg-id 20150827235351.GA4147@awork2.anarazel.de
Whole thread Raw
In response to Re: Custom Scans and private data  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2015-08-27 18:59:13 -0400, Tom Lane wrote:
> But I do think that letting custom-scan extensions fool about with
> the semantics of plan-copying (and plan-serialization for that matter)
> is a risky choice that is not justified by some marginal gains in code
> readability

To me the likelihood of having bugs in manual (de)serilization via lists
et al. is higher than introducing bugs via copyfuncs. Or even
out/readfuncs, especially as we could easily add a bunch of
verifications to the latter by verifying field names in READ_*. Wether a
prepared statement fails because a copyfuncs hook failed, or whether
it's in the fdw itself doesn't seem to make that much of a difference.

> especially when there are other feasible ways to attack the
> readability problem.

Lets at least add a Value variant that can store strings that aren't
null terminated, that'd already make things a lot better. Right now you
need to reuse Const nodes and stuff a bytea in there or such.

But even then, that only works if you have a single flat datastructure
that doesn't have any pointers in it. copyObject() style copying can
easily cope with that, using a binary blob really can't.

As far as I can see you right now basically need to write code for
custom scans so that between planning and execution you unconditionally
do:
Planning:
1) create your internal data
2) serialize data,
Execution:
3) unserialize data
4) use unserialized data

There's no way to stash data between 2 and 3 anywhere, even if there's
no need for the whole copying rigamarole because it was just a plain
statement.

If you look at the various *Scan, *Join etc. nodes we have in core, most
of them have a bunch of variable length data structures and pointers in
them. All that you can't sanely do for a custom scan node.

> copyObject needs to be a pretty self-contained operation with not a
> lot of external dependencies

Why?

> > [ ruminations about how to improve the system's extensibility ]
> 
> Yeah, I've entertained similar thoughts in the past.  It's not very clear
> how we get there from here, though --- for instance, any fundamental
> change in the Node system would break so much code that it's unlikely
> anyone would thank us for it, even assuming we ever finished the
> conversion project.

How about adding a single 'ExtensibleNode' node type (inheriting from
Node as normal). Many of the switches over node types would gain one
additional entry for that, and do something like   case T_ExtensibleNode:       handler = LookupHandler((ExtensibleNode
*)node)->extended_type, Handler_ExecInitNode);       if (handler)       {            result = ((ExecInitNodeHandler *)
handler)(node,estate, eflags);            break;       }       /* fall through to error */   default:       elog(ERROR,
"unrecognizednode type: %d", (int) nodeTag(node));
 

that should be doable one-by-one in many of these bigger
switches. There's obviously some limits where that's doable, but I don't
think it'd break much code?

How exactly to identify extended_type in a way that makes sense
e.g. between restarting pg, backends, primary/standby isn't trivial, but
I do think it should be doable.

> I'm also less than convinced that "I should be able to install major new
> features without touching the core code" is actually a sane or useful goal
> to have.

Yea, I have some doubts there as well. There's a lot of features though
which we really don't want in core, where some of the limitations of the
node system really do become problematic.

> As a concrete example, you mentioned the lack of extensibility of the
> bison parser, which is undoubtedly a blocking issue if you want a new
> feature that would require new SQL syntax.

While that's a problem, where I really don't have any good answer, I
also think in many cases it's primarily DDL, and primarily extending
existing DDL. By far the most things I've seen want to add informations
to tables and/or columns - and storage options actually kinda give you
that on the grammar level. We just don't allow you too sanely hook into
it, and there's a bunch of other pointless restrictions.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: PostgreSQL for VAX on NetBSD/OpenBSD
Next
From: Tom Lane
Date:
Subject: Re: PostgreSQL for VAX on NetBSD/OpenBSD