Re: Custom Scans and private data - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Custom Scans and private data
Msg-id 20150826222342.GK19326@awork2.anarazel.de
In response to Re: Custom Scans and private data  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Custom Scans and private data  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2015-08-25 19:22:04 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think it was a noticeable mistake in the fdw case, but we already
> > released with that. We shouldn't make the same mistake twice.
> 
> I don't agree that it was a mistake, and I do think there is value in
> consistency.

Having to write ad-hoc serialization code in a different way in each fdw
seems like a bad idea to me. The output of that serialization is rather
hard to read since there are no field names and such, and it's quite
inefficient. I have a hard time seeing the benefits.

> In the case at hand, it would not be too hard to provide some utility
> functions for some common cases; for instance, if you want to just store
> a struct, we could offer convenience functions to wrap that in a bytea
> constant and unwrap it again.  Those could be useful for both FDWs and
> custom scans.

Yes, that'd already be an improvement. Right now it seems the best way
to achieve that is to either base64 the output into a Value node or use
bytea inside a Const node. Alternatively you can make it a list of ints
but that obviously isn't a grand idea.
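
To make the "wrap a struct in a bytea" idea concrete, here is a minimal
standalone sketch. It deliberately does not use the PostgreSQL headers:
Blob, blob_wrap, blob_unwrap and MyScanPrivate are hypothetical names
invented for illustration, with Blob standing in for a length-prefixed
bytea value. It only works because the struct is flat (no pointers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a bytea value: length header plus payload. */
typedef struct Blob
{
    uint32_t len;      /* payload size in bytes */
    char     data[];   /* payload bytes */
} Blob;

/* Hypothetical private state: flat, no pointers, so a byte copy suffices. */
typedef struct MyScanPrivate
{
    int32_t fetch_size;
    int32_t flags;
    double  startup_cost;
} MyScanPrivate;

/* Copy len bytes from src into a newly allocated blob; NULL on failure. */
static Blob *
blob_wrap(const void *src, size_t len)
{
    Blob *b;

    assert(src != NULL);
    b = malloc(offsetof(Blob, data) + len);
    if (b == NULL)
        return NULL;
    b->len = (uint32_t) len;
    memcpy(b->data, src, len);
    return b;
}

/* Copy the payload back out; returns -1 if the stored size does not match. */
static int
blob_unwrap(const Blob *b, void *dst, size_t len)
{
    if (b == NULL || dst == NULL || b->len != len)
        return -1;
    memcpy(dst, b->data, len);
    return 0;
}
```

The size comparison in blob_unwrap is the only sanity check such a blob
allows: nothing in the bytes says which field is which.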

But imo that's not really good enough as soon as you have more than a
single struct. For anything but that you'll have to write
serialization/deserialization code again.

Additionally you're in many cases forced to always use the serialized
representation, even if there's no copyObject() in between, since you
don't have a proper way to store non-serialized information!

Check out serializePlanData in
https://github.com/laurenz/oracle_fdw/blob/master/oracle_fdw.c - that's
something we shouldn't put people through.

Actually, *especially* if we at some point want to ship plans over the
wire, that kind of serialization is a bad idea - it's very hard to have
any sort of checks. out/readfuncs.c type serialization will have field
names, but you can't do the equivalent from these interfaces.
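
A small standalone sketch of why field names matter for checking. The
labeled form below imitates the ":fieldname value" style that outfuncs.c
emits, but it is not that code: DemoPrivate and the four helper functions
are hypothetical names, and the positional form stands in for an unnamed
list of values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical private state with two integer fields. */
typedef struct DemoPrivate
{
    int fetch_size;
    int flags;
} DemoPrivate;

/* Positional form: just the values, in an order both sides must agree on. */
static int
write_positional(char *buf, size_t n, const DemoPrivate *p)
{
    assert(buf != NULL && p != NULL);
    return snprintf(buf, n, "%d %d", p->fetch_size, p->flags);
}

static int
read_positional(const char *buf, DemoPrivate *p)
{
    assert(buf != NULL && p != NULL);
    return sscanf(buf, "%d %d", &p->fetch_size, &p->flags) == 2 ? 0 : -1;
}

/* Labeled form: every value is preceded by its field name. */
static int
write_labeled(char *buf, size_t n, const DemoPrivate *p)
{
    assert(buf != NULL && p != NULL);
    return snprintf(buf, n, ":fetch_size %d :flags %d",
                    p->fetch_size, p->flags);
}

/* Fails (returns -1) if a label is missing, misspelled, or out of order. */
static int
read_labeled(const char *buf, DemoPrivate *p)
{
    assert(buf != NULL && p != NULL);
    return sscanf(buf, ":fetch_size %d :flags %d",
                  &p->fetch_size, &p->flags) == 2 ? 0 : -1;
}
```

If the writer and reader disagree about field order, read_labeled rejects
the input, while read_positional accepts it and silently puts the values
in the wrong fields.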

> Nor does "shove it all into some callbacks" mean that the callbacks
> will be easy to write.)

Being able to perform copying/serialization only when necessary, and to
store the data in a non-serialized manner, seem like pretty clear
improvements. I think it's important to be able to easily store core data
in fields, but that's just a copyObject() call away.
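
As a rough illustration of the callback approach, the standalone sketch
below keeps the private data in its native form and only copies it when a
copy is actually requested. None of this is the real custom scan API:
PrivateMethods, DemoNode, RemoteInfo and the functions are hypothetical
names, and malloc stands in for palloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback table supplied by the extension. */
typedef struct PrivateMethods
{
    void *(*copy)(const void *src);   /* deep copy, invoked only on demand */
    void  (*release)(void *obj);      /* free a previously copied object */
} PrivateMethods;

/* Hypothetical node: private data stays in native, non-serialized form. */
typedef struct DemoNode
{
    const PrivateMethods *methods;
    void                 *private_data;
} DemoNode;

/* Private state with a pointer member, so a flat byte copy would not do. */
typedef struct RemoteInfo
{
    int   fetch_size;
    char *query;
} RemoteInfo;

static void *
remote_info_copy(const void *src)
{
    const RemoteInfo *from = src;
    RemoteInfo *to = malloc(sizeof(RemoteInfo));

    if (to == NULL)
        return NULL;
    to->fetch_size = from->fetch_size;
    to->query = malloc(strlen(from->query) + 1);
    if (to->query == NULL)
    {
        free(to);
        return NULL;
    }
    strcpy(to->query, from->query);
    return to;
}

static void
remote_info_release(void *obj)
{
    RemoteInfo *info = obj;

    free(info->query);
    free(info);
}

static const PrivateMethods remote_info_methods = {
    remote_info_copy,
    remote_info_release
};

/* Generic copy path: dispatches to the callback, knows nothing else. */
static DemoNode
demo_node_copy(const DemoNode *node)
{
    DemoNode result;

    assert(node != NULL && node->methods != NULL);
    result.methods = node->methods;
    result.private_data = node->methods->copy(node->private_data);
    return result;
}
```

Code that never copies the node never pays for a copy or a serialization
round trip; it just dereferences private_data directly.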

> > Looking at
> > postgres_fdw and the pg-strom example linked upthread imo pretty clearly
> > shows how ugly this is.  There's also the rather noticeable difference
> > that we already have callbacks in the node for custom scans (used by
> > outfuncs), making this rather trivial to add.
> 
> I will manfully refrain from taking that bait.

While that comment made me laugh, I'm not really sure why my examples
are bait. One is probably the most used fdw, the other is the only user
of the custom scan interface I know of.

Greetings,

Andres Freund
