Thread: Allowing extensions to find out the OIDs of their member objects

Allowing extensions to find out the OIDs of their member objects

From

Tom Lane

Date:

20 January 2019, 23:50:33

In [1] I propose that we should allow extensions to get their hands
on the ability to transform function calls as per "protransform" and
to generate lossy index quals based on items in WHERE clauses.  The
APIs to give an extension control at the right points seem pretty
straightforward, but there's a problem standing in the way of actually
doing anything useful once you've got control.  In order to either
analyze a passed-in clause or generate a new one, the extension will
need to know the OIDs of the functions/operators it's working with.
And extension objects don't have fixed OIDs.

In principle this could be dealt with by looking up said OIDs from
the catalogs, but that's (1) complicated, (2) slow, (3) prone to
possibly-security-sensitive mistakes such as omitting a schema
specification, and (4) at risk of getting broken entirely by
user-accessible changes such as ALTER FUNCTION RENAME.  Point (2) can
be alleviated by caching, but that just makes (1) even worse, plus
there are lots of ways to do caching wrong.

I thought about extending the extension infrastructure to provide
some way of retrieving relevant OIDs.  We could imagine, for instance,
that an extension script has a way to say "this function is object
number three within this extension", and while running the script
we make a catalog entry showing that object number three has OID
thus-and-so, and then that catalog entry can be consulted to get
the right OID (by C code that has hard-wired knowledge that object
number three is the function it cares about).  This is still kind
of messy, because aside from the hand-assigned object numbers
you'd have to use the extension name as part of the lookup key,
making the name into something the C code critically depends on.
We don't have ALTER EXTENSION RENAME, so maybe that's okay, but
it seems painful to say that we can never have it.

In the end it seems like possibly the cleanest answer is to change
things so that extensions *can* have fixed OIDs that their C code can
know, eliminating lookup costs and allowing coding conventions for this
sort of work to be the same as in the core backend.  We could raise
FirstNormalObjectId to create some unused OID space that we could then
assign chunks of to specific extensions on-request.  This is problematic
for relations, types, and roles, because pg_upgrade wants to preserve
OIDs of those objects across upgrades, so we could not ensure that the
"unused" space is free of such objects.  But it would work for all
other object types, and I think that it might well be sufficient if
an extension can have fixed OIDs for its functions, operators, and
opfamilies/opclasses.  (At need, it could find out the OID for one of
its types by looking up the argument or result types for one of its
functions.)

There are various places in pg_upgrade and postgres_fdw that
assume more than they perhaps should about the significance of
FirstNormalObjectId, but I think that that could be dealt with.

A larger issue is whether "hand out some OIDs on-demand" is a
sustainable strategy.  I think that it is, if we encourage extensions
to assign fixed OIDs only to objects they really need to.  In thirty-ish
years of core PG development, we've only used up ~4200 fixed OIDs,
and a lot of those are for functions that probably don't really need
fixed OIDs but got one because we give one to every built-in function.
However, if there's a big land rush to claim large chunks of OIDs,
we might have a problem.

We'd have to invent some SQL syntax whereby extension scripts can
actually apply their assigned OIDs to their objects.  I'm not very
enthused about adding an "OID nnn" option to every type of CREATE
command that might need this.  A quick-and-dirty answer is to create
support functions similar to binary_upgrade_set_next_pg_type_oid()
that set the OID to give to the next-created object of each category
we need to support.

There are various issues and bits of work around this, but the only one
that I've thought of that I haven't got an answer for is "how should
an extension upgrade script assign a fixed OID to an object that already
existed in the previous extension version, but without a fixed OID?".
We can't just change the recorded OID because that'll break
dependencies, view references, etc.  Conceivably we could write code
that runs through the catalogs and changes all references, but man
that'd be a mess.  Anyone have another idea?

Another question is whether you need any special permissions to assign
a fixed OID in this way.  The most conservative answer is to require
superuserness, which might be enough because the plausible use-cases
for fixed OIDs involve C code, which you'd need to be superuser to
install anyhow.  But it seems overkill somehow.  OTOH, it would be
annoying if a random user could eat up a "reserved" OID that later
prevented someone from installing an extension they wanted.

Thoughts?

            regards, tom lane

[1] https://www.postgresql.org/message-id/15193.1548028093@sss.pgh.pa.us

Re: Allowing extensions to find out the OIDs of their member objects

From

Andrew Gierth

Date:

21 January 2019, 00:25:16

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> A larger issue is whether "hand out some OIDs on-demand" is a
 Tom> sustainable strategy.

No.

Not for any concerns over availability of oids, but simply from the fact
that we have no business whatsoever inserting ourselves into the
extension development process in this way.

-- 
Andrew (irc:RhodiumToad)

Re: Allowing extensions to find out the OIDs of their member objects

From

Chapman Flack

Date:

21 January 2019, 00:34:54

On 01/20/19 18:50, Tom Lane wrote:
> we make a catalog entry showing that object number three has OID
> thus-and-so, and then that catalog entry can be consulted to get
> the right OID (by C code that has hard-wired knowledge that object
> number three is the function it cares about).  This is still kind
> of messy, because aside from the hand-assigned object numbers
> you'd have to use the extension name as part of the lookup key,
> making the name into something the C code critically depends on.
> We don't have ALTER EXTENSION RENAME, so maybe that's okay, but
> it seems painful to say that we can never have it.

An extension *has* an OID, doesn't it? pg_extension has 'em.

If the extension script could somehow be informed at CREATE EXTENSION time
of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no?

Somehow, I find this first idea more aesthetically appealing than
actually trying to bind things in extensions to fixed OIDs for all time.

-Chap

Re: Allowing extensions to find out the OIDs of their member objects

From

Tom Lane

Date:

21 January 2019, 00:38:36

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> A larger issue is whether "hand out some OIDs on-demand" is a
>  Tom> sustainable strategy.

> No.

> Not for any concerns over availability of oids, but simply from the fact
> that we have no business whatsoever inserting ourselves into the
> extension development process in this way.

I'm not exactly following this concern.  I wasn't imagining that we'd
assign each individual OID ourselves, but rather give out blocks of OIDs.
Admittedly, the blocks can't be huge, but it doesn't seem to me that
this'd create an impossible burden for either us or extension developers.

We could also reserve some range of OIDs for "local extensions", whereby
people who didn't intend to publish their extensions for widespread use
could just use some of those OIDs rather than having to ask for a
public assignment.

            regards, tom lane

Re: Allowing extensions to find out the OIDs of their member objects

From

Tom Lane

Date:

21 January 2019, 00:43:04

Chapman Flack <chap@anastigmatix.net> writes:
> On 01/20/19 18:50, Tom Lane wrote:
>> we make a catalog entry showing that object number three has OID
>> thus-and-so, and then that catalog entry can be consulted to get
>> the right OID (by C code that has hard-wired knowledge that object
>> number three is the function it cares about).  This is still kind
>> of messy, because aside from the hand-assigned object numbers
>> you'd have to use the extension name as part of the lookup key,
>> making the name into something the C code critically depends on.
>> We don't have ALTER EXTENSION RENAME, so maybe that's okay, but
>> it seems painful to say that we can never have it.

> An extension *has* an OID, doesn't it? pg_extension has 'em.

Sure.

> If the extension script could somehow be informed at CREATE EXTENSION time
> of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no?

And it remembers that where?

> Somehow, I find this first idea more aesthetically appealing than
> actually trying to bind things in extensions to fixed OIDs for all time.

I don't find it appealing particularly, but at least it hasn't got
any insurmountable-looking problems --- other than the "you can't
rename your extension" one.  If we can't make the fixed-OIDs approach
work, this might be a workable second choice.

            regards, tom lane

Re: Allowing extensions to find out the OIDs of their member objects

From

Andrew Gierth

Date:

21 January 2019, 00:49:13

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> I'm not exactly following this concern. I wasn't imagining that
 Tom> we'd assign each individual OID ourselves, but rather give out
 Tom> blocks of OIDs. Admittedly, the blocks can't be huge, but it
 Tom> doesn't seem to me that this'd create an impossible burden for
 Tom> either us or extension developers.

Even that's not acceptable. There is no reason why someone should not be
able to create extensions freely without us ever knowing about them or
needing to.

In fact I suggest that "there shall be no registries of third parties"
be made a formal project policy.

 Tom> We could also reserve some range of OIDs for "local extensions",
 Tom> whereby people who didn't intend to publish their extensions for
 Tom> widespread use could just use some of those OIDs rather than
 Tom> having to ask for a public assignment.

That's not acceptable either; local extensions have a way of becoming
global.

Seriously, this whole idea is a lazy hack. Fixed assignments? really?

-- 
Andrew (irc:RhodiumToad)

Re: Allowing extensions to find out the OIDs of their member objects

From

Chapman Flack

Date:

21 January 2019, 01:04:56

On 01/20/19 19:43, Tom Lane wrote:
>> If the extension script could somehow be informed at CREATE EXTENSION time
>> of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no?
> 
> And it remembers that where?

Top level answer: up to the extension author.

Next level answer: maybe not all extensions have libraries to load,
but for those that do (a good portion), wouldn't it be convenient
for _PG_init() to get the value, either passed as an argument, or
through some API?

After the extension is created, loading of the library is going to be
occasioned through the call of some function, right? That function
just got looked up, and it has an 'e' dependency recorded on the
extension, giving the extension OID.

I can't judge whether that's too much lookup action for the library
load machinery to be doing; how frequent are library loads, and how
much would that add to the cycles they already require?

Regards,
-Chap

Re: Allowing extensions to find out the OIDs of their member objects

From

Tom Lane

Date:

21 January 2019, 01:05:40

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> In fact I suggest that "there shall be no registries of third parties"
> be made a formal project policy.

You're a decade or two too late for that; see pg_statistic.h.

In any case, it's not like this issue applies to every extension anybody
might want to make.  Only quite advanced extensions would have any need
for the features that known-at-compile-time OIDs would help with, as shown
by the fact that everyone's gotten by without them so far.  And people who
have a reason to fly under the radar could always stick with the method of
doing object-name-based runtime lookups.

I also note that multiple people have asked for extensions to have stable
OIDs for other reasons.  Unfortunately, the most common reason is so that
client apps could hard-wire knowledge about type OIDs they see in query
results, and my proposal excludes being able to do that :-(.  But it's
not like nobody has wanted publicly-assigned OIDs before.

There may well be good technical reasons why we shouldn't go this route
(the extension upgrade problem in particular).  But your objection seems
basically political and I reject it as a valid argument.

> Seriously, this whole idea is a lazy hack. Fixed assignments? really?

Hardly lazy.  It's the most difficult approach (from our standpoint)
of the three I mentioned; but the flip side of that is it takes the
least work, and produces the most efficient code, for extension
developers.

            regards, tom lane

Re: Allowing extensions to find out the OIDs of their member objects

From

Tom Lane

Date:

21 January 2019, 01:29:09

Chapman Flack <chap@anastigmatix.net> writes:
> On 01/20/19 19:43, Tom Lane wrote:
>>> If the extension script could somehow be informed at CREATE EXTENSION time
>>> of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no?

>> And it remembers that where?

> Top level answer: up to the extension author.

That isn't an answer, it's an admission of failure.

> Next level answer: maybe not all extensions have libraries to load,
> but for those that do (a good portion), wouldn't it be convenient
> for _PG_init() to get the value, either passed as an argument, or
> through some API?

There's no hard connection between libraries and extensions though.
In fact, right now there's no connection at all.

> After the extension is created, loading of the library is going to be
> occasioned through the call of some function, right?

There's LOAD, and there's also the possibility that the library supports
multiple extensions, or that some of its functions don't belong to an
extension.

A notable problem here is that at the point where the library (probably)
first gets loaded during CREATE EXTENSION, the extension doesn't exist
yet; or even if it does, the pg_depend entry linking the to-be-created
function certainly doesn't.

It is possible that an extension function could chase its 'e' dependency
when called (*not* when being loaded) to find out the OID of its
extension, but frankly I don't see typical extension authors doing that,
even if it didn't have failure cases.

Actually, behavior during CREATE EXTENSION seems like a bit of a problem
for the whole mapping idea --- partway through the script, you'd certainly
not know all the object IDs, so there would not be a complete map
available if one of the extension's functions gets called during that
script.  This could be worked around, but it makes things more complicated
for extension authors than I'd envisioned at first.  They can't just
assume that they have all the object OIDs available.  For the use-cases
described so far, it seems like it'd be OK to just fall back to doing
nothing if the map isn't ready yet, but we'd have to keep that restriction
in mind while defining future call scenarios.

            regards, tom lane

Re: Allowing extensions to find out the OIDs of their member objects

From

David Fetter

Date:

21 January 2019, 02:09:09

On Mon, Jan 21, 2019 at 12:25:16AM +0000, Andrew Gierth wrote:
> >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
>  Tom> A larger issue is whether "hand out some OIDs on-demand" is a
>  Tom> sustainable strategy.
> 
> No.
> 
> Not for any concerns over availability of oids, but simply from the fact
> that we have no business whatsoever inserting ourselves into the
> extension development process in this way.

+1 for keeping our nose out of this business.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Allowing extensions to find out the OIDs of their member objects

From

Michael Paquier

Date:

21 January 2019, 09:14:53

On Sun, Jan 20, 2019 at 06:50:33PM -0500, Tom Lane wrote:
> A larger issue is whether "hand out some OIDs on-demand" is a
> sustainable strategy.  I think that it is, if we encourage extensions
> to assign fixed OIDs only to objects they really need to.  In thirty-ish
> years of core PG development, we've only used up ~4200 fixed OIDs,
> and a lot of those are for functions that probably don't really need
> fixed OIDs but got one because we give one to every built-in function.
> However, if there's a big land rush to claim large chunks of OIDs,
> we might have a problem.

Hm.  Such things are a bit concerning.  There are many closed and open
extensions, so it looks hard to not create conflicts between multiple
extensions trying to get the same range of OIDs or even the same OIDs
and users willing to combine some of them.  This could mess up the
user experience.
--
Michael

Attachment

signature.asc

Re: Allowing extensions to find out the OIDs of their member objects

From

Andres Freund

Date:

21 January 2019, 23:10:53

Hi,

On 2019-01-20 18:50:33 -0500, Tom Lane wrote:
> In [1] I propose that we should allow extensions to get their hands
> on the ability to transform function calls as per "protransform" and
> to generate lossy index quals based on items in WHERE clauses.  The
> APIs to give an extension control at the right points seem pretty
> straightforward, but there's a problem standing in the way of actually
> doing anything useful once you've got control.  In order to either
> analyze a passed-in clause or generate a new one, the extension will
> need to know the OIDs of the functions/operators it's working with.
> And extension objects don't have fixed OIDs.

Why does it need to know all its oids, rather than just the one of the
operation protransform is called for? Am I missing something?  And if
so, why isn't it sufficient to just pass in that oid along with the
node?

> A larger issue is whether "hand out some OIDs on-demand" is a
> sustainable strategy.  I think that it is, if we encourage extensions
> to assign fixed OIDs only to objects they really need to.  In thirty-ish
> years of core PG development, we've only used up ~4200 fixed OIDs,
> and a lot of those are for functions that probably don't really need
> fixed OIDs but got one because we give one to every built-in function.
> However, if there's a big land rush to claim large chunks of OIDs,
> we might have a problem.

I'm not sure that "30 years" argument holds that much power - we've
surely reused oids. And core PG is going to be much more conservative
than any sort of external user.

Which range are you thinking of handing out here? We use oid ranges as
proxy for system-object-ness in a number of places, so we can't really
just hand out ones below FirstNormalObjectId. And we can't really hand
out any above that, because there'd be conflicts - even if we increased
FirstNormalObjectId today, there'd be issues with pg_upgrade.

Greetings,

Andres Freund

Re: Allowing extensions to find out the OIDs of their member objects

From

Tom Lane

Date:

21 January 2019, 23:52:05

Andres Freund <andres@anarazel.de> writes:
> On 2019-01-20 18:50:33 -0500, Tom Lane wrote:
>> In [1] I propose that we should allow extensions to get their hands
>> on the ability to transform function calls as per "protransform" and
>> to generate lossy index quals based on items in WHERE clauses.  The
>> APIs to give an extension control at the right points seem pretty
>> straightforward, but there's a problem standing in the way of actually
>> doing anything useful once you've got control.  In order to either
>> analyze a passed-in clause or generate a new one, the extension will
>> need to know the OIDs of the functions/operators it's working with.
>> And extension objects don't have fixed OIDs.

> Why does it need to know all its oids, rather than just the one of the
> operation protransform is called for? Am I missing something?  And if
> so, why isn't it sufficient to just pass in that oid along with the
> node?

You would know that the FuncExpr you're given is for the function the
support function is attached to, sure, and you could pull its OID out
of that if you wanted.  The problem is that what you want to generate
frequently involves *other* functions or operators.

The example Paul gave in the other thread is that given

    ST_DWithin(a, b, radius)

we might wish to generate an indexqual like

    a && expand(b, radius)

Here, the only OID in easy reach is that of ST_DWithin().  The
problem is to find out the OIDs of && and expand() so we can build
new FuncExpr and OpExpr nodes.

> Which range are you thinking of handing out here? We use oid ranges as
> proxy for system-object-ness in a number of places, so we can't really
> just hand out ones below FirstNormalObjectId. And we can't really hand
> out any above that, because there'd be conflicts - even if we increased
> FirstNormalObjectId today, there'd be issues with pg_upgrade.

Yes, I said in so many words that I was proposing increasing
FirstNormalObjectId.  I do not think the issues with pg_upgrade itself
are insoluble --- it would need some historical knowledge about what
FirstNormalObjectId had been in each prior version, but that's a pretty
minor problem in the big scheme of things.  What I'm not seeing a solution
for is how an extension upgrade script could assign fixed OIDs to existing
objects.  Since nobody else seems to either see a way to do that, or
even like the idea of fixed OIDs at all, I'm probably going to push
forward with the OID mapping idea instead.  That'll be a bit more
painful to use, but I don't see any showstopper problems ATM.

            regards, tom lane

Re: Allowing extensions to find out the OIDs of their member objects

From

Andres Freund

Date:

21 January 2019, 23:59:58

Hi,

On 2019-01-21 18:52:05 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-01-20 18:50:33 -0500, Tom Lane wrote:
> >> In [1] I propose that we should allow extensions to get their hands
> >> on the ability to transform function calls as per "protransform" and
> >> to generate lossy index quals based on items in WHERE clauses.  The
> >> APIs to give an extension control at the right points seem pretty
> >> straightforward, but there's a problem standing in the way of actually
> >> doing anything useful once you've got control.  In order to either
> >> analyze a passed-in clause or generate a new one, the extension will
> >> need to know the OIDs of the functions/operators it's working with.
> >> And extension objects don't have fixed OIDs.
> 
> > Why does it need to know all its oids, rather than just the one of the
> > operation protransform is called for? Am I missing something?  And if
> > so, why isn't it sufficient to just pass in that oid along with the
> > node?
> 
> You would know that the FuncExpr you're given is for the function the
> support function is attached to, sure, and you could pull its OID out
> of that if you wanted.  The problem is that what you want to generate
> frequently involves *other* functions or operators.
> 
> The example Paul gave in the other thread is that given
> 
>     ST_DWithin(a, b, radius)
> 
> we might wish to generate an indexqual like
> 
>     a && expand(b, radius)
> 
> Here, the only OID in easy reach is that of ST_DWithin().  The
> problem is to find out the OIDs of && and expand() so we can build
> new FuncExpr and OpExpr nodes.

I guess I was imagining we'd not create FuncExprs etc, but now that you
say it, that'd be absurdely invasive.  Thanks.


> > Which range are you thinking of handing out here? We use oid ranges as
> > proxy for system-object-ness in a number of places, so we can't really
> > just hand out ones below FirstNormalObjectId. And we can't really hand
> > out any above that, because there'd be conflicts - even if we increased
> > FirstNormalObjectId today, there'd be issues with pg_upgrade.
> 
> Yes, I said in so many words that I was proposing increasing
> FirstNormalObjectId.  I do not think the issues with pg_upgrade itself
> are insoluble --- it would need some historical knowledge about what
> FirstNormalObjectId had been in each prior version, but that's a pretty
> minor problem in the big scheme of things.

Just about every installation uses the oids directly after
FirstNormalObjectId, so that seems fairly painful. It'd be more
realistic to create a new zone at UINT32_MAX - something, but that'd
likely still conflict in plenty installations (thanks to toast and WITH
OIDS tables).   I'm curious as to how to solve that, if you have a
sketch - less because of this, and more because I think it's not
unlikely that we'll encounter the need for this at some point not too
far away.


> I'm probably going to push forward with the OID mapping idea instead.
> That'll be a bit more painful to use, but I don't see any showstopper
> problems ATM.

Cool.

Greetings,

Andres Freund

Re: Allowing extensions to find out the OIDs of their member objects

From

Tom Lane

Date:

22 January 2019, 00:41:26

Andres Freund <andres@anarazel.de> writes:
> On 2019-01-21 18:52:05 -0500, Tom Lane wrote:
>> Yes, I said in so many words that I was proposing increasing
>> FirstNormalObjectId.  I do not think the issues with pg_upgrade itself
>> are insoluble --- it would need some historical knowledge about what
>> FirstNormalObjectId had been in each prior version, but that's a pretty
>> minor problem in the big scheme of things.

> Just about every installation uses the oids directly after
> FirstNormalObjectId, so that seems fairly painful.

It would be painful to change the OIDs of objects that pg_upgrade
tries to preserve the OIDs of --- but those are just tables, types,
and roles.  Everything else would get renumbered automatically during
pg_upgrade's dump and reload of the schema.  The point of my proposal
was that having fixed OIDs for those specific object types might not
be necessary for the use-case of generating new FuncExprs and OpExprs.
(You would need to look up some associated types, but those would not
be hard to get.)

An advantage of the OID-mapping proposal is that it can support getting
the OIDs of tables and types too.

> It'd be more
> realistic to create a new zone at UINT32_MAX - something, but that'd
> likely still conflict in plenty installations (thanks to toast and WITH
> OIDS tables).   I'm curious as to how to solve that, if you have a
> sketch - less because of this, and more because I think it's not
> unlikely that we'll encounter the need for this at some point not too
> far away.

I have no idea how we'd move table or type OIDs, given that those are
potentially on-disk.  (Actually ... are table OIDs really on-disk
anywhere in user data?  Types yes, but tables?)

            regards, tom lane

Re: Allowing extensions to find out the OIDs of their member objects

From

Andrew Gierth

Date:

22 January 2019, 01:26:58

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> I thought about extending the extension infrastructure to provide
 Tom> some way of retrieving relevant OIDs. We could imagine, for
 Tom> instance, that an extension script has a way to say "this function
 Tom> is object number three within this extension", and while running
 Tom> the script we make a catalog entry showing that object number
 Tom> three has OID thus-and-so, and then that catalog entry can be
 Tom> consulted to get the right OID (by C code that has hard-wired
 Tom> knowledge that object number three is the function it cares
 Tom> about). This is still kind of messy, because aside from the
 Tom> hand-assigned object numbers you'd have to use the extension name
 Tom> as part of the lookup key, making the name into something the C
 Tom> code critically depends on. We don't have ALTER EXTENSION RENAME,
 Tom> so maybe that's okay, but it seems painful to say that we can
 Tom> never have it.

I suggest using string tags rather than object numbers:

1. easier to read and maintain

2. an object might be given many tags, some of them automatically

3. it might make sense to provide a function that the extension can use
to ask "is oid X one of my objects with tag 'foo'" (e.g. to match one of
a set of related functions) in addition to looking up specific oids by
tag

-- 
Andrew (irc:RhodiumToad)

Re: Allowing extensions to find out the OIDs of their member objects

From

Tom Lane

Date:

22 January 2019, 02:17:48

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> I thought about extending the extension infrastructure to provide
>  Tom> some way of retrieving relevant OIDs. We could imagine, for
>  Tom> instance, that an extension script has a way to say "this function
>  Tom> is object number three within this extension", ...

> I suggest using string tags rather than object numbers:

Meh ... that increases the cost of individual lookups substantially.
The implementation I had in mind was that an extension would do a
once-per-session catalog lookup to fetch an array of OIDs [1], and
then individual operations would just index into that array.  If what
we provide is string tags, the cost per lookup goes up by two or
three orders of magnitude, for benefits that seem pretty hypothetical.

The in-catalog representation gets a lot more complex too, as it can't
just be "oid[]".  (No, I do not wish to hear the word JSON here.)

I don't buy any of your suggested benefits:

> 1. easier to read and maintain

The SQL-level API that I'm imagining would look roughly like
a command like this at the end of an extension's script:

ALTER EXTENSION extname SET MAP
  OBJECT 1 IS FUNCTION foo(int, int),
  OBJECT 2 IS OPERATOR +(float, float), ...

where the parts after "IS" should be able to use the same production
as for "member_object" in ALTER EXTENSION ADD/DROP.  Execution of
this would replace an oid[] field in the extension's pg_extension row.
So yeah, we could replace the numbers by identifiers in this command,
but does that really buy us much maintainability?  Any intelligent
extension author is going to have a C header with something like

#define MY_FUNCTION_FOO_INT_INT 1
#define MY_OPERATOR_PLUS_FLOAT_FLOAT 2

which she has to keep in sync with the ALTER SET MAP in her extension
script.  Using names in the SET MAP just changes the contents of those
#defines to strings, which isn't moving the goalposts much for
maintainability.

> 2. an object might be given many tags, some of them automatically

> 3. it might make sense to provide a function that the extension can use
> to ask "is oid X one of my objects with tag 'foo'" (e.g. to match one of
> a set of related functions) in addition to looking up specific oids by
> tag

I'm not seeing that either of these have actual real-world use cases,
at least not use-cases with the same constraints that the OID mapping
problem has.  In particular, we're going to have to lock down use of
the SET MAP command pretty hard, since the ability to insert a different
object than a support function thought it was calling would be an easy
security hole.  That seems like it lets out many of the potential
applications of the sort of object labeling you're talking about.
(I'd also wonder why such labeling would be needed only for objects
in extensions.)

            regards, tom lane

[1] The possibility of needing to flush that cache complicates this,
but it'd complicate the other thing too.

Re: Allowing extensions to find out the OIDs of their member objects

From

Chapman Flack

Date:

22 January 2019, 02:26:39

On 01/21/19 18:52, Tom Lane wrote:
> I'm probably going to push forward with the OID mapping idea instead.

As it happens, I'd been recently thinking[1] about a way that certain
SQL/XML functionality could be provided by a pluggable selection of
different extensions.

And I think a use case like that could be rather elegantly served by
the OID mapping idea, more so than by a fixed-OID-range-per-extension
approach.

So +1.

-Chap


[1]

https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Proposal_2:_desugaring_to_calls_on_normal_extension_functions

Re: Allowing extensions to find out the OIDs of their member objects

From

Tom Lane

Date:

22 January 2019, 02:45:31

Chapman Flack <chap@anastigmatix.net> writes:
> On 01/21/19 18:52, Tom Lane wrote:
>> I'm probably going to push forward with the OID mapping idea instead.

> As it happens, I'd been recently thinking[1] about a way that certain
> SQL/XML functionality could be provided by a pluggable selection of
> different extensions.
> And I think a use case like that could be rather elegantly served by
> the OID mapping idea, more so than by a fixed-OID-range-per-extension
> approach.

Hm, yeah.  One use-case that's been in the back of my mind is
cross-extension references; for example, what if PostGIS wants
to map a call to one of its own functions into an indexable
operator that's defined by the btree_gist extension?  What you're
talking about, IIUC, is a similar kind of reference only it goes
from the core code to an extension.

This line of thought says that the identifiers exposed by what
I was calling a SET MAP command would soon become part of the
de facto API of an extension: you'd not want to change them
for fear that some other extension was relying on them.

Perhaps this also gives some impetus to the lets-use-identifiers-
not-numbers approach that Andrew was pushing.  I didn't care for
that too much so far as an extension's own internal references
are concerned, but for cross-extension references it seems a
lot better to be looking for "postgis / function_foo_int_int"
than for "postgis / 3".

On the third hand you could also say that such references should
just use name-based lookup and not a facility that's designed to
bypass the expense of that.  Loading additional functionality
onto said facility just reduces its desired speed advantage.

(That is, in the terms of what I think you mean for the SQL/XML
business, an extension that's meant to serve that purpose would be
required to provide functions with specified names and signatures,
and the core would look them up that way rather than with any
behind-the-scenes API.)

            regards, tom lane

Re: Allowing extensions to find out the OIDs of their member objects

From

Chapman Flack

Date:

22 January 2019, 03:26:43

On 01/21/19 21:45, Tom Lane wrote:
> are concerned, but for cross-extension references it seems a
> lot better to be looking for "postgis / function_foo_int_int"
> than for "postgis / 3".

I wonder if postgis might offer a .h file with FUNCTION_POSTGIS_FOO_INT_INT
defined as 3, which extensions intending to use foo could be built against.

Any that aren't could still search by name and signature.

In the case of calls from core to some pluggable extension, of course
the .h file would be in core, with the implementing extensions expected
to build against it: in thy extension shalt thou provide XMLCAST at index 1,
XMLITERATE at index 2, etc.

Regards,
-Chap

Re: Allowing extensions to find out the OIDs of their member objects

From

Darafei "Komяpa" Praliaskouski

Date:

22 January 2019, 08:58:42

Thoughts?

I have a feeling this is over-engineering in slightly different direction, solving the way for hack to work instead of original problem.

What's currently happening in PostGIS is that there are functions that need to perform index-based lookups.

Postgres is unable to plan this for functions, only for operators.

Operators have only two sides, some PostGIS functions have arguments - you can't squeeze these into operator.

Well, you can squeeze two of your parameters into one, but it will be ugly too - you'll invent some "query" argument type and alternative grammar instead of SQL (see tsquery).

ST_DWithin itself is also another way of working around planner limitation and squeezing something into both sides of operator, since you don't know which side of your query is going to have an index. It's not perfect either.

A perfect solution will be a way to perform a clean index scan on ST_Distance(a.geom, b.geom) < 10, which is what ST_DWithin is trying to express in limited logic of "you only have two sides of operator".

If you need example from another world: imagine jsonb key-value lookup. It's currently done via

select ... where tags @> '{"highway":"residential"}';

- which is hard: you have to remember which side the rose should lean towards, which {} [] to use, how to quote around json and inside and more.

A more intuitive way for many programmers to write this is similar to this:

select ... where (tags->>'highway') = 'residential';

- but this does not end up with an index lookup.

I'd be happy if we can deprecate ST_DWithin in PostGIS and just allow ST_Distance(a.geom, b.geom) < 10.

ST_Distance is defined in standard as function, however, there is equivalent operator <-> that exists for sole purpose of KNN lookups. So, when you write:

... order by ST_Distance(geom, 'your_position')
- you're not getting index scan, and when writing

... order by geom <-> 'your_position'

- you're getting index scan but not doing a thing you may intuitively write by knowing ST_Distance is standard-defined way to measure distance between two spatial objects.

May it happen to direct you to some other thoughts?

Darafei Praliaskouski

Support me: http://patreon.com/komzpa

Re: Allowing extensions to find out the OIDs of their member objects

From

Andrew Gierth

Date:

22 January 2019, 11:05:58

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> 1. easier to read and maintain

 Tom> The SQL-level API that I'm imagining would look roughly like
 Tom> a command like this at the end of an extension's script:

 Tom> ALTER EXTENSION extname SET MAP
 Tom>   OBJECT 1 IS FUNCTION foo(int, int),
 Tom>   OBJECT 2 IS OPERATOR +(float, float), ...

That's what I thought and I had something similar in mind except not
with numbers.

This is obviously the same situation we have with operator and function
numbers in opclasses right now, which is something I personally find
annoying: the fact that (for example) GiST operator members are assigned
some non-self-documenting number that you can only resolve by looking at
the opclass implementation to find out what it thinks the numbers mean.

-- 
Andrew (irc:RhodiumToad)

Re: Allowing extensions to find out the OIDs of their member objects

From

Andres Freund

Date:

22 January 2019, 19:09:33

Hi,

On 2019-01-21 19:41:26 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It'd be more
> > realistic to create a new zone at UINT32_MAX - something, but that'd
> > likely still conflict in plenty installations (thanks to toast and WITH
> > OIDS tables).   I'm curious as to how to solve that, if you have a
> > sketch - less because of this, and more because I think it's not
> > unlikely that we'll encounter the need for this at some point not too
> > far away.
> 
> I have no idea how we'd move table or type OIDs, given that those are
> potentially on-disk.  (Actually ... are table OIDs really on-disk
> anywhere in user data?  Types yes, but tables?)

Not quite the same, but toast table oids are on-disk, inside toast
datums.

Greetings,

Andres Freund

Re: Allowing extensions to find out the OIDs of their member objects

From

Tom Lane

Date:

19 February 2019, 15:45:07

I wrote:
> [ discussion about ways to let extension C code find out object OIDs ]

I wanted to close out this thread, for the time being, by saying that
I'm not expecting to get anything done about it for v12.  It seems
pretty late in the dev cycle to be proposing any major new user-visible
functionality, and even without that consideration, I have too many
other things on my plate.

There are ways that an extension can implement
SupportRequestIndexCondition without knowing hard-wired OIDs for its
objects:

1. The first problem is to determine whether the index column you've
been called for has the opfamily you want to work with.  Our core-code
examples mostly check req->opfamily against a hard-wired OID constant,
which doesn't work for an extension-defined opfamily.  I think best
practice here will probably be to look up the opfamily's catalog entry via
the OPFAMILYOID syscache and see if its name is what you expect (checking
only the name, not the schema, since the latter might be variable).
While a false match is theoretically possible it seems unlikely, and
there's not a security risk since we only allow superusers to create
opfamilies.  Another idea is to check req->index->relam to see if the
index type is what you expect, and just assume that your datatype has
only one opfamily per index type.  This is mainly attractive if the
index type is a built-in one with a known OID --- if the index AM is
also extension-defined, then you're still stuck needing to look up and
check its name.

2. Then, you need to be able to identify operator OIDs so that you can
build indexqual OpExprs.  The operators you care about for this have
to be members of the opfamily, so you can look up their OIDs pretty
easily using get_opfamily_member ... assuming you know the OIDs of their
input datatypes (see below).  like_support.c has examples of this.

3.  You might need to determine datatype OIDs, for the above purpose and
so that you can build comparison constants to pass to the operators.
This is pretty easy if the comparison value is of the same type as one
of the inputs to your function or operator being optimized: just apply
exprType() to that input expression.  However there are cases where this
isn't true; for instance, PostGIS sometimes wants to optimize a function
with more than 2 inputs by converting "func(indexcol, something, something)"
to "indexcol indexable-operator row(something, something)::compositetype".
The OID of the custom composite type isn't readily available.  In such
cases you might have no really better answer than to look up the type
by name.  This can probably be done safely by assuming that it's in the
same schema as your target function, which you can look up since you
have the function's OID.  Using the type OID successfully in a later
get_opfamily_member lookup would provide additional confidence that
you have the right type.

In all of these cases, you could probably get away with caching the
results of successful lookups in static variables, so that you don't
need to do them more than once per session.

This is clearly an area that's ripe for improvement by providing better
infrastructure, but I think we can tolerate this state of affairs for
the time being.

            regards, tom lane

Re: Allowing extensions to find out the OIDs of their member objects

From

Robert Haas

Date:

19 February 2019, 16:16:49

On Mon, Jan 21, 2019 at 9:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Perhaps this also gives some impetus to the lets-use-identifiers-
> not-numbers approach that Andrew was pushing.  I didn't care for
> that too much so far as an extension's own internal references
> are concerned, but for cross-extension references it seems a
> lot better to be looking for "postgis / function_foo_int_int"
> than for "postgis / 3".

Yeah, I agree.  I think names are a good idea.  I also agree with the
other comments that trying to run an OID registry will not work out
well.  Either we'll accept every request for an OID range and go nuts
tracking them all as they rapidly balloon -- or more likely we'll
reject requests from insufficiently-famous extensions which will, of
course, hinder their attempts to become famous.  It seems much better
to come up with a solution where every extension can DTRT without any
central coordination.  Perhaps if we replaced OIDs with UUIDs that
would Just Work, but an OID-mapping system seems like a good, perhaps
better, answer as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company