Thread: Allowing extensions to find out the OIDs of their member objects
In [1] I propose that we should allow extensions to get their hands on the ability to transform function calls as per "protransform" and to generate lossy index quals based on items in WHERE clauses. The APIs to give an extension control at the right points seem pretty straightforward, but there's a problem standing in the way of actually doing anything useful once you've got control. In order to either analyze a passed-in clause or generate a new one, the extension will need to know the OIDs of the functions/operators it's working with. And extension objects don't have fixed OIDs. In principle this could be dealt with by looking up said OIDs from the catalogs, but that's (1) complicated, (2) slow, (3) prone to possibly-security-sensitive mistakes such as omitting a schema specification, and (4) at risk of getting broken entirely by user-accessible changes such as ALTER FUNCTION RENAME. Point (2) can be alleviated by caching, but that just makes (1) even worse, plus there are lots of ways to do caching wrong. I thought about extending the extension infrastructure to provide some way of retrieving relevant OIDs. We could imagine, for instance, that an extension script has a way to say "this function is object number three within this extension", and while running the script we make a catalog entry showing that object number three has OID thus-and-so, and then that catalog entry can be consulted to get the right OID (by C code that has hard-wired knowledge that object number three is the function it cares about). This is still kind of messy, because aside from the hand-assigned object numbers you'd have to use the extension name as part of the lookup key, making the name into something the C code critically depends on. We don't have ALTER EXTENSION RENAME, so maybe that's okay, but it seems painful to say that we can never have it. In the end it seems like possibly the cleanest answer is to change things so that extensions *can* have fixed OIDs that their C code can know, eliminating lookup costs and allowing coding conventions for this sort of work to be the same as in the core backend. We could raise FirstNormalObjectId to create some unused OID space that we could then assign chunks of to specific extensions on-request. This is problematic for relations, types, and roles, because pg_upgrade wants to preserve OIDs of those objects across upgrades, so we could not ensure that the "unused" space is free of such objects. But it would work for all other object types, and I think that it might well be sufficient if an extension can have fixed OIDs for its functions, operators, and opfamilies/opclasses. (At need, it could find out the OID for one of its types by looking up the argument or result types for one of its functions.) There are various places in pg_upgrade and postgres_fdw that assume more than they perhaps should about the significance of FirstNormalObjectId, but I think that that could be dealt with. A larger issue is whether "hand out some OIDs on-demand" is a sustainable strategy. I think that it is, if we encourage extensions to assign fixed OIDs only to objects they really need to. In thirty-ish years of core PG development, we've only used up ~4200 fixed OIDs, and a lot of those are for functions that probably don't really need fixed OIDs but got one because we give one to every built-in function. However, if there's a big land rush to claim large chunks of OIDs, we might have a problem. We'd have to invent some SQL syntax whereby extension scripts can actually apply their assigned OIDs to their objects. I'm not very enthused about adding an "OID nnn" option to every type of CREATE command that might need this. A quick-and-dirty answer is to create support functions similar to binary_upgrade_set_next_pg_type_oid() that set the OID to give to the next-created object of each category we need to support. There are various issues and bits of work around this, but the only one that I've thought of that I haven't got an answer for is "how should an extension upgrade script assign a fixed OID to an object that already existed in the previous extension version, but without a fixed OID?". We can't just change the recorded OID because that'll break dependencies, view references, etc. Conceivably we could write code that runs through the catalogs and changes all references, but man that'd be a mess. Anyone have another idea? Another question is whether you need any special permissions to assign a fixed OID in this way. The most conservative answer is to require superuserness, which might be enough because the plausible use-cases for fixed OIDs involve C code, which you'd need to be superuser to install anyhow. But it seems overkill somehow. OTOH, it would be annoying if a random user could eat up a "reserved" OID that later prevented someone from installing an extension they wanted. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/15193.1548028093@sss.pgh.pa.us
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> A larger issue is whether "hand out some OIDs on-demand" is a Tom> sustainable strategy. No. Not for any concerns over availability of oids, but simply from the fact that we have no business whatsoever inserting ourselves into the extension development process in this way. -- Andrew (irc:RhodiumToad)
On 01/20/19 18:50, Tom Lane wrote: > we make a catalog entry showing that object number three has OID > thus-and-so, and then that catalog entry can be consulted to get > the right OID (by C code that has hard-wired knowledge that object > number three is the function it cares about). This is still kind > of messy, because aside from the hand-assigned object numbers > you'd have to use the extension name as part of the lookup key, > making the name into something the C code critically depends on. > We don't have ALTER EXTENSION RENAME, so maybe that's okay, but > it seems painful to say that we can never have it. An extension *has* an OID, doesn't it? pg_extension has 'em. If the extension script could somehow be informed at CREATE EXTENSION time of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no? Somehow, I find this first idea more aesthetically appealing than actually trying to bind things in extensions to fixed OIDs for all time. -Chap
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> A larger issue is whether "hand out some OIDs on-demand" is a > Tom> sustainable strategy. > No. > Not for any concerns over availability of oids, but simply from the fact > that we have no business whatsoever inserting ourselves into the > extension development process in this way. I'm not exactly following this concern. I wasn't imagining that we'd assign each individual OID ourselves, but rather give out blocks of OIDs. Admittedly, the blocks can't be huge, but it doesn't seem to me that this'd create an impossible burden for either us or extension developers. We could also reserve some range of OIDs for "local extensions", whereby people who didn't intend to publish their extensions for widespread use could just use some of those OIDs rather than having to ask for a public assignment. regards, tom lane
Chapman Flack <chap@anastigmatix.net> writes: > On 01/20/19 18:50, Tom Lane wrote: >> we make a catalog entry showing that object number three has OID >> thus-and-so, and then that catalog entry can be consulted to get >> the right OID (by C code that has hard-wired knowledge that object >> number three is the function it cares about). This is still kind >> of messy, because aside from the hand-assigned object numbers >> you'd have to use the extension name as part of the lookup key, >> making the name into something the C code critically depends on. >> We don't have ALTER EXTENSION RENAME, so maybe that's okay, but >> it seems painful to say that we can never have it. > An extension *has* an OID, doesn't it? pg_extension has 'em. Sure. > If the extension script could somehow be informed at CREATE EXTENSION time > of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no? And it remembers that where? > Somehow, I find this first idea more aesthetically appealing than > actually trying to bind things in extensions to fixed OIDs for all time. I don't find it appealing particularly, but at least it hasn't got any insurmountable-looking problems --- other than the "you can't rename your extension" one. If we can't make the fixed-OIDs approach work, this might be a workable second choice. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> I'm not exactly following this concern. I wasn't imagining that Tom> we'd assign each individual OID ourselves, but rather give out Tom> blocks of OIDs. Admittedly, the blocks can't be huge, but it Tom> doesn't seem to me that this'd create an impossible burden for Tom> either us or extension developers. Even that's not acceptable. There is no reason why someone should not be able to create extensions freely without us ever knowing about them or needing to. In fact I suggest that "there shall be no registries of third parties" be made a formal project policy. Tom> We could also reserve some range of OIDs for "local extensions", Tom> whereby people who didn't intend to publish their extensions for Tom> widespread use could just use some of those OIDs rather than Tom> having to ask for a public assignment. That's not acceptable either; local extensions have a way of becoming global. Seriously, this whole idea is a lazy hack. Fixed assignments? really? -- Andrew (irc:RhodiumToad)
On 01/20/19 19:43, Tom Lane wrote: >> If the extension script could somehow be informed at CREATE EXTENSION time >> of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no? > > And it remembers that where? Top level answer: up to the extension author. Next level answer: maybe not all extensions have libraries to load, but for those that do (a good portion), wouldn't it be convenient for _PG_init() to get the value, either passed as an argument, or through some API? After the extension is created, loading of the library is going to be occasioned through the call of some function, right? That function just got looked up, and it has an 'e' dependency recorded on the extension, giving the extension OID. I can't judge whether that's too much lookup action for the library load machinery to be doing; how frequent are library loads, and how much would that add to the cycles they already require? Regards, -Chap
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > In fact I suggest that "there shall be no registries of third parties" > be made a formal project policy. You're a decade or two too late for that; see pg_statistic.h. In any case, it's not like this issue applies to every extension anybody might want to make. Only quite advanced extensions would have any need for the features that known-at-compile-time OIDs would help with, as shown by the fact that everyone's gotten by without them so far. And people who have a reason to fly under the radar could always stick with the method of doing object-name-based runtime lookups. I also note that multiple people have asked for extensions to have stable OIDs for other reasons. Unfortunately, the most common reason is so that client apps could hard-wire knowledge about type OIDs they see in query results, and my proposal excludes being able to do that :-(. But it's not like nobody has wanted publicly-assigned OIDs before. There may well be good technical reasons why we shouldn't go this route (the extension upgrade problem in particular). But your objection seems basically political and I reject it as a valid argument. > Seriously, this whole idea is a lazy hack. Fixed assignments? really? Hardly lazy. It's the most difficult approach (from our standpoint) of the three I mentioned; but the flip side of that is it takes the least work, and produces the most efficient code, for extension developers. regards, tom lane
Chapman Flack <chap@anastigmatix.net> writes: > On 01/20/19 19:43, Tom Lane wrote: >>> If the extension script could somehow be informed at CREATE EXTENSION time >>> of what its OID is, that would clear the way for ALTER EXTENSION RENAME, no? >> And it remembers that where? > Top level answer: up to the extension author. That isn't an answer, it's an admission of failure. > Next level answer: maybe not all extensions have libraries to load, > but for those that do (a good portion), wouldn't it be convenient > for _PG_init() to get the value, either passed as an argument, or > through some API? There's no hard connection between libraries and extensions though. In fact, right now there's no connection at all. > After the extension is created, loading of the library is going to be > occasioned through the call of some function, right? There's LOAD, and there's also the possibility that the library supports multiple extensions, or that some of its functions don't belong to an extension. A notable problem here is that at the point where the library (probably) first gets loaded during CREATE EXTENSION, the extension doesn't exist yet; or even if it does, the pg_depend entry linking the to-be-created function certainly doesn't. It is possible that an extension function could chase its 'e' dependency when called (*not* when being loaded) to find out the OID of its extension, but frankly I don't see typical extension authors doing that, even if it didn't have failure cases. Actually, behavior during CREATE EXTENSION seems like a bit of a problem for the whole mapping idea --- partway through the script, you'd certainly not know all the object IDs, so there would not be a complete map available if one of the extension's functions gets called during that script. This could be worked around, but it makes things more complicated for extension authors than I'd envisioned at first. They can't just assume that they have all the object OIDs available. For the use-cases described so far, it seems like it'd be OK to just fall back to doing nothing if the map isn't ready yet, but we'd have to keep that restriction in mind while defining future call scenarios. regards, tom lane
On Mon, Jan 21, 2019 at 12:25:16AM +0000, Andrew Gierth wrote: > >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > > Tom> A larger issue is whether "hand out some OIDs on-demand" is a > Tom> sustainable strategy. > > No. > > Not for any concerns over availability of oids, but simply from the fact > that we have no business whatsoever inserting ourselves into the > extension development process in this way. +1 for keeping our nose out of this business. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Sun, Jan 20, 2019 at 06:50:33PM -0500, Tom Lane wrote: > A larger issue is whether "hand out some OIDs on-demand" is a > sustainable strategy. I think that it is, if we encourage extensions > to assign fixed OIDs only to objects they really need to. In thirty-ish > years of core PG development, we've only used up ~4200 fixed OIDs, > and a lot of those are for functions that probably don't really need > fixed OIDs but got one because we give one to every built-in function. > However, if there's a big land rush to claim large chunks of OIDs, > we might have a problem. Hm. Such things are a bit concerning. There are many closed and open extensions, so it looks hard to not create conflicts between multiple extensions trying to get the same range of OIDs or even the same OIDs and users willing to combine some of them. This could mess up the user experience. -- Michael
Attachment
Hi, On 2019-01-20 18:50:33 -0500, Tom Lane wrote: > In [1] I propose that we should allow extensions to get their hands > on the ability to transform function calls as per "protransform" and > to generate lossy index quals based on items in WHERE clauses. The > APIs to give an extension control at the right points seem pretty > straightforward, but there's a problem standing in the way of actually > doing anything useful once you've got control. In order to either > analyze a passed-in clause or generate a new one, the extension will > need to know the OIDs of the functions/operators it's working with. > And extension objects don't have fixed OIDs. Why does it need to know all its oids, rather than just the one of the operation protransform is called for? Am I missing something? And if so, why isn't it sufficient to just pass in that oid along with the node? > A larger issue is whether "hand out some OIDs on-demand" is a > sustainable strategy. I think that it is, if we encourage extensions > to assign fixed OIDs only to objects they really need to. In thirty-ish > years of core PG development, we've only used up ~4200 fixed OIDs, > and a lot of those are for functions that probably don't really need > fixed OIDs but got one because we give one to every built-in function. > However, if there's a big land rush to claim large chunks of OIDs, > we might have a problem. I'm not sure that "30 years" argument holds that much power - we've surely reused oids. And core PG is going to be much more conservative than any sort of external user. Which range are you thinking of handing out here? We use oid ranges as proxy for system-object-ness in a number of places, so we can't really just hand out ones below FirstNormalObjectId. And we can't really hand out any above that, because there'd be conflicts - even if we increased FirstNormalObjectId today, there'd be issues with pg_upgrade. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2019-01-20 18:50:33 -0500, Tom Lane wrote: >> In [1] I propose that we should allow extensions to get their hands >> on the ability to transform function calls as per "protransform" and >> to generate lossy index quals based on items in WHERE clauses. The >> APIs to give an extension control at the right points seem pretty >> straightforward, but there's a problem standing in the way of actually >> doing anything useful once you've got control. In order to either >> analyze a passed-in clause or generate a new one, the extension will >> need to know the OIDs of the functions/operators it's working with. >> And extension objects don't have fixed OIDs. > Why does it need to know all its oids, rather than just the one of the > operation protransform is called for? Am I missing something? And if > so, why isn't it sufficient to just pass in that oid along with the > node? You would know that the FuncExpr you're given is for the function the support function is attached to, sure, and you could pull its OID out of that if you wanted. The problem is that what you want to generate frequently involves *other* functions or operators. The example Paul gave in the other thread is that given ST_DWithin(a, b, radius) we might wish to generate an indexqual like a && expand(b, radius) Here, the only OID in easy reach is that of ST_DWithin(). The problem is to find out the OIDs of && and expand() so we can build new FuncExpr and OpExpr nodes. > Which range are you thinking of handing out here? We use oid ranges as > proxy for system-object-ness in a number of places, so we can't really > just hand out ones below FirstNormalObjectId. And we can't really hand > out any above that, because there'd be conflicts - even if we increased > FirstNormalObjectId today, there'd be issues with pg_upgrade. Yes, I said in so many words that I was proposing increasing FirstNormalObjectId. I do not think the issues with pg_upgrade itself are insoluble --- it would need some historical knowledge about what FirstNormalObjectId had been in each prior version, but that's a pretty minor problem in the big scheme of things. What I'm not seeing a solution for is how an extension upgrade script could assign fixed OIDs to existing objects. Since nobody else seems to either see a way to do that, or even like the idea of fixed OIDs at all, I'm probably going to push forward with the OID mapping idea instead. That'll be a bit more painful to use, but I don't see any showstopper problems ATM. regards, tom lane
Hi, On 2019-01-21 18:52:05 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > On 2019-01-20 18:50:33 -0500, Tom Lane wrote: > >> In [1] I propose that we should allow extensions to get their hands > >> on the ability to transform function calls as per "protransform" and > >> to generate lossy index quals based on items in WHERE clauses. The > >> APIs to give an extension control at the right points seem pretty > >> straightforward, but there's a problem standing in the way of actually > >> doing anything useful once you've got control. In order to either > >> analyze a passed-in clause or generate a new one, the extension will > >> need to know the OIDs of the functions/operators it's working with. > >> And extension objects don't have fixed OIDs. > > > Why does it need to know all its oids, rather than just the one of the > > operation protransform is called for? Am I missing something? And if > > so, why isn't it sufficient to just pass in that oid along with the > > node? > > You would know that the FuncExpr you're given is for the function the > support function is attached to, sure, and you could pull its OID out > of that if you wanted. The problem is that what you want to generate > frequently involves *other* functions or operators. > > The example Paul gave in the other thread is that given > > ST_DWithin(a, b, radius) > > we might wish to generate an indexqual like > > a && expand(b, radius) > > Here, the only OID in easy reach is that of ST_DWithin(). The > problem is to find out the OIDs of && and expand() so we can build > new FuncExpr and OpExpr nodes. I guess I was imagining we'd not create FuncExprs etc, but now that you say it, that'd be absurdely invasive. Thanks. > > Which range are you thinking of handing out here? We use oid ranges as > > proxy for system-object-ness in a number of places, so we can't really > > just hand out ones below FirstNormalObjectId. And we can't really hand > > out any above that, because there'd be conflicts - even if we increased > > FirstNormalObjectId today, there'd be issues with pg_upgrade. > > Yes, I said in so many words that I was proposing increasing > FirstNormalObjectId. I do not think the issues with pg_upgrade itself > are insoluble --- it would need some historical knowledge about what > FirstNormalObjectId had been in each prior version, but that's a pretty > minor problem in the big scheme of things. Just about every installation uses the oids directly after FirstNormalObjectId, so that seems fairly painful. It'd be more realistic to create a new zone at UINT32_MAX - something, but that'd likely still conflict in plenty installations (thanks to toast and WITH OIDS tables). I'm curious as to how to solve that, if you have a sketch - less because of this, and more because I think it's not unlikely that we'll encounter the need for this at some point not too far away. > I'm probably going to push forward with the OID mapping idea instead. > That'll be a bit more painful to use, but I don't see any showstopper > problems ATM. Cool. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2019-01-21 18:52:05 -0500, Tom Lane wrote: >> Yes, I said in so many words that I was proposing increasing >> FirstNormalObjectId. I do not think the issues with pg_upgrade itself >> are insoluble --- it would need some historical knowledge about what >> FirstNormalObjectId had been in each prior version, but that's a pretty >> minor problem in the big scheme of things. > Just about every installation uses the oids directly after > FirstNormalObjectId, so that seems fairly painful. It would be painful to change the OIDs of objects that pg_upgrade tries to preserve the OIDs of --- but those are just tables, types, and roles. Everything else would get renumbered automatically during pg_upgrade's dump and reload of the schema. The point of my proposal was that having fixed OIDs for those specific object types might not be necessary for the use-case of generating new FuncExprs and OpExprs. (You would need to look up some associated types, but those would not be hard to get.) An advantage of the OID-mapping proposal is that it can support getting the OIDs of tables and types too. > It'd be more > realistic to create a new zone at UINT32_MAX - something, but that'd > likely still conflict in plenty installations (thanks to toast and WITH > OIDS tables). I'm curious as to how to solve that, if you have a > sketch - less because of this, and more because I think it's not > unlikely that we'll encounter the need for this at some point not too > far away. I have no idea how we'd move table or type OIDs, given that those are potentially on-disk. (Actually ... are table OIDs really on-disk anywhere in user data? Types yes, but tables?) regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> I thought about extending the extension infrastructure to provide Tom> some way of retrieving relevant OIDs. We could imagine, for Tom> instance, that an extension script has a way to say "this function Tom> is object number three within this extension", and while running Tom> the script we make a catalog entry showing that object number Tom> three has OID thus-and-so, and then that catalog entry can be Tom> consulted to get the right OID (by C code that has hard-wired Tom> knowledge that object number three is the function it cares Tom> about). This is still kind of messy, because aside from the Tom> hand-assigned object numbers you'd have to use the extension name Tom> as part of the lookup key, making the name into something the C Tom> code critically depends on. We don't have ALTER EXTENSION RENAME, Tom> so maybe that's okay, but it seems painful to say that we can Tom> never have it. I suggest using string tags rather than object numbers: 1. easier to read and maintain 2. an object might be given many tags, some of them automatically 3. it might make sense to provide a function that the extension can use to ask "is oid X one of my objects with tag 'foo'" (e.g. to match one of a set of related functions) in addition to looking up specific oids by tag -- Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > Tom> I thought about extending the extension infrastructure to provide > Tom> some way of retrieving relevant OIDs. We could imagine, for > Tom> instance, that an extension script has a way to say "this function > Tom> is object number three within this extension", ... > I suggest using string tags rather than object numbers: Meh ... that increases the cost of individual lookups substantially. The implementation I had in mind was that an extension would do a once-per-session catalog lookup to fetch an array of OIDs [1], and then individual operations would just index into that array. If what we provide is string tags, the cost per lookup goes up by two or three orders of magnitude, for benefits that seem pretty hypothetical. The in-catalog representation gets a lot more complex too, as it can't just be "oid[]". (No, I do not wish to hear the word JSON here.) I don't buy any of your suggested benefits: > 1. easier to read and maintain The SQL-level API that I'm imagining would look roughly like a command like this at the end of an extension's script: ALTER EXTENSION extname SET MAP OBJECT 1 IS FUNCTION foo(int, int), OBJECT 2 IS OPERATOR +(float, float), ... where the parts after "IS" should be able to use the same production as for "member_object" in ALTER EXTENSION ADD/DROP. Execution of this would replace an oid[] field in the extension's pg_extension row. So yeah, we could replace the numbers by identifiers in this command, but does that really buy us much maintainability? Any intelligent extension author is going to have a C header with something like #define MY_FUNCTION_FOO_INT_INT 1 #define MY_OPERATOR_PLUS_FLOAT_FLOAT 2 which she has to keep in sync with the ALTER SET MAP in her extension script. Using names in the SET MAP just changes the contents of those #defines to strings, which isn't moving the goalposts much for maintainability. > 2. an object might be given many tags, some of them automatically > 3. it might make sense to provide a function that the extension can use > to ask "is oid X one of my objects with tag 'foo'" (e.g. to match one of > a set of related functions) in addition to looking up specific oids by > tag I'm not seeing that either of these have actual real-world use cases, at least not use-cases with the same constraints that the OID mapping problem has. In particular, we're going to have to lock down use of the SET MAP command pretty hard, since the ability to insert a different object than a support function thought it was calling would be an easy security hole. That seems like it lets out many of the potential applications of the sort of object labeling you're talking about. (I'd also wonder why such labeling would be needed only for objects in extensions.) regards, tom lane [1] The possibility of needing to flush that cache complicates this, but it'd complicate the other thing too.
On 01/21/19 18:52, Tom Lane wrote: > I'm probably going to push forward with the OID mapping idea instead. As it happens, I'd been recently thinking[1] about a way that certain SQL/XML functionality could be provided by a pluggable selection of different extensions. And I think a use case like that could be rather elegantly served by the OID mapping idea, more so than by a fixed-OID-range-per-extension approach. So +1. -Chap [1] https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Proposal_2:_desugaring_to_calls_on_normal_extension_functions
Chapman Flack <chap@anastigmatix.net> writes: > On 01/21/19 18:52, Tom Lane wrote: >> I'm probably going to push forward with the OID mapping idea instead. > As it happens, I'd been recently thinking[1] about a way that certain > SQL/XML functionality could be provided by a pluggable selection of > different extensions. > And I think a use case like that could be rather elegantly served by > the OID mapping idea, more so than by a fixed-OID-range-per-extension > approach. Hm, yeah. One use-case that's been in the back of my mind is cross-extension references; for example, what if PostGIS wants to map a call to one of its own functions into an indexable operator that's defined by the btree_gist extension? What you're talking about, IIUC, is a similar kind of reference only it goes from the core code to an extension. This line of thought says that the identifiers exposed by what I was calling a SET MAP command would soon become part of the de facto API of an extension: you'd not want to change them for fear that some other extension was relying on them. Perhaps this also gives some impetus to the lets-use-identifiers- not-numbers approach that Andrew was pushing. I didn't care for that too much so far as an extension's own internal references are concerned, but for cross-extension references it seems a lot better to be looking for "postgis / function_foo_int_int" than for "postgis / 3". On the third hand you could also say that such references should just use name-based lookup and not a facility that's designed to bypass the expense of that. Loading additional functionality onto said facility just reduces its desired speed advantage. (That is, in the terms of what I think you mean for the SQL/XML business, an extension that's meant to serve that purpose would be required to provide functions with specified names and signatures, and the core would look them up that way rather than with any behind-the-scenes API.) regards, tom lane
On 01/21/19 21:45, Tom Lane wrote: > are concerned, but for cross-extension references it seems a > lot better to be looking for "postgis / function_foo_int_int" > than for "postgis / 3". I wonder if postgis might offer a .h file with FUNCTION_POSTGIS_FOO_INT_INT defined as 3, which extensions intending to use foo could be built against. Any that aren't could still search by name and signature. In the case of calls from core to some pluggable extension, of course the .h file would be in core, with the implementing extensions expected to build against it: in thy extension shalt thou provide XMLCAST at index 1, XMLITERATE at index 2, etc. Regards, -Chap
Re: Allowing extensions to find out the OIDs of their member objects
From
Darafei "Komяpa" Praliaskouski
Date:
Thoughts?
I have a feeling this is over-engineering in slightly different direction, solving the way for hack to work instead of original problem.
What's currently happening in PostGIS is that there are functions that need to perform index-based lookups.
Postgres is unable to plan this for functions, only for operators.
Operators have only two sides, some PostGIS functions have arguments - you can't squeeze these into operator.
Well, you can squeeze two of your parameters into one, but it will be ugly too - you'll invent some "query" argument type and alternative grammar instead of SQL (see tsquery).
ST_DWithin itself is also another way of working around planner limitation and squeezing something into both sides of operator, since you don't know which side of your query is going to have an index. It's not perfect either.
A perfect solution will be a way to perform a clean index scan on ST_Distance(a.geom, b.geom) < 10, which is what ST_DWithin is trying to express in limited logic of "you only have two sides of operator".
If you need example from another world: imagine jsonb key-value lookup. It's currently done via
select ... where tags @> '{"highway":"residential"}';
- which is hard: you have to remember which side the rose should lean towards, which {} [] to use, how to quote around json and inside and more.
A more intuitive way for many programmers to write this is similar to this:
- which is hard: you have to remember which side the rose should lean towards, which {} [] to use, how to quote around json and inside and more.
A more intuitive way for many programmers to write this is similar to this:
select ... where (tags->>'highway') = 'residential';
- but this does not end up with an index lookup.
- but this does not end up with an index lookup.
I'd be happy if we can deprecate ST_DWithin in PostGIS and just allow ST_Distance(a.geom, b.geom) < 10.
ST_Distance is defined in standard as function, however, there is equivalent operator <-> that exists for sole purpose of KNN lookups. So, when you write:
... order by ST_Distance(geom, 'your_position')
- you're not getting index scan, and when writing
... order by geom <-> 'your_position'
- you're getting index scan but not doing a thing you may intuitively write by knowing ST_Distance is standard-defined way to measure distance between two spatial objects.
May it happen to direct you to some other thoughts?
- you're not getting index scan, and when writing
... order by geom <-> 'your_position'
- you're getting index scan but not doing a thing you may intuitively write by knowing ST_Distance is standard-defined way to measure distance between two spatial objects.
May it happen to direct you to some other thoughts?
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> 1. easier to read and maintain Tom> The SQL-level API that I'm imagining would look roughly like Tom> a command like this at the end of an extension's script: Tom> ALTER EXTENSION extname SET MAP Tom> OBJECT 1 IS FUNCTION foo(int, int), Tom> OBJECT 2 IS OPERATOR +(float, float), ... That's what I thought and I had something similar in mind except not with numbers. This is obviously the same situation we have with operator and function numbers in opclasses right now, which is something I personally find annoying: the fact that (for example) GiST operator members are assigned some non-self-documenting number that you can only resolve by looking at the opclass implementation to find out what it thinks the numbers mean. -- Andrew (irc:RhodiumToad)
Hi, On 2019-01-21 19:41:26 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > It'd be more > > realistic to create a new zone at UINT32_MAX - something, but that'd > > likely still conflict in plenty installations (thanks to toast and WITH > > OIDS tables). I'm curious as to how to solve that, if you have a > > sketch - less because of this, and more because I think it's not > > unlikely that we'll encounter the need for this at some point not too > > far away. > > I have no idea how we'd move table or type OIDs, given that those are > potentially on-disk. (Actually ... are table OIDs really on-disk > anywhere in user data? Types yes, but tables?) Not quite the same, but toast table oids are on-disk, inside toast datums. Greetings, Andres Freund
I wrote: > [ discussion about ways to let extension C code find out object OIDs ] I wanted to close out this thread, for the time being, by saying that I'm not expecting to get anything done about it for v12. It seems pretty late in the dev cycle to be proposing any major new user-visible functionality, and even without that consideration, I have too many other things on my plate. There are ways that an extension can implement SupportRequestIndexCondition without knowing hard-wired OIDs for its objects: 1. The first problem is to determine whether the index column you've been called for has the opfamily you want to work with. Our core-code examples mostly check req->opfamily against a hard-wired OID constant, which doesn't work for an extension-defined opfamily. I think best practice here will probably be to look up the opfamily's catalog entry via the OPFAMILYOID syscache and see if its name is what you expect (checking only the name, not the schema, since the latter might be variable). While a false match is theoretically possible it seems unlikely, and there's not a security risk since we only allow superusers to create opfamilies. Another idea is to check req->index->relam to see if the index type is what you expect, and just assume that your datatype has only one opfamily per index type. This is mainly attractive if the index type is a built-in one with a known OID --- if the index AM is also extension-defined, then you're still stuck needing to look up and check its name. 2. Then, you need to be able to identify operator OIDs so that you can build indexqual OpExprs. The operators you care about for this have to be members of the opfamily, so you can look up their OIDs pretty easily using get_opfamily_member ... assuming you know the OIDs of their input datatypes (see below). like_support.c has examples of this. 3. You might need to determine datatype OIDs, for the above purpose and so that you can build comparison constants to pass to the operators. This is pretty easy if the comparison value is of the same type as one of the inputs to your function or operator being optimized: just apply exprType() to that input expression. However there are cases where this isn't true; for instance, PostGIS sometimes wants to optimize a function with more than 2 inputs by converting "func(indexcol, something, something)" to "indexcol indexable-operator row(something, something)::compositetype". The OID of the custom composite type isn't readily available. In such cases you might have no really better answer than to look up the type by name. This can probably be done safely by assuming that it's in the same schema as your target function, which you can look up since you have the function's OID. Using the type OID successfully in a later get_opfamily_member lookup would provide additional confidence that you have the right type. In all of these cases, you could probably get away with caching the results of successful lookups in static variables, so that you don't need to do them more than once per session. This is clearly an area that's ripe for improvement by providing better infrastructure, but I think we can tolerate this state of affairs for the time being. regards, tom lane
On Mon, Jan 21, 2019 at 9:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Perhaps this also gives some impetus to the lets-use-identifiers- > not-numbers approach that Andrew was pushing. I didn't care for > that too much so far as an extension's own internal references > are concerned, but for cross-extension references it seems a > lot better to be looking for "postgis / function_foo_int_int" > than for "postgis / 3". Yeah, I agree. I think names are a good idea. I also agree with the other comments that trying to run an OID registry will not work out well. Either we'll accept every request for an OID range and go nuts tracking them all as they rapidly balloon -- or more likely we'll reject requests from insufficiently-famous extensions which will, of course, hinder their attempts to become famous. It seems much better to come up with a solution where every extension can DTRT without any central coordination. Perhaps if we replaced OIDs with UUIDs that would Just Work, but an OID-mapping system seems like a good, perhaps better, answer as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company