Thread: Re: Add Postgres module info
Andrei Lepikhov <lepihov@gmail.com> writes: > I would like to propose the module_info structure, which aims to let > extension maintainers sew some data into the binary file. Being included > in the module code, this information remains unchanged and is available > for reading by a backend. I don't have much of an opinion one way or the other about the usefulness of these additional info fields. But I would like to object to the way you've gone about it, namely to copy-and-paste the magic-block mechanism. That doesn't scale: the next time somebody else wants some more fields, will we have three such structs? The approach we foresaw using was that we could simply add more fields to Pg_magic_struct (obviously, only in a major version). That's happened at least once already - abi_extra was not there to begin with. There are a couple of ways that we could deal with the API seen by module authors: 1. The PG_MODULE_MAGIC macro keeps the same API and leaves the additional field(s) empty. Authors who want to fill the extra field(s) use a new macro, say PG_MODULE_MAGIC_V2. 2. PG_MODULE_MAGIC gains some arguments, forcing everybody to change their code. While this would be annoying, it'd be within our compatibility rules for a major version update. I wouldn't do it though unless there were a compelling reason why everybody should fill these fields. 3. Maybe we could do something with making PG_MODULE_MAGIC variadic, but I've not thought hard about what that could look like. In any case it'd only be a cosmetic improvement over the above ways. 4. The extra fields are filled indirectly by macros that extension authors can optionally provide (a variant on the FMGR_ABI_EXTRA mechanism). This would be code-order-sensitive so I'm not sure it's really a great idea. 5. Something I didn't think of? With any of these except #4, authors who want their source code to support multiple PG major versions would be forced into using #if tests on CATALOG_VERSION_NO to decide what to write. That's a bit annoying but hardly unusual. regards, tom lane
Hi, On 2024-12-11 13:21:03 -0500, Tom Lane wrote: > Andrei Lepikhov <lepihov@gmail.com> writes: > > I would like to propose the module_info structure, which aims to let > > extension maintainers sew some data into the binary file. Being included > > in the module code, this information remains unchanged and is available > > for reading by a backend. > > I don't have much of an opinion one way or the other about the > usefulness of these additional info fields. FWIW, Id like to have some more information in there, without commenting on the specifics. > But I would like to object to the way you've gone about it, namely to > copy-and-paste the magic-block mechanism. That doesn't scale: the next time > somebody else wants some more fields, will we have three such structs? I agree with that. > The approach we foresaw using was that we could simply add more > fields to Pg_magic_struct (obviously, only in a major version). > That's happened at least once already - abi_extra was not there > to begin with. > > There are a couple of ways that we could deal with the API > seen by module authors: > > 1. The PG_MODULE_MAGIC macro keeps the same API and leaves the > additional field(s) empty. Authors who want to fill the > extra field(s) use a new macro, say PG_MODULE_MAGIC_V2. > > 2. PG_MODULE_MAGIC gains some arguments, forcing everybody > to change their code. While this would be annoying, it'd be > within our compatibility rules for a major version update. > I wouldn't do it though unless there were a compelling reason > why everybody should fill these fields. I'd like to avoid needing to do this again if / when we invent the next set of optional arguments. So just having a different macro with a hardcoded set of arguments or changing PG_MODULE_MAGIC to have a hardcoded set of arguments doesn't seem great. To be future proof, I think it'd be good to declare the arguments as designated initializers. E.g. like PG_MODULE_MAGIC_EXT( .version = 10000, .threadsafe = 1 ); where the macro would turn the arguments into a struct initializer inside Pg_magic_struct. That way we can add/remove arguments and only extensions that use removed arguments need to change. > 3. Maybe we could do something with making PG_MODULE_MAGIC > variadic, but I've not thought hard about what that could > look like. In any case it'd only be a cosmetic improvement > over the above ways. Yea, it'd be nice to avoid needing an _EXT or _V2. But I can't immediately think of a way that allows a macro with no arguments and and an argument. > 4. The extra fields are filled indirectly by macros that > extension authors can optionally provide (a variant on the > FMGR_ABI_EXTRA mechanism). This would be code-order-sensitive > so I'm not sure it's really a great idea. Agreed. > With any of these except #4, authors who want their source code to > support multiple PG major versions would be forced into using #if > tests on CATALOG_VERSION_NO to decide what to write. That's a > bit annoying but hardly unusual. #2 would be bit more annoying than #1, I'd say, because it'd affect every single extension, even ones not interested in any of this. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2024-12-11 13:21:03 -0500, Tom Lane wrote: >> There are a couple of ways that we could deal with the API >> seen by module authors: > To be future proof, I think it'd be good to declare the arguments as > designated initializers. E.g. like > PG_MODULE_MAGIC_EXT( > .version = 10000, > .threadsafe = 1 > ); Yeah, I'd come to pretty much the same conclusion after sending my email. That looks like it should work and be convenient to extend further. The other possibly-non-obvious bit is that we should probably invent a sub-structure holding the ABI-related fields, so as to minimize the amount of rewriting needed in dfmgr.c. regards, tom lane
On Wed, Dec 11, 2024, at 4:26 PM, Andres Freund wrote:
On 2024-12-11 13:21:03 -0500, Tom Lane wrote:> Andrei Lepikhov <lepihov@gmail.com> writes:> > I would like to propose the module_info structure, which aims to let> > extension maintainers sew some data into the binary file. Being included> > in the module code, this information remains unchanged and is available> > for reading by a backend.>> I don't have much of an opinion one way or the other about the> usefulness of these additional info fields.FWIW, Id like to have some more information in there, without commenting onthe specifics.
+1 for the general idea. I received some reports like [1] related to wal2json
that people wants to obtain the output plugin version. Since it is not installed
via CREATE EXTENSION, it is not possible to detect what version is installed,
hence, some tools cannot have some logic to probe the module version. In a
managed environment, it is hard to figure out the exact version for
non-CREATE-EXTENSION modules, unless it is explicitly informed by the vendor.
On 12/12/2024 01:21, Tom Lane wrote: > Andrei Lepikhov <lepihov@gmail.com> writes: >> I would like to propose the module_info structure, which aims to let >> extension maintainers sew some data into the binary file. Being included >> in the module code, this information remains unchanged and is available >> for reading by a backend. > > I don't have much of an opinion one way or the other about the > usefulness of these additional info fields. But I would like to > object to the way you've gone about it, namely to copy-and-paste > the magic-block mechanism. That doesn't scale: the next time > somebody else wants some more fields, will we have three such > structs? It makes sense. But I want to clarify that I avoided changing PG_MODULE_MAGIC because the newly introduced structure has a totally different purpose and usage logic: the struct is designed to check compatibility, but module info isn't connected to the core version at all: a single version of the code may be built for multiple PG versions. At the same time, various versions of the same library may be usable with the same core. From the coding point of view, I agree that your approach is more laconic and reasonable. I will rewrite the code using this approach. -- regards, Andrei Lepikhov
"Euler Taveira" <euler@eulerto.com> writes: > +1 for the general idea. I received some reports like [1] related to wal2json > that people wants to obtain the output plugin version. Since it is not installed > via CREATE EXTENSION, it is not possible to detect what version is installed, > hence, some tools cannot have some logic to probe the module version. In a > managed environment, it is hard to figure out the exact version for > non-CREATE-EXTENSION modules, unless it is explicitly informed by the vendor. What would you foresee as the SQL API for inspecting a module that's not tied to an extension? regards, tom lane
Andrei Lepikhov <lepihov@gmail.com> writes: > It makes sense. But I want to clarify that I avoided changing > PG_MODULE_MAGIC because the newly introduced structure has a totally > different purpose and usage logic: the struct is designed to check > compatibility, but module info isn't connected to the core version at > all: a single version of the code may be built for multiple PG versions. > At the same time, various versions of the same library may be usable > with the same core. Surely. But I don't see a need for two separately-looked-up physical structures. Seems to me it's sufficient to put the ABI-checking fields into a sub-struct within the magic block. regards, tom lane
Michael Paquier <michael@paquier.xyz> writes: > On Wed, Dec 11, 2024 at 08:34:28PM -0500, Tom Lane wrote: >> What would you foresee as the SQL API for inspecting a module that's >> not tied to an extension? > Rather than a function that can be called with a specific module name > in input, invent a new system SRF function that would report back for > a process all the libraries that have been loaded in it? Yeah, that could work. > Presumably, > the extra tracking can be done in dfmgr.c with more fields added to > DynamicFileList to track the information involved. I wouldn't add any overhead to the normal case for this. Couldn't we walk the list and re-fetch each module's magic block inside this new function? regards, tom lane
On 12/12/24 10:44, Michael Paquier wrote: > On Wed, Dec 11, 2024 at 10:39:38PM -0500, Tom Lane wrote: >> Michael Paquier <michael@paquier.xyz> writes: >>> Presumably, >>> the extra tracking can be done in dfmgr.c with more fields added to >>> DynamicFileList to track the information involved. >> >> I wouldn't add any overhead to the normal case for this. Couldn't >> we walk the list and re-fetch each module's magic block inside >> this new function? > > Depends on how much we should try to cache to make that less expensive > on repeated calls because we cannot unload libraries, but sure, I > don't see why we could not that for each SQL function call to simplify > the logic and the structures in place. I want to say that 'cannot unload libraries' is a negative outcome of the architecture. It would be better to invent something like PG_unregister, allowing libraries to at least return a hook routine call back to the system. So, maybe it makes sense to design this feature with re-fetching libraries, supposing it is already implemented somehow and elements of the DynamicFileList may be removed. -- regards, Andrei Lepikhov
On Thu, Dec 12, 2024 at 3:41 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 12/12/24 08:36, Tom Lane wrote:
> Andrei Lepikhov <lepihov@gmail.com> writes:
>> It makes sense. But I want to clarify that I avoided changing
>> PG_MODULE_MAGIC because the newly introduced structure has a totally
>> different purpose and usage logic: the struct is designed to check
>> compatibility, but module info isn't connected to the core version at
>> all: a single version of the code may be built for multiple PG versions.
>> At the same time, various versions of the same library may be usable
>> with the same core.
>
> Surely. But I don't see a need for two separately-looked-up
> physical structures. Seems to me it's sufficient to put the
> ABI-checking fields into a sub-struct within the magic block.
Okay, I've rewritten the patch to understand how it works. It seems to
work pretty well. I added separate fields for minor and major versions.
I am keenly interested in helping in this area; as you have mentioned, I've done similar work using an extension.
Some thoughts/questions:
1. Do we need to latch onto the "magic" structure here? Have we considered an opportunity to create a separate metadata slot that looks something like `PG_MODULE_INFO(.version = ...)`. My impression of module magic was that it should rather be populated during the build – to provide build-time information. MODULE_INFO would be a rather informational section supplied by the developer.
2. Any reasons to dictate MAJ.MIN format? With semantic versioning abound, it's rather common to use MAJ.MIN.PATCH. There are also other extensions to it (like pre-releases, builds, etc.). All of these indicate distinct versions. The differences between them can be figured out using semver or other parsers. Pure PL/pgSQL implementations of that exist [1].
3. In my work, I also introduced the concept of stable module identity – a unique string (for example, UUID) that represents the identity of the module even if its name is going to change. Admittedly, this is not _the most common_ type of problem, but I anticipate it becoming more of an issue with the growth of the extension ecosystem, potential name clashes, and renamings. With this approach, developers assign this unique string to a module once at the beginning and never change it. Have you considered this?
Hi, On 2024-12-12 11:35:56 +0700, Andrei Lepikhov wrote: > On 12/12/24 10:44, Michael Paquier wrote: > > On Wed, Dec 11, 2024 at 10:39:38PM -0500, Tom Lane wrote: > > > Michael Paquier <michael@paquier.xyz> writes: > > > > Presumably, > > > > the extra tracking can be done in dfmgr.c with more fields added to > > > > DynamicFileList to track the information involved. > > > > > > I wouldn't add any overhead to the normal case for this. Couldn't > > > we walk the list and re-fetch each module's magic block inside > > > this new function? > > > > Depends on how much we should try to cache to make that less expensive > > on repeated calls because we cannot unload libraries, but sure, I > > don't see why we could not that for each SQL function call to simplify > > the logic and the structures in place. > I want to say that 'cannot unload libraries' is a negative outcome of the > architecture. It would be better to invent something like PG_unregister, > allowing libraries to at least return a hook routine call back to the > system. > So, maybe it makes sense to design this feature with re-fetching libraries, > supposing it is already implemented somehow and elements of the > DynamicFileList may be removed. I am quite certain we'll not support unloading libraries anytime soon. We used to support it and it caused problems... Changing anything about how exactly things are tracked in dfmgr.c will be the smallest part of supporting unloading libraries again. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2024-12-12 11:35:56 +0700, Andrei Lepikhov wrote: >> I want to say that 'cannot unload libraries' is a negative outcome of the >> architecture. It would be better to invent something like PG_unregister, >> allowing libraries to at least return a hook routine call back to the >> system. > I am quite certain we'll not support unloading libraries anytime soon. We used > to support it and it caused problems... Changing anything about how exactly > things are tracked in dfmgr.c will be the smallest part of supporting > unloading libraries again. Indeed. However, I don't see what that has to do with the current discussion anyway. The proposed SRF would iterate through whatever is in the DynamicFileList. What does it care whether there's a way to add or remove entries? regards, tom lane
On 12/12/24 21:02, Yurii Rashkovskii wrote: > On Thu, Dec 12, 2024 at 3:41 PM Andrei Lepikhov <lepihov@gmail.com > <mailto:lepihov@gmail.com>> wrote: > > On 12/12/24 08:36, Tom Lane wrote: > > Andrei Lepikhov <lepihov@gmail.com <mailto:lepihov@gmail.com>> > writes: > >> It makes sense. But I want to clarify that I avoided changing > >> PG_MODULE_MAGIC because the newly introduced structure has a totally > >> different purpose and usage logic: the struct is designed to check > >> compatibility, but module info isn't connected to the core > version at > >> all: a single version of the code may be built for multiple PG > versions. > >> At the same time, various versions of the same library may be usable > >> with the same core. > > > > Surely. But I don't see a need for two separately-looked-up > > physical structures. Seems to me it's sufficient to put the > > ABI-checking fields into a sub-struct within the magic block. > Okay, I've rewritten the patch to understand how it works. It seems to > work pretty well. I added separate fields for minor and major versions. > > > I am keenly interested in helping in this area; as you have mentioned, > I've done similar work using an extension. > > Some thoughts/questions: > > 1. Do we need to latch onto the "magic" structure here? Have we > considered an opportunity to create a separate metadata slot that looks > something like `PG_MODULE_INFO(.version = ...)`. My impression of module > magic was that it should rather be populated during the build – to > provide build-time information. MODULE_INFO would be a rather > informational section supplied by the developer. It has already been debated above. I may agree with colleagues that maintainer-provided information should be stored in the magic field to reduce noise. At the same time, we use a single code part to load all that data into the DynamicFileList. That looks pretty well, isn't it? > > 2. Any reasons to dictate MAJ.MIN format? With semantic versioning > abound, it's rather common to use MAJ.MIN.PATCH. There are also other > extensions to it (like pre-releases, builds, etc.). All of these > indicate distinct versions. The differences between them can be figured > out using semver or other parsers. Pure PL/pgSQL implementations of that > exist [1]. Okay, thanks; that's a good catch. I wonder how to follow these rules with a static fixed-sized structure. I would like to read about any suggestions and implementation examples. > > 3. In my work, I also introduced the concept of stable module identity – > a unique string (for example, UUID) that represents the identity of the > module even if its name is going to change. Admittedly, this is not _the > most common_ type of problem, but I anticipate it becoming more of an > issue with the growth of the extension ecosystem, potential name > clashes, and renamings. With this approach, developers assign this > unique string to a module once at the beginning and never change it. > Have you considered this? This option just needs some live examples. I think, if necessary, it could be added later. -- regards, Andrei Lepikhov
Andrei Lepikhov <lepihov@gmail.com> writes: > On 12/12/24 21:02, Yurii Rashkovskii wrote: >> 2. Any reasons to dictate MAJ.MIN format? With semantic versioning >> abound, it's rather common to use MAJ.MIN.PATCH. > Okay, thanks; that's a good catch. I wonder how to follow these rules > with a static fixed-sized structure. I would like to read about any > suggestions and implementation examples. There's nothing stopping a field of the magic block from being a "const char *" pointer to a string literal. regards, tom lane
On Dec 11, 2024, at 19:49, Euler Taveira <euler@eulerto.com> wrote: >> FWIW, Id like to have some more information in there, without commenting on >> the specifics. > > +1 for the general idea. Same. > I received some reports like [1] related to wal2json > that people wants to obtain the output plugin version. Since it is not installed > via CREATE EXTENSION, it is not possible to detect what version is installed, > hence, some tools cannot have some logic to probe the module version. I’m all for additional metadata for native extensions, but I’d also like to draw attention to the “Future” section my proposal[1]to require that module-only extensions also include a control file and be loadable via CREATE EXTENSION (and proposed*_preload_extensions GUCs[2]). This would unify how all types of extensions are added to a database, and would includeversion information as for all other CREATE EXTENSION extensions. Not a mutually-exclusive proposal, of course; I think it makes sense to have metadata included in the binary itself. Wouldbe useful to compare against what CREATE EXTENSION thinks is the version and raising an error or warning when they diverge. Best, David [1]: https://justatheory.com/2024/11/rfc-extension-packaging-lookup/#future-deprecate-load [2]: https://justatheory.com/2024/11/rfc-extension-packaging-lookup/#extension-preloading
"David E. Wheeler" <david@justatheory.com> writes: > Not a mutually-exclusive proposal, of course; I think it makes sense to have metadata included in the binary itself. Wouldbe useful to compare against what CREATE EXTENSION thinks is the version and raising an error or warning when they diverge. How would that work for extensions where the C code is intentionally supporting multiple versions of the SQL objects? regards, tom lane
On Dec 23, 2024, at 15:17, Tom Lane <tgl@sss.pgh.pa.us> wrote: > How would that work for extensions where the C code is intentionally > supporting multiple versions of the SQL objects? I guess some people do that, eh? In that case it wouldn’t. D
On 12/24/24 02:23, David E. Wheeler wrote: > On Dec 11, 2024, at 19:49, Euler Taveira <euler@eulerto.com> wrote: > >>> FWIW, Id like to have some more information in there, without commenting on >>> the specifics. >> >> +1 for the general idea. > > Same. > >> I received some reports like [1] related to wal2json >> that people wants to obtain the output plugin version. Since it is not installed >> via CREATE EXTENSION, it is not possible to detect what version is installed, >> hence, some tools cannot have some logic to probe the module version. > > I’m all for additional metadata for native extensions, but I’d also like to draw attention to the “Future” section my proposal[1]to require that module-only extensions also include a control file and be loadable via CREATE EXTENSION (and proposed*_preload_extensions GUCs[2]). This would unify how all types of extensions are added to a database, and would includeversion information as for all other CREATE EXTENSION extensions. Looking into the control file, I see that most parameters are unnecessary for the library. Why do we have to maintain this file? In my experience, extra features are usually designed as shared libraries to 1) reduce complexity, 2) work across the overall cluster, 3) be dynamically loaded, 4) be hidden, and not waste the database with any type of object. - remember, applications sometimes manage their data through an API; databases and any objects inside may be created/moved automatically, and we want to work in any database. The 'CREATE EXTENSION' statement would have made sense if we had register/unregister hook machinery. Without that, it seems it is just about maintaining the library's version and comments locally in a specific database. It would be interesting to read about your real-life cases that caused your proposal. -- regards, Andrei Lepikhov
On Mon, Dec 16, 2024 at 12:02 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 12/13/24 10:17, Tom Lane wrote:
> Andrei Lepikhov <lepihov@gmail.com> writes:
>> On 12/12/24 21:02, Yurii Rashkovskii wrote:
>>> 2. Any reasons to dictate MAJ.MIN format? With semantic versioning
>>> abound, it's rather common to use MAJ.MIN.PATCH.
>
>> Okay, thanks; that's a good catch. I wonder how to follow these rules
>> with a static fixed-sized structure. I would like to read about any
>> suggestions and implementation examples.
>
> There's nothing stopping a field of the magic block from being
> a "const char *" pointer to a string literal.
Ok, See v.2 in attachment.
I've reviewed the patch, and it is great that you support more flexible versioning now. I am just wondering a bit about the case where `minfo->name` can be `NULL` but `minfo->version` isn't, or where both are `NULL` – should we skip any of these?
On 12/23/24 17:26, David E. Wheeler wrote: > On Dec 23, 2024, at 15:17, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> How would that work for extensions where the C code is intentionally >> supporting multiple versions of the SQL objects? > > I guess some people do that, eh? In that case it wouldn’t. A function pointer rather than a version constant? Or a function pointer, to be used if the version constant is null? Regards, -Chap