Thread: Re: Add Postgres module info

Re: Add Postgres module info

From
Tom Lane
Date:
Andrei Lepikhov <lepihov@gmail.com> writes:
> I would like to propose the module_info structure, which aims to let 
> extension maintainers sew some data into the binary file. Being included 
> in the module code, this information remains unchanged and is available 
> for reading by a backend.

I don't have much of an opinion one way or the other about the
usefulness of these additional info fields.  But I would like to
object to the way you've gone about it, namely to copy-and-paste
the magic-block mechanism.  That doesn't scale: the next time
somebody else wants some more fields, will we have three such
structs?

The approach we foresaw using was that we could simply add more
fields to Pg_magic_struct (obviously, only in a major version).
That's happened at least once already - abi_extra was not there
to begin with.

There are a couple of ways that we could deal with the API
seen by module authors:

1. The PG_MODULE_MAGIC macro keeps the same API and leaves the
additional field(s) empty.  Authors who want to fill the
extra field(s) use a new macro, say PG_MODULE_MAGIC_V2.

2. PG_MODULE_MAGIC gains some arguments, forcing everybody
to change their code.  While this would be annoying, it'd be
within our compatibility rules for a major version update.
I wouldn't do it though unless there were a compelling reason
why everybody should fill these fields.

3. Maybe we could do something with making PG_MODULE_MAGIC
variadic, but I've not thought hard about what that could
look like.  In any case it'd only be a cosmetic improvement
over the above ways.

4. The extra fields are filled indirectly by macros that
extension authors can optionally provide (a variant on the
FMGR_ABI_EXTRA mechanism).  This would be code-order-sensitive
so I'm not sure it's really a great idea.

5. Something I didn't think of?

With any of these except #4, authors who want their source code to
support multiple PG major versions would be forced into using #if
tests on CATALOG_VERSION_NO to decide what to write.  That's a
bit annoying but hardly unusual.

            regards, tom lane



Re: Add Postgres module info

From
Andres Freund
Date:
Hi,


On 2024-12-11 13:21:03 -0500, Tom Lane wrote:
> Andrei Lepikhov <lepihov@gmail.com> writes:
> > I would like to propose the module_info structure, which aims to let
> > extension maintainers sew some data into the binary file. Being included
> > in the module code, this information remains unchanged and is available
> > for reading by a backend.
>
> I don't have much of an opinion one way or the other about the
> usefulness of these additional info fields.

FWIW, Id like to have some more information in there, without commenting on
the specifics.


> But I would like to object to the way you've gone about it, namely to
> copy-and-paste the magic-block mechanism.  That doesn't scale: the next time
> somebody else wants some more fields, will we have three such structs?

I agree with that.


> The approach we foresaw using was that we could simply add more
> fields to Pg_magic_struct (obviously, only in a major version).
> That's happened at least once already - abi_extra was not there
> to begin with.
>
> There are a couple of ways that we could deal with the API
> seen by module authors:
>
> 1. The PG_MODULE_MAGIC macro keeps the same API and leaves the
> additional field(s) empty.  Authors who want to fill the
> extra field(s) use a new macro, say PG_MODULE_MAGIC_V2.
>
> 2. PG_MODULE_MAGIC gains some arguments, forcing everybody
> to change their code.  While this would be annoying, it'd be
> within our compatibility rules for a major version update.
> I wouldn't do it though unless there were a compelling reason
> why everybody should fill these fields.

I'd like to avoid needing to do this again if / when we invent the next set of
optional arguments. So just having a different macro with a hardcoded set of
arguments or changing PG_MODULE_MAGIC to have a hardcoded set of arguments
doesn't seem great.

To be future proof, I think it'd be good to declare the arguments as
designated initializers. E.g. like

PG_MODULE_MAGIC_EXT(
  .version = 10000,
  .threadsafe = 1
);

where the macro would turn the arguments into a struct initializer inside
Pg_magic_struct.

That way we can add/remove arguments and only extensions that use
removed arguments need to change.


> 3. Maybe we could do something with making PG_MODULE_MAGIC
> variadic, but I've not thought hard about what that could
> look like.  In any case it'd only be a cosmetic improvement
> over the above ways.

Yea, it'd be nice to avoid needing an _EXT or _V2. But I can't immediately
think of a way that allows a macro with no arguments and and an argument.


> 4. The extra fields are filled indirectly by macros that
> extension authors can optionally provide (a variant on the
> FMGR_ABI_EXTRA mechanism).  This would be code-order-sensitive
> so I'm not sure it's really a great idea.

Agreed.


> With any of these except #4, authors who want their source code to
> support multiple PG major versions would be forced into using #if
> tests on CATALOG_VERSION_NO to decide what to write.  That's a
> bit annoying but hardly unusual.

#2 would be bit more annoying than #1, I'd say, because it'd affect every
single extension, even ones not interested in any of this.

Greetings,

Andres Freund



Re: Add Postgres module info

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2024-12-11 13:21:03 -0500, Tom Lane wrote:
>> There are a couple of ways that we could deal with the API
>> seen by module authors:

> To be future proof, I think it'd be good to declare the arguments as
> designated initializers. E.g. like

> PG_MODULE_MAGIC_EXT(
>   .version = 10000,
>   .threadsafe = 1
> );

Yeah, I'd come to pretty much the same conclusion after sending
my email.  That looks like it should work and be convenient
to extend further.

The other possibly-non-obvious bit is that we should probably
invent a sub-structure holding the ABI-related fields, so as to
minimize the amount of rewriting needed in dfmgr.c.

            regards, tom lane



Re: Add Postgres module info

From
"Euler Taveira"
Date:
On Wed, Dec 11, 2024, at 4:26 PM, Andres Freund wrote:
On 2024-12-11 13:21:03 -0500, Tom Lane wrote:
> Andrei Lepikhov <lepihov@gmail.com> writes:
> > I would like to propose the module_info structure, which aims to let
> > extension maintainers sew some data into the binary file. Being included
> > in the module code, this information remains unchanged and is available
> > for reading by a backend.
>
> I don't have much of an opinion one way or the other about the
> usefulness of these additional info fields.

FWIW, Id like to have some more information in there, without commenting on
the specifics.

+1 for the general idea. I received some reports like [1] related to wal2json
that people wants to obtain the output plugin version. Since it is not installed
via CREATE EXTENSION, it is not possible to detect what version is installed,
hence, some tools cannot have some logic to probe the module version. In a
managed environment, it is hard to figure out the exact version for
non-CREATE-EXTENSION modules, unless it is explicitly informed by the vendor.



--
Euler Taveira

Re: Add Postgres module info

From
Andrei Lepikhov
Date:
On 12/12/2024 01:21, Tom Lane wrote:
> Andrei Lepikhov <lepihov@gmail.com> writes:
>> I would like to propose the module_info structure, which aims to let
>> extension maintainers sew some data into the binary file. Being included
>> in the module code, this information remains unchanged and is available
>> for reading by a backend.
> 
> I don't have much of an opinion one way or the other about the
> usefulness of these additional info fields.  But I would like to
> object to the way you've gone about it, namely to copy-and-paste
> the magic-block mechanism.  That doesn't scale: the next time
> somebody else wants some more fields, will we have three such
> structs?
It makes sense. But I want to clarify that I avoided changing 
PG_MODULE_MAGIC because the newly introduced structure has a totally 
different purpose and usage logic: the struct is designed to check 
compatibility, but module info isn't connected to the core version at 
all: a single version of the code may be built for multiple PG versions. 
At the same time, various versions of the same library may be usable 
with the same core.

 From the coding point of view, I agree that your approach is more 
laconic and reasonable. I will rewrite the code using this approach.

-- 
regards, Andrei Lepikhov



Re: Add Postgres module info

From
Tom Lane
Date:
"Euler Taveira" <euler@eulerto.com> writes:
> +1 for the general idea. I received some reports like [1] related to wal2json
> that people wants to obtain the output plugin version. Since it is not installed
> via CREATE EXTENSION, it is not possible to detect what version is installed,
> hence, some tools cannot have some logic to probe the module version. In a
> managed environment, it is hard to figure out the exact version for
> non-CREATE-EXTENSION modules, unless it is explicitly informed by the vendor.

What would you foresee as the SQL API for inspecting a module that's
not tied to an extension?

            regards, tom lane



Re: Add Postgres module info

From
Tom Lane
Date:
Andrei Lepikhov <lepihov@gmail.com> writes:
> It makes sense. But I want to clarify that I avoided changing 
> PG_MODULE_MAGIC because the newly introduced structure has a totally 
> different purpose and usage logic: the struct is designed to check 
> compatibility, but module info isn't connected to the core version at 
> all: a single version of the code may be built for multiple PG versions. 
> At the same time, various versions of the same library may be usable 
> with the same core.

Surely.  But I don't see a need for two separately-looked-up
physical structures.  Seems to me it's sufficient to put the
ABI-checking fields into a sub-struct within the magic block.

            regards, tom lane



Re: Add Postgres module info

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Wed, Dec 11, 2024 at 08:34:28PM -0500, Tom Lane wrote:
>> What would you foresee as the SQL API for inspecting a module that's
>> not tied to an extension?

> Rather than a function that can be called with a specific module name
> in input, invent a new system SRF function that would report back for
> a process all the libraries that have been loaded in it?

Yeah, that could work.

> Presumably,
> the extra tracking can be done in dfmgr.c with more fields added to
> DynamicFileList to track the information involved.

I wouldn't add any overhead to the normal case for this.  Couldn't
we walk the list and re-fetch each module's magic block inside
this new function?

            regards, tom lane



Re: Add Postgres module info

From
Andrei Lepikhov
Date:
On 12/12/24 10:44, Michael Paquier wrote:
> On Wed, Dec 11, 2024 at 10:39:38PM -0500, Tom Lane wrote:
>> Michael Paquier <michael@paquier.xyz> writes:
>>> Presumably,
>>> the extra tracking can be done in dfmgr.c with more fields added to
>>> DynamicFileList to track the information involved.
>>
>> I wouldn't add any overhead to the normal case for this.  Couldn't
>> we walk the list and re-fetch each module's magic block inside
>> this new function?
> 
> Depends on how much we should try to cache to make that less expensive
> on repeated calls because we cannot unload libraries, but sure, I
> don't see why we could not that for each SQL function call to simplify
> the logic and the structures in place.
I want to say that 'cannot unload libraries' is a negative outcome of 
the architecture. It would be better to invent something like 
PG_unregister, allowing libraries to at least return a hook routine call 
back to the system.
So, maybe it makes sense to design this feature with re-fetching 
libraries, supposing it is already implemented somehow and elements of 
the DynamicFileList may be removed.

-- 
regards, Andrei Lepikhov



Re: Add Postgres module info

From
Yurii Rashkovskii
Date:
On Thu, Dec 12, 2024 at 3:41 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 12/12/24 08:36, Tom Lane wrote:
> Andrei Lepikhov <lepihov@gmail.com> writes:
>> It makes sense. But I want to clarify that I avoided changing
>> PG_MODULE_MAGIC because the newly introduced structure has a totally
>> different purpose and usage logic: the struct is designed to check
>> compatibility, but module info isn't connected to the core version at
>> all: a single version of the code may be built for multiple PG versions.
>> At the same time, various versions of the same library may be usable
>> with the same core.
>
> Surely.  But I don't see a need for two separately-looked-up
> physical structures.  Seems to me it's sufficient to put the
> ABI-checking fields into a sub-struct within the magic block.
Okay, I've rewritten the patch to understand how it works. It seems to
work pretty well. I added separate fields for minor and major versions.


I am keenly interested in helping in this area; as you have mentioned, I've done similar work using an extension.

Some thoughts/questions:

1. Do we need to latch onto the "magic" structure here? Have we considered an opportunity to create a separate metadata slot that looks something like `PG_MODULE_INFO(.version = ...)`. My impression of module magic was that it should rather be populated during the build – to provide build-time information. MODULE_INFO would be a rather informational section supplied by the developer. 

2. Any reasons to dictate MAJ.MIN format? With semantic versioning abound, it's rather common to use MAJ.MIN.PATCH. There are also other extensions to it (like pre-releases, builds, etc.). All of these indicate distinct versions. The differences between them can be figured out using semver or other parsers. Pure PL/pgSQL implementations of that exist [1].

3. In my work, I also introduced the concept of stable module identity – a unique string (for example, UUID) that represents the identity of the module even if its name is going to change. Admittedly, this is not _the most common_ type of problem, but I anticipate it becoming more of an issue with the growth of the extension ecosystem, potential name clashes, and renamings. With this approach, developers assign this unique string to a module once at the beginning and never change it. Have you considered this?




Re: Add Postgres module info

From
Andres Freund
Date:
Hi,

On 2024-12-12 11:35:56 +0700, Andrei Lepikhov wrote:
> On 12/12/24 10:44, Michael Paquier wrote:
> > On Wed, Dec 11, 2024 at 10:39:38PM -0500, Tom Lane wrote:
> > > Michael Paquier <michael@paquier.xyz> writes:
> > > > Presumably,
> > > > the extra tracking can be done in dfmgr.c with more fields added to
> > > > DynamicFileList to track the information involved.
> > > 
> > > I wouldn't add any overhead to the normal case for this.  Couldn't
> > > we walk the list and re-fetch each module's magic block inside
> > > this new function?
> > 
> > Depends on how much we should try to cache to make that less expensive
> > on repeated calls because we cannot unload libraries, but sure, I
> > don't see why we could not that for each SQL function call to simplify
> > the logic and the structures in place.
> I want to say that 'cannot unload libraries' is a negative outcome of the
> architecture. It would be better to invent something like PG_unregister,
> allowing libraries to at least return a hook routine call back to the
> system.
> So, maybe it makes sense to design this feature with re-fetching libraries,
> supposing it is already implemented somehow and elements of the
> DynamicFileList may be removed.

I am quite certain we'll not support unloading libraries anytime soon. We used
to support it and it caused problems... Changing anything about how exactly
things are tracked in dfmgr.c will be the smallest part of supporting
unloading libraries again.

Greetings,

Andres Freund



Re: Add Postgres module info

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2024-12-12 11:35:56 +0700, Andrei Lepikhov wrote:
>> I want to say that 'cannot unload libraries' is a negative outcome of the
>> architecture. It would be better to invent something like PG_unregister,
>> allowing libraries to at least return a hook routine call back to the
>> system.

> I am quite certain we'll not support unloading libraries anytime soon. We used
> to support it and it caused problems... Changing anything about how exactly
> things are tracked in dfmgr.c will be the smallest part of supporting
> unloading libraries again.

Indeed.  However, I don't see what that has to do with the current
discussion anyway.  The proposed SRF would iterate through whatever
is in the DynamicFileList.  What does it care whether there's a way
to add or remove entries?

            regards, tom lane



Re: Add Postgres module info

From
Andrei Lepikhov
Date:
On 12/12/24 21:02, Yurii Rashkovskii wrote:
> On Thu, Dec 12, 2024 at 3:41 PM Andrei Lepikhov <lepihov@gmail.com 
> <mailto:lepihov@gmail.com>> wrote:
> 
>     On 12/12/24 08:36, Tom Lane wrote:
>      > Andrei Lepikhov <lepihov@gmail.com <mailto:lepihov@gmail.com>>
>     writes:
>      >> It makes sense. But I want to clarify that I avoided changing
>      >> PG_MODULE_MAGIC because the newly introduced structure has a totally
>      >> different purpose and usage logic: the struct is designed to check
>      >> compatibility, but module info isn't connected to the core
>     version at
>      >> all: a single version of the code may be built for multiple PG
>     versions.
>      >> At the same time, various versions of the same library may be usable
>      >> with the same core.
>      >
>      > Surely.  But I don't see a need for two separately-looked-up
>      > physical structures.  Seems to me it's sufficient to put the
>      > ABI-checking fields into a sub-struct within the magic block.
>     Okay, I've rewritten the patch to understand how it works. It seems to
>     work pretty well. I added separate fields for minor and major versions.
> 
> 
> I am keenly interested in helping in this area; as you have mentioned, 
> I've done similar work using an extension.
> 
> Some thoughts/questions:
> 
> 1. Do we need to latch onto the "magic" structure here? Have we 
> considered an opportunity to create a separate metadata slot that looks 
> something like `PG_MODULE_INFO(.version = ...)`. My impression of module 
> magic was that it should rather be populated during the build – to 
> provide build-time information. MODULE_INFO would be a rather 
> informational section supplied by the developer.
It has already been debated above. I may agree with colleagues that 
maintainer-provided information should be stored in the magic field to 
reduce noise.
At the same time, we use a single code part to load all that data into 
the DynamicFileList. That looks pretty well, isn't it?
> 
> 2. Any reasons to dictate MAJ.MIN format? With semantic versioning 
> abound, it's rather common to use MAJ.MIN.PATCH. There are also other 
> extensions to it (like pre-releases, builds, etc.). All of these 
> indicate distinct versions. The differences between them can be figured 
> out using semver or other parsers. Pure PL/pgSQL implementations of that 
> exist [1].
Okay, thanks; that's a good catch. I wonder how to follow these rules 
with a static fixed-sized structure. I would like to read about any 
suggestions and implementation examples.
> 
> 3. In my work, I also introduced the concept of stable module identity – 
> a unique string (for example, UUID) that represents the identity of the 
> module even if its name is going to change. Admittedly, this is not _the 
> most common_ type of problem, but I anticipate it becoming more of an 
> issue with the growth of the extension ecosystem, potential name 
> clashes, and renamings. With this approach, developers assign this 
> unique string to a module once at the beginning and never change it. 
> Have you considered this?
This option just needs some live examples. I think, if necessary, it 
could be added later.

-- 
regards, Andrei Lepikhov



Re: Add Postgres module info

From
Tom Lane
Date:
Andrei Lepikhov <lepihov@gmail.com> writes:
> On 12/12/24 21:02, Yurii Rashkovskii wrote:
>> 2. Any reasons to dictate MAJ.MIN format? With semantic versioning 
>> abound, it's rather common to use MAJ.MIN.PATCH.

> Okay, thanks; that's a good catch. I wonder how to follow these rules 
> with a static fixed-sized structure. I would like to read about any 
> suggestions and implementation examples.

There's nothing stopping a field of the magic block from being
a "const char *" pointer to a string literal.

            regards, tom lane



Re: Add Postgres module info

From
"David E. Wheeler"
Date:
On Dec 11, 2024, at 19:49, Euler Taveira <euler@eulerto.com> wrote:

>> FWIW, Id like to have some more information in there, without commenting on
>> the specifics.
>
> +1 for the general idea.

Same.

> I received some reports like [1] related to wal2json
> that people wants to obtain the output plugin version. Since it is not installed
> via CREATE EXTENSION, it is not possible to detect what version is installed,
> hence, some tools cannot have some logic to probe the module version.

I’m all for additional metadata for native extensions, but I’d also like to draw attention to the “Future” section my
proposal[1]to require that module-only extensions also include a control file and be loadable via CREATE EXTENSION (and
proposed*_preload_extensions GUCs[2]). This would unify how all types of extensions are added to a database, and would
includeversion information as for all other CREATE EXTENSION extensions. 

Not a mutually-exclusive proposal, of course; I think it makes sense to have metadata included in the binary itself.
Wouldbe useful to compare against what CREATE EXTENSION thinks is the version and raising an error or warning when they
diverge.

Best,

David

[1]: https://justatheory.com/2024/11/rfc-extension-packaging-lookup/#future-deprecate-load
[2]: https://justatheory.com/2024/11/rfc-extension-packaging-lookup/#extension-preloading


Re: Add Postgres module info

From
Tom Lane
Date:
"David E. Wheeler" <david@justatheory.com> writes:
> Not a mutually-exclusive proposal, of course; I think it makes sense to have metadata included in the binary itself.
Wouldbe useful to compare against what CREATE EXTENSION thinks is the version and raising an error or warning when they
diverge.

How would that work for extensions where the C code is intentionally
supporting multiple versions of the SQL objects?

            regards, tom lane



Re: Add Postgres module info

From
"David E. Wheeler"
Date:
On Dec 23, 2024, at 15:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> How would that work for extensions where the C code is intentionally
> supporting multiple versions of the SQL objects?

I guess some people do that, eh? In that case it wouldn’t.

D




Re: Add Postgres module info

From
Andrei Lepikhov
Date:
On 12/24/24 02:23, David E. Wheeler wrote:
> On Dec 11, 2024, at 19:49, Euler Taveira <euler@eulerto.com> wrote:
> 
>>> FWIW, Id like to have some more information in there, without commenting on
>>> the specifics.
>>
>> +1 for the general idea.
> 
> Same.
> 
>> I received some reports like [1] related to wal2json
>> that people wants to obtain the output plugin version. Since it is not installed
>> via CREATE EXTENSION, it is not possible to detect what version is installed,
>> hence, some tools cannot have some logic to probe the module version.
> 
> I’m all for additional metadata for native extensions, but I’d also like to draw attention to the “Future” section my
proposal[1]to require that module-only extensions also include a control file and be loadable via CREATE EXTENSION (and
proposed*_preload_extensions GUCs[2]). This would unify how all types of extensions are added to a database, and would
includeversion information as for all other CREATE EXTENSION extensions.
 
Looking into the control file, I see that most parameters are 
unnecessary for the library. Why do we have to maintain this file?
In my experience, extra features are usually designed as shared 
libraries to 1) reduce complexity, 2) work across the overall cluster, 
3) be dynamically loaded, 4) be hidden, and not waste the database with 
any type of object. - remember, applications sometimes manage their data 
through an API; databases and any objects inside may be created/moved 
automatically, and we want to work in any database.
The 'CREATE EXTENSION' statement would have made sense if we had 
register/unregister hook machinery. Without that, it seems it is just 
about maintaining the library's version and comments locally in a 
specific database.
It would be interesting to read about your real-life cases that caused 
your proposal.

-- 
regards, Andrei Lepikhov



Re: Add Postgres module info

From
Yurii Rashkovskii
Date:
On Mon, Dec 16, 2024 at 12:02 PM Andrei Lepikhov <lepihov@gmail.com> wrote:
On 12/13/24 10:17, Tom Lane wrote:
> Andrei Lepikhov <lepihov@gmail.com> writes:
>> On 12/12/24 21:02, Yurii Rashkovskii wrote:
>>> 2. Any reasons to dictate MAJ.MIN format? With semantic versioning
>>> abound, it's rather common to use MAJ.MIN.PATCH.
>
>> Okay, thanks; that's a good catch. I wonder how to follow these rules
>> with a static fixed-sized structure. I would like to read about any
>> suggestions and implementation examples.
>
> There's nothing stopping a field of the magic block from being
> a "const char *" pointer to a string literal.
Ok, See v.2 in attachment.

I've reviewed the patch, and it is great that you support more flexible versioning now. I am just wondering a bit about the case where `minfo->name` can be `NULL` but `minfo->version` isn't, or where both are `NULL` – should we skip any of these?


Re: Add Postgres module info

From
Chapman Flack
Date:
On 12/23/24 17:26, David E. Wheeler wrote:
> On Dec 23, 2024, at 15:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> How would that work for extensions where the C code is intentionally
>> supporting multiple versions of the SQL objects?
>
> I guess some people do that, eh? In that case it wouldn’t.

A function pointer rather than a version constant?

Or a function pointer, to be used if the version constant is null?

Regards,
-Chap