Thread: RFC: Extension Packaging & Lookup
Hackers,

Back at the end of August, I promised[1]:

> I’ll try to put some thought into a more formal proposal in a new thread next week. Unless your Gabriele beats me to it 😂.

I guess I should get off my butt and do it. So let’s do this. Here’s what I propose.

* When an extension is installed, all of its files should live in a single directory. These include:

  * The control file that describes the extension
  * Subdirectories for SQL, shared libraries, docs, binaries (also locales and tsearch dictionaries?)

* Next, there should be an extension lookup path. The first item in the path is the compile-time default, and ideally would include only core extensions. Subsequent paths would be set by a GUC, similar to dynamic_library_path, but only for extensions (including their shared libraries).

* Modify PGXS (or create a new installer CLI used by PGXS?) to install an extension according to this pattern. Allow the specification of a prefix. This should differ from the current `PREFIX`, in that the values of `sharedir`, `pkglibdir`, etc. would not be fully duplicated under the prefix, but point to a directory used in the extension path. For example, when installing an extension named “pair”, something like

      make install BASE_DIR=/opt/pg/extension

  would create `/opt/pg/extension/pair`, rather than `/opt/pg/extension/$(pg_config --sharedir)/extension/pair`.

* Perhaps there could also be an option to symlink binary files or man pages to keep paths simple.

* For CREATE EXTENSION, Postgres would look for an extension on the file system by directory name in each of the extension paths instead of by control file name. It would then find the control file in that directory and the necessary SQL and shared library files in the `sql` and `lib` subdirectories of that directory.

* Binary-only extensions might also be installed here; the difference is they have no control file. The LOAD command and shared_preload_libraries would need to know to look here, too.

The basic idea, then, is three-fold:

1. This pattern is more like a packaging pattern than CREATE EXTENSION-specific, since it includes other types of extensions.

2. All the files for a given extension live within a single directory, making it easier to reason about what’s installed and what’s not.

3. These extension packages can live in multiple paths.

Some examples. Core extensions, like citext, would live in, say, $(pg_config --extensiondir)/citext, and have a structure such as:

```
citext
├── citext.control
├── lib
│   ├── citext.dylib
│   └── bitcode
│       ├── citext
│       │   └── citext.bc
│       └── citext.index.bc
└── sql
    ├── citext--1.0--1.1.sql
    ├── citext--1.1--1.2.sql
    ├── citext--1.2--1.3.sql
    ├── citext--1.3--1.4.sql
    ├── citext--1.4--1.5.sql
    ├── citext--1.4.sql
    └── citext--1.5--1.6.sql
```

Third-party extensions would live in one or more other directories on the file system, unknown at compile time, but set in the extension path GUC and accessible to/owned by the Postgres system user. Let’s say we set `/opt/pgxn` as one of the paths.
Within that directory, we might have a directory for a pure SQL extension named “pair” that looks like this:

```
pair
├── LICENSE.md
├── README.md
├── pair.control
├── doc
│   ├── html
│   │   └── pair.html
│   └── pair.md
└── sql
    ├── pair--1.0--1.1.sql
    └── pair--1.1.sql
```

A binary application like pg_top would live in the pg_top directory, structured something like:

```
pg_top
├── HISTORY.rst
├── INSTALL.rst
├── LICENSE
├── README.rst
├── bin
│   └── pg_top
└── doc
    └── man
        └── man3
            └── pg_top.3
```

And a C extension like semver would live in the semver directory and be structured something like:

```
semver
├── LICENSE
├── README.md
├── semver.control
├── doc
│   └── semver.md
├── lib
│   ├── semver.dylib
│   └── bitcode
│       ├── semver
│       │   └── semver.bc
│       └── semver.index.bc
└── sql
    ├── semver--1.0--1.1.sql
    └── semver--1.1.sql
```

Thoughts?

Best,

David

[1]: https://www.postgresql.org/message-id/D30A91FA-A6D4-4737-941F-0BBB2984B730%40justatheory.com
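To make the proposed lookup concrete, here is a minimal sketch of how it might be configured and exercised, assuming the `extension_path` GUC and the per-extension layout described above; the GUC itself, the specific directories, and the `pair` extension are illustrative, not implemented features:

```
# Hypothetical: core extensions first, then a site directory writable by the
# postgres system user (GUC name and search semantics as proposed above).
echo "extension_path = '/usr/lib/postgresql/18/extension:/opt/pgxn'" \
    >> "$PGDATA/postgresql.conf"

# With /opt/pgxn/pair/pair.control and /opt/pgxn/pair/sql/pair--1.1.sql in
# place, CREATE EXTENSION would resolve "pair" by directory name in each path:
#   /usr/lib/postgresql/18/extension/pair/pair.control  (not found)
#   /opt/pgxn/pair/pair.control                         (found)
psql -c 'CREATE EXTENSION pair;'
```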
Greetings Postgres humans,

There was much discussion of this proposal at PGConf.eu last week, between Gabriele Bartolini, Peter Eisentraut, Christoph Berg, and Andres Freund (all Cc’d here), and me, among others. We agreed, in principle, to an approach to this feature. Overall I think the proposal doesn’t need to change, but there are a couple of things to tweak, and I’ve added a list of use cases I’m aware of below, plus a tangent on the challenges of loading system DSOs.

Quoting a lot and responding inline.

On Oct 10, 2024, at 4:34 PM, David E. Wheeler <david@justatheory.com> wrote:

> I guess I should get off my butt and do it. So let’s do this. Here’s what I propose.
>
> * When an extension is installed, all of its files should live in a single directory. These include:
>
>   * The control file that describes the extension
>   * Subdirectories for SQL, shared libraries, docs, binaries (also locales and tsearch dictionaries?)

Just to be clear, these directories correspond to the `pg_config --*dir` options, excluding include directories:

```
❯ pg_config --help | grep 'dir\b' | grep -v include
  --bindir              show location of user executables
  --docdir              show location of documentation files
  --htmldir             show location of HTML documentation files
  --libdir              show location of object code libraries
  --pkglibdir           show location of dynamically loadable modules
  --localedir           show location of locale support files
  --mandir              show location of manual pages
  --sharedir            show location of architecture-independent support files
  --sysconfdir          show location of system-wide configuration files
```

But perhaps also excluding --sysconfdir?

> * Next, there should be an extension lookup path. The first item in the path is the compile-time default, and ideally would include only core extensions. Subsequent paths would be set by a GUC, similar to dynamic_library_path, but only for extensions (including their shared libraries).

Let’s call it extension_path.

I also suggest adding two new pg_config options: one for the directory containing core extensions, and a second for system or user extensions. Something like:

  --extension-dir       show location of core extensions
  --extension-dir-user  show location of user extensions

The default value for the `extension_path` GUC would be, assuming some new template variables:

  extension_path = '$userextdir,$extdir'

This will allow installers (PGXS) to know where to install non-core extensions without bothering the user about it.

> * Modify PGXS (or create a new installer CLI used by PGXS?) to install an extension according to this pattern. Allow the specification of a prefix. This should differ from the current `PREFIX`, in that the values of `sharedir`, `pkglibdir`, etc. would not be fully duplicated under the prefix, but point to a directory used in the extension path. For example, when installing an extension named “pair”, something like
>
>     make install BASE_DIR=/opt/pg/extension
>
> would create `/opt/pg/extension/pair`, rather than `/opt/pg/extension/$(pg_config --sharedir)/extension/pair`.
>
> * Perhaps there could also be an option to symlink binary files or man pages to keep paths simple.
>
> * For CREATE EXTENSION, Postgres would look for an extension on the file system by directory name in each of the extension paths instead of by control file name. It would then find the control file in that directory and the necessary SQL and shared library files in the `sql` and `lib` subdirectories of that directory.

In discussion, I think we clarified that it should look for $extension/$extension.control.
> * Binary-only extensions might also be installed here; the difference is they have no control file. The LOAD command and shared_preload_libraries would need to know to look here, too.

Or perhaps we should require a control file for these, too, but add a “type” key or some such? Maybe such a shared module could be supported by CREATE EXTENSION as well, but not include SQL files?

> The basic idea, then, is three-fold:
>
> 1. This pattern is more like a packaging pattern than CREATE EXTENSION-specific, since it includes other types of extensions.
>
> 2. All the files for a given extension live within a single directory, making it easier to reason about what’s installed and what’s not.
>
> 3. These extension packages can live in multiple paths.

For dupes, the first one found in the list of extension_path directories is the one that Postgres will load.

We also discussed including the version in the directory name, so that multiple versions could be installed at once. Not sure how Postgres would pick the right one, though.

> Some examples. Core extensions, like citext, would live in, say, $(pg_config --extensiondir)/citext, and have a structure such as:
>
> ```
> citext
> ├── citext.control
> ├── lib
> │   ├── citext.dylib
> │   └── bitcode
> │       ├── citext
> │       │   └── citext.bc
> │       └── citext.index.bc
> └── sql
>     ├── citext--1.0--1.1.sql
>     ├── citext--1.1--1.2.sql
>     ├── citext--1.2--1.3.sql
>     ├── citext--1.3--1.4.sql
>     ├── citext--1.4--1.5.sql
>     ├── citext--1.4.sql
>     └── citext--1.5--1.6.sql
> ```
>
> Third-party extensions would live in one or more other directories on the file system, unknown at compile time, but set in the extension path GUC and accessible to/owned by the Postgres system user. Let’s say we set `/opt/pgxn` as one of the paths. Within that directory, we might have a directory for a pure SQL extension named “pair” that looks like this:
>
> ```
> pair
> ├── LICENSE.md
> ├── README.md
> ├── pair.control
> ├── doc
> │   ├── html
> │   │   └── pair.html
> │   └── pair.md
> └── sql
>     ├── pair--1.0--1.1.sql
>     └── pair--1.1.sql
> ```
>
> A binary application like pg_top would live in the pg_top directory, structured something like:
>
> ```
> pg_top
> ├── HISTORY.rst
> ├── INSTALL.rst
> ├── LICENSE
> ├── README.rst
> ├── bin
> │   └── pg_top
> └── doc
>     └── man
>         └── man3
>             └── pg_top.3
> ```
>
> And a C extension like semver would live in the semver directory and be structured something like:
>
> ```
> semver
> ├── LICENSE
> ├── README.md
> ├── semver.control
> ├── doc
> │   └── semver.md
> ├── lib
> │   ├── semver.dylib
> │   └── bitcode
> │       ├── semver
> │       │   └── semver.bc
> │       └── semver.index.bc
> └── sql
>     ├── semver--1.0--1.1.sql
>     └── semver--1.1.sql
> ```

Another example: a binary-only extension loaded via LOAD (or *_preload_libraries), and not `CREATE EXTENSION`, like auto_explain:

```
auto_explain
├── auto_explain.control
└── lib
    ├── auto_explain.dylib
    └── bitcode
        ├── auto_explain
        │   └── auto_explain.bc
        └── auto_explain.index.bc
```

I’ve included the control file, as suggested above, as a way to manage *all* extensions, not just `CREATE EXTENSION` extensions. A non-core extension would be the same, but might include other files like a README, LICENSE, etc.

One wrinkle: Some extensions, such as pg_hint_plan[2], include both a `CREATE EXTENSION` extension and a `LOAD` module, and both have the same name. Not sure how best to adapt the proposal to include a control file for both types of extension in such a case --- because both would have the same control file name.
My proposal is that names would be unique between both `CREATE EXTENSION` and `LOAD` extensions, in which case one or the other extension would need to be renamed (probably the `LOAD` extension). Then perhaps the `CREATE EXTENSION` extension could declare the `LOAD` extension as a dependency.

## Use Cases

Here’s how the proposed file layout and extension_path feature would work for the use cases that have driven it.

### Apt/Yum testing

Rather than patching Postgres to look up pg_config directories under a prefix[3], a packager who wants to run tests, and therefore needs to install an extension where Postgres can find it without writing to the installed server, would follow these steps:

* Set the extension_path GUC to search the package DESTDIR.
* Install the extension into that directory: `make install BASE_DIR=$DESTDIR`
* Run `make installcheck`

This should allow Postgres to find and load the extension during the tests. The Postgres installation will not have been modified; only the extension_path will have been changed.

### Postgres.app

The contents of the macOS Postgres.app bundle must be immutable in order to validate against the signature generated by an Apple-provided certificate. In order to allow extensions to be installed without changing the app bundle, the app would either:

1. Ship with an extension_path pointing to a directory outside the bundle, and then users have to know what this directory is when `make install`ing an extension; or

2. Ship with the --extension-dir-user pg_config value described above pointing to a directory outside the bundle, into which all non-core extensions would be `make install`ed.

### Docker/Kubernetes

Like Postgres.app, Docker images are immutable, but unlike Postgres.app, they represent the entire system. The solution is identical to that for Postgres.app, except that instead of installing extensions into --extension-dir-user, they would be mounted as volumes in that directory.

So, instead of `make install`ing there, a Kubernetes pod can be configured to mount a volume for each extension. Need to add a new extension to a Pod? Just mount a volume for it that contains the necessary files, as described in the examples above.

One wrinkle: Some Kubernetes providers limit the number of volumes that can be mounted[4]. If someone needed more volumes than that, one would need to adopt a different pattern. Perhaps there could be one volume for the extension-dir-user, and an external service could add any and all necessary extensions to it?

## Challenge: Third Party Dependencies

Some extensions require third-party dependencies, usually provided by the OS. For example, pgsql-http[5] compiles into a DSO that dynamically requires another DSO, libcurl. Finding and loading such dependencies is handled by the OS, not by Postgres. This configuration is an annoyance on most systems, where the user has to figure out what dependencies to install from their OS package manager in order to get it to work.

However, it presents a greater challenge for immutable environments, as in Docker and Kubernetes containers. How can system dependencies be added without breaking immutability? There are a few options:

1. Just include all likely dependencies in the base image. This works today, but it would be preferable not to include additional dependencies that may never be used. It also can unnecessarily bloat an image. Also just kinda gross.

2. Mount a volume with all of the default base image DSOs, then have an external process add DSOs to it when required for a new extension. This also might work today, although it requires coding that external process (e.g., a Kubernetes operator).

3. Mount a second volume for non-base-image DSOs, and again have an external process put them there when needed, but then use some method to tell the OS where to find them, since they won’t be in the default location. More on that below.

4. Mount individual DSO files as volumes[6] as needed in the system shared lib directory where the OS can find them. The wrinkle here is the mount limitation imposed by some providers, detailed above.

For solutions that require installing a DSO outside the default directories that the system is aware of, one needs a way to tell the system where to find them. There are two basic methods for doing so:

1. Set LD_LIBRARY_PATH to the directory in which third-party DSOs are installed. Today, however, this is considered an insecure pattern[7]. It doesn’t work at all on macOS, for example, unless you disable SIP[8]. Few will do so, nor should they. Andres Freund reports that it’s on its way out on Linux, too. So perhaps some can do this, but it sounds as if LD_LIBRARY_PATH’s days are numbered.

2. Use `-rpath` when compiling the DSOs to point to the proper place. This embeds the path in the DSO, so it always looks in the same place. However, this requires that the DSO be recompiled for every variant of the `-rpath`, which creates challenges for non-path-specific binary packaging --- or if an OS vendor changes the directory. But perhaps Postgres itself could be compiled with an `-rpath` that will be used when loading extensions, so the extension DSOs themselves don’t have to know the path? Then the base image just needs to be compiled with that option.

## Comments and Corrections

I think I captured most of the issues we discussed at PGConf.eu last week; please correct any misunderstandings, inaccuracies, and oversights you spot! And are there any other issues you can think of with the overall approach?

Thanks for reading to the end!

Best,

David

[1]: https://www.postgresql.org/message-id/D30A91FA-A6D4-4737-941F-0BBB2984B730%40justatheory.com
[2]: https://github.com/ossc-db/pg_hint_plan/
[3]: https://commitfest.postgresql.org/50/4913/
[4]: https://superuser.com/a/1603150/285886
[5]: https://github.com/pramsey/pgsql-http
[6]: https://stackoverflow.com/a/42260979/79202
[7]: http://xahlee.info/UnixResource_dir/_/ldpath.html
[8]: https://developer.apple.com/documentation/security/disabling-and-enabling-system-integrity-protection
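As a rough sketch of the Apt/Yum testing flow described above (the `BASE_DIR` make variable, the `$extdir` template variable, and the `extension_path` GUC are all proposed names from this thread, not existing features):

```
# Stage the extension into the package build tree using the proposed BASE_DIR.
make
make install BASE_DIR="$DESTDIR/opt/pg/extension"

# Tell the test cluster to search the staged directory too (hypothetical GUC).
echo "extension_path = '$DESTDIR/opt/pg/extension:\$extdir'" \
    >> "$PGDATA/postgresql.conf"
pg_ctl reload

# Run the regression tests against the staged files; the server installation
# itself is never written to.
make installcheck
```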
Thanks for this, David,

> On Oct 28, 2024, at 3:19 PM, David E. Wheeler <david@justatheory.com> wrote:
>
> ## Challenge: Third Party Dependencies

This of course is the area that worries the heck out of me, as someone with extensions that include not just single system dependencies but long chains of them (depending on GDAL draws in a huge tree).

> For solutions that require installing a DSO outside the default directories that the system is aware of, one needs a way to tell the system where to find them. There are two basic methods for doing so:
>
> 1. Set LD_LIBRARY_PATH to the directory in which third-party DSOs are installed. Today, however, this is considered an insecure pattern[7]. It doesn’t work at all on macOS, for example, unless you disable SIP[8]. Few will do so, nor should they. Andres Freund reports that it’s on its way out on Linux, too. So perhaps some can do this, but it sounds as if LD_LIBRARY_PATH’s days are numbered.
>
> 2. Use `-rpath` when compiling the DSOs to point to the proper place. This embeds the path in the DSO, so it always looks in the same place. However, this requires that the DSO be recompiled for every variant of the `-rpath`, which creates challenges for non-path-specific binary packaging --- or if an OS vendor changes the directory. But perhaps Postgres itself could be compiled with an `-rpath` that will be used when loading extensions, so the extension DSOs themselves don’t have to know the path? Then the base image just needs to be compiled with that option.

I’m unsure if it will work, but I have wondered if building out the dependencies to install right next to the DSO, and giving the DSO an rpath of “.”, would achieve the effect we are looking for. It’s unfortunate (DY)LD_LIBRARY_PATH is dead and dying, but there we are. The trouble I see with somehow coercing the system to load a local copy of system libraries is (a) common system libs that PostgreSQL itself might be linking (libssl, for example) that then will end up with symbol collisions between the copy loaded by postgres and the copy loaded by the DSO, and (b) same thing but for different extensions with the same dependencies.

I guess I cannot shake the idea that a lot of interesting extensions are going to have interesting system dependencies, that “exposing an interesting library to postgres” has a high value for an integration system like PostgreSQL.

Question for the more knowledgeable: how are binary distribution systems like Conda and others shipping DLLs such that different packages don’t clobber each other?

P.

> ## Comments and Corrections
>
> I think I captured most of the issues we discussed at PGConf.eu last week; please correct any misunderstandings, inaccuracies, and oversights you spot! And are there any other issues you can think of with the overall approach?
>
> Thanks for reading to the end!
>
> Best,
>
> David
>
> [1]: https://www.postgresql.org/message-id/D30A91FA-A6D4-4737-941F-0BBB2984B730%40justatheory.com
> [2]: https://github.com/ossc-db/pg_hint_plan/
> [3]: https://commitfest.postgresql.org/50/4913/
> [4]: https://superuser.com/a/1603150/285886
> [5]: https://github.com/pramsey/pgsql-http
> [6]: https://stackoverflow.com/a/42260979/79202
> [7]: http://xahlee.info/UnixResource_dir/_/ldpath.html
> [8]: https://developer.apple.com/documentation/security/disabling-and-enabling-system-integrity-protection
Re: Paul Ramsey

> Thanks for this, David,
>
> > On Oct 28, 2024, at 3:19 PM, David E. Wheeler <david@justatheory.com> wrote:
> >
> > ## Challenge: Third Party Dependencies
>
> This of course is the area that worries the heck out of me, as someone with extensions that include not just single system dependencies but long chains of them (depending on GDAL draws in a huge tree).

I think this is where the whole idea of "provide binaries outside of deb/rpm" is just going to die. You are trying to reinvent a wheel that has been running well for decades, including lots of production systems. I don't know anyone who would trust that new source of binaries that doesn't integrate into their OS packaging system.

Christoph
On Oct 29, 2024, at 12:51, Christoph Berg <myon@debian.org> wrote:

> I think this is where the whole idea of "provide binaries outside of
> deb/rpm" is just going to die. You are trying to reinvent a wheel that
> has been running well for decades, including lots of production
> systems. I don't know anyone who would trust that new source of
> binaries that doesn't integrate into their OS packaging system.

That’s fine for Linux, but more challenging for macOS and Windows. It’s also an issue that the apt and yum repositories, while having a lot of stuff, don’t have all extensions.

D
On Oct 29, 2024, at 12:23, Paul Ramsey <pramsey@cleverelephant.ca> wrote:

> Thanks for this, David,

🤘🏻

> This of course is the area that worries the heck out of me, as someone with extensions that include not just single system dependencies but long chains of them (depending on GDAL draws in a huge tree).

Yeah. I cited pgsql-http as a simple place to start, on the assumption that once we figure out how to properly configure things for one DSO, the pattern should work for any of them in a tree.

> I’m unsure if it will work, but I have wondered if building out the dependencies to install right next to the DSO, and giving the DSO an rpath of “.”, would achieve the effect we are looking for.

Given the security issues with library paths, I’d guess that relative paths are verboten. But also, Postgres does not `cd` into an extension directory before loading it, AFAIK.

> It’s unfortunate (DY)LD_LIBRARY_PATH is dead and dying, but there we are. The trouble I see with somehow coercing the system to load a local copy of system libraries is (a) common system libs that PostgreSQL itself might be linking (libssl, for example) that then will end up with symbol collisions between the copy loaded by postgres and the copy loaded by the DSO, and (b) same thing but for different extensions with the same dependencies.

Yeah, this is why people tend to depend on system dependencies loaded from well-known paths, so libssl will always load the same DSO. I imagine the use of LD_LIBRARY_PATH can cause issues today.

> I guess I cannot shake the idea that a lot of interesting extensions are going to have interesting system dependencies, that “exposing an interesting library to postgres” has a high value for an integration system like PostgreSQL.

Yeah, I think the issue will be to figure out how to manage OS package-provided system dependencies in immutable environments like a Docker container. I suspect some combination of -rpath compiled into Postgres and mounting individual DSO files not included in the base image will be the way to go.

Best,

David
On Oct 29, 2024, at 13:03, David E. Wheeler <david@justatheory.com> wrote:

> That’s fine for Linux, but more challenging for macOS and Windows. It’s also an issue that the apt and yum repositories, while having a lot of stuff, don’t have all extensions.

Sorry, I think I was too quick to respond there. To the degree possible, it makes sense to me that one would try to depend on DSO packages provided by OS packaging systems. For non-Linux systems, that might mean requiring a third-party packaging system like Homebrew (macOS) or Chocolatey (Windows).

The trick will be how to add such dependencies to a Docker container at runtime.

Best,

David
On Oct 29, 2024, at 12:23, Paul Ramsey <pramsey@cleverelephant.ca> wrote:

> Question for the more knowledgeable: how are binary distribution systems like Conda and others shipping DLLs such that different packages don’t clobber each other?

I’m not familiar with Conda, but from its docs[1], it seems to rely on a value compiled into an app:

> * On Linux, the $ORIGIN variable allows you to specify "relative to this file as it is being executed".
> * On macOS, the variables are:
>   * @rpath --- Allows you to set relative links from the system load paths.
>   * @loader_path --- Equivalent to $ORIGIN.
>   * @executable_path --- Supports the Apple .app directory approach, where libraries know where they live relative to their calling application.

Things are a bit more complicated on Windows, which doesn’t support something like -rpath.

D

[1]: https://docs.conda.io/projects/conda-build/en/latest/resources/use-shared-libraries.html
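For reference, here is a small sketch of how such relative run paths are embedded and inspected with standard toolchain commands today; the module and library names are made up for illustration:

```
# Linux: embed a run path relative to the module itself via $ORIGIN.
cc -shared -fPIC -o semver.so semver.o -Wl,-rpath,'$ORIGIN/../lib'
patchelf --print-rpath semver.so        # prints: $ORIGIN/../lib

# macOS: the analogous mechanism uses @loader_path.
install_name_tool -add_rpath '@loader_path/../lib' semver.dylib
otool -l semver.dylib | grep -A2 LC_RPATH
```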
On Oct 29, 2024, at 10:09 AM, David E. Wheeler <david@justatheory.com> wrote:

> On Oct 29, 2024, at 12:23, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
>
>> Thanks for this, David,
>
> 🤘🏻
>
>> This of course is the area that worries the heck out of me, as someone with extensions that include not just single system dependencies but long chains of them (depending on GDAL draws in a huge tree).
>
> Yeah. I cited pgsql-http as a simple place to start, on the assumption that once we figure out how to properly configure things for one DSO, the pattern should work for any of them in a tree.

An apposite choice, since it not only demonstrates depending on a common system library, it also demonstrates the way these things loop on each other, as curl then depends on libssl, which postgres also depends on.

>> I’m unsure if it will work, but I have wondered if building out the dependencies to install right next to the DSO, and giving the DSO an rpath of “.”, would achieve the effect we are looking for.
>
> Given the security issues with library paths, I’d guess that relative paths are verboten. But also, Postgres does not `cd` into an extension directory before loading it, AFAIK.

Relative rpaths as I have seen them are relative to the executable or library in which they are defined (as far as I know; I’m not a dylib expert by any stretch). The implication is that extension.so could have an rpath of “.” and dependent dylibs sitting next to it. This is how, for example, cmake out-of-tree builds can run tests against the newly built library before it’s installed: the test execs have an rpath of ../lib on them.

>> It’s unfortunate (DY)LD_LIBRARY_PATH is dead and dying, but there we are. The trouble I see with somehow coercing the system to load a local copy of system libraries is (a) common system libs that PostgreSQL itself might be linking (libssl, for example) that then will end up with symbol collisions between the copy loaded by postgres and the copy loaded by the DSO, and (b) same thing but for different extensions with the same dependencies.
>
> Yeah, this is why people tend to depend on system dependencies loaded from well-known paths, so libssl will always load the same DSO. I imagine the use of LD_LIBRARY_PATH can cause issues today.
>
>> I guess I cannot shake the idea that a lot of interesting extensions are going to have interesting system dependencies, that “exposing an interesting library to postgres” has a high value for an integration system like PostgreSQL.
>
> Yeah, I think the issue will be to figure out how to manage OS package-provided system dependencies in immutable environments like a Docker container. I suspect some combination of -rpath compiled into Postgres and mounting individual DSO files not included in the base image will be the way to go.

Do you see immutable images as the main deployment path going forward? I’m not container-pilled yet, so it still feels like a niche concern. Meanwhile, being able to “add an extension that my cloud provider doesn’t support yet” feels quite vital.
ATB,
P
On Tue Oct 29, 2024 at 12:03 PM CDT, David E. Wheeler wrote:

> On Oct 29, 2024, at 12:51, Christoph Berg <myon@debian.org> wrote:
>
>> I think this is where the whole idea of "provide binaries outside of
>> deb/rpm" is just going to die. You are trying to reinvent a wheel
>> that has been running well for decades, including lots of production
>> systems. I don't know anyone who would trust that new source of
>> binaries that doesn't integrate into their OS packaging system.
>
> That’s fine for Linux, but more challenging for macOS and Windows.
> It’s also an issue that the apt and yum repositories, while having
> a lot of stuff, don’t have all extensions.

Hey David,

I haven't worked on Linux packaging in a while, so take my input with a grain of salt. Could we make distro packaging easier for extension developers and take some of the load off of the packaging team? I would imagine this workflow to be implemented as:

    curl -X POST https://extensions.postgresql.org/package \
        -H 'Content-Type: application/json' \
        -d '{
            "extension": "pgvector",
            "tarball": "https://path.to.source.tarball",
            "build-system": "meson",
            "postgres-versions": [14, 15, 16]
        }'

The backend would create the packages and publish them to the various repositories. We would probably need to come up with a dependency manifest that listed both build and runtime dependencies.

This would need some massaging, and has various caveats, like requiring the use of a well-known build system like PGXS or meson. There are probably security implications that need to be worked through. The packaging team could maybe have some burden lifted off their shoulders.

Is that something people would be interested in? As someone who writes software, I largely find that reaching the distribution channels is always the hardest part.

--
Tristan Partin
Neon (https://neon.tech)
On Oct 29, 2024, at 13:40, Tristan Partin <tristan@partin.io> wrote:

> The backend would create the packages and publish them to the various repositories. We would probably need to come up with a dependency manifest that listed both build and runtime dependencies.
>
> This would need some massaging, and has various caveats, like requiring the use of a well-known build system like PGXS or meson. There are probably security implications that need to be worked through. The packaging team could maybe have some burden lifted off their shoulders.
>
> Is that something people would be interested in? As someone who writes software, I largely find that reaching the distribution channels is always the hardest part.

Yes, I’m hoping to provide the infrastructure to enable a pattern like this as part of the PGXN v2 project. Some details are in the Architecture doc[1].

However, I think this is a bit off-topic for this thread, where I’d like to try to account for issues to be addressed by the proposed extension directory structure and search path.

Best,

David

[1]: https://wiki.postgresql.org/wiki/PGXN_v2/Architecture#Packaging
On Oct 29, 2024, at 13:16, Paul Ramsey <pramsey@cleverelephant.ca> wrote:

> An apposite choice, since it not only demonstrates depending on a common system library, it also demonstrates the way these things loop on each other, as curl then depends on libssl, which postgres also depends on.

Ooh, yeah, vicious!

> Relative rpaths as I have seen them are relative to the executable or library in which they are defined (as far as I know; I’m not a dylib expert by any stretch). The implication is that extension.so could have an rpath of “.” and dependent dylibs sitting next to it. This is how, for example, cmake out-of-tree builds can run tests against the newly built library before it’s installed: the test execs have an rpath of ../lib on them.

Oh, that’s super interesting. Will be worth trying. I was hoping to avoid having to set rpath in every extension, though, but maybe that’d be the way forward.

The issue, though, is a tree of dependencies. Would you really want to include both libcurl and OpenSSL in a pgsql-http binary distribution package --- especially since Postgres itself will be using its own OpenSSL package? To Christoph’s point, I think we might want to delegate the provisioning of dependency DSOs to the system package manager.

>> Yeah, I think the issue will be to figure out how to manage OS package-provided system dependencies in immutable environments like a Docker container. I suspect some combination of -rpath compiled into Postgres and mounting individual DSO files not included in the base image will be the way to go.
>
> Do you see immutable images as the main deployment path going forward? I’m not container-pilled yet, so it still feels like a niche concern. Meanwhile, being able to “add an extension that my cloud provider doesn’t support yet” feels quite vital.

I think it will become increasingly common. At first I thought it was just Docker/Kubernetes --- and in fairness, a lot of orgs are running Postgres in such environments these days. Some Postgres-as-a-Service companies exclusively use Kubernetes (my employer, Tembo, is one).

But then Tobias Bussmann pointed out[1] that Postgres.app for macOS has the same issue: it has to be immutable in order to validate the app via Apple’s provisioning certificate. If you change the contents of the Postgres.app bundle, you can no longer run it!

It would not surprise me if there were other examples, and that the pattern becomes increasingly common.

Best,

David

[1]: https://justatheory.com/2024/03/mini-summit-two/
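To illustrate the dependency tree in question, here is how one might inspect the chain for an installed pgsql-http module with standard tools (the library paths are illustrative and vary by platform):

```
# Linux: the extension module pulls in libcurl, which in turn pulls in libssl.
ldd "$(pg_config --pkglibdir)/http.so" | grep -E 'curl|ssl'
ldd /usr/lib/x86_64-linux-gnu/libcurl.so.4 | grep ssl

# macOS equivalent:
otool -L "$(pg_config --pkglibdir)/http.dylib"
```

On ELF systems, symbols resolve process-wide, so ending up with two different copies of libssl in one backend process is exactly the collision risk described above.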
On Oct 29, 2024, at 10:55 AM, David E. Wheeler <david@justatheory.com> wrote:

>> Relative rpaths as I have seen them are relative to the executable or library in which they are defined (as far as I know; I’m not a dylib expert by any stretch). The implication is that extension.so could have an rpath of “.” and dependent dylibs sitting next to it. This is how, for example, cmake out-of-tree builds can run tests against the newly built library before it’s installed: the test execs have an rpath of ../lib on them.
>
> Oh, that’s super interesting. Will be worth trying. I was hoping to avoid having to set rpath in every extension, though, but maybe that’d be the way forward.
>
> The issue, though, is a tree of dependencies. Would you really want to include both libcurl and OpenSSL in a pgsql-http binary distribution package --- especially since Postgres itself will be using its own OpenSSL package? To Christoph’s point, I think we might want to delegate the provisioning of dependency DSOs to the system package manager.
Trouble is, the extension and the library it depends upon are quite tightly linked at build time. During build, the extension will be looking at the library version to determine what features to enable, and to know what functions it can call. Frequently a whole extra SQL function might depend on a new feature in the library, and be #ifdef’ed out if the library is too old. An extension built against the latest library, but installed against an earlier one will error out quite quickly when it finds it has symbols that cannot be resolved in the library it has dylinked to.
At that point you’re better off distributing the extension via the packaging system, where you know that all the dependency versions line up correctly.
ATB,
P
Re: Tristan Partin

> This would need some massaging, and has various caveats, like requiring the use of a
> well-known build system like PGXS or meson. There are probably security
> implications that need to be worked through. The packaging team could maybe
> have some burden lifted off their shoulders.

So far, no one has approached me ("the packaging team") about which problems I need solved with extensions (apart from the PGSHAREDIR issue).

> Is that something people would be interested in? As someone who writes
> software, I largely find that reaching the distribution channels is always the
> hardest part.

Well, talk to me and Devrim.

Christoph
On Oct 29, 2024, at 14:03, Paul Ramsey <pramsey@cleverelephant.ca> wrote:

> At that point you’re better off distributing the extension via the packaging system, where you know that all the dependency versions line up correctly.

Yeah. Perhaps it could be mitigated to some degree by requiring a minimum version of each dependency in the binary distribution. But that could get a bit tricky/fussy. This is why, I think, people recommend sticking to the system packaging system exclusively.

At any rate, all of this is somewhat tangential to the directory structure/search path functionality at the heart of this proposal. Independent of how things are compiled and packaged, the lookup structure should be reasonable. Perhaps we can keep brainstorming about the DSO dependency issues in another thread or on Slack or something.

What do you think? Cause I’m DOWN to keep working on it, but don’t want to obfuscate the main reason for THIS thread.

Best,

David
On Oct 29, 2024, at 14:09, Christoph Berg <myon@debian.org> wrote:

> So far, no one has approached me ("the packaging team") about which
> problems I need solved with extensions (apart from the PGSHAREDIR
> issue).
>
>> Is that something people would be interested in? As someone who writes
>> software, I largely find that reaching the distribution channels is always the
>> hardest part.
>
> Well, talk to me and Devrim.

So far, most of the ideas I’ve brought up with Devrim and you --- like packaging Go and Rust/PGRX extensions, or packaging pre-compiled binaries --- have been met with pretty clear “no”s. Understandably! So I personally have deferred mentioning much more until I have some more explicit ideas. :-)

Best,

David
Fellow Humans,

I’m working on an updated, more detailed and comprehensive proposal. But I keep getting caught up on this bit:

On Oct 28, 2024, at 18:19, David E. Wheeler <david@justatheory.com> wrote:

>> * Binary-only extensions might also be installed here; the difference is they have no control file. The LOAD command and shared_preload_libraries would need to know to look here, too.
>
> Or perhaps we should require a control file for these, too, but add a “type” key or some such? Maybe such a shared module could be supported by CREATE EXTENSION as well, but not include SQL files?

I’m trying to imagine how this ought to work. The challenge is that, with the layout I propose here, shared module files will no longer always be in `$dynamic_library_path`, but may also be in any `$extension/pkglib` subdirectory of each subdirectory of `extension_path`. Is that desirable?

Let’s say we want to load a module named “semver” that’s included in the semver extension. With the proposed organization up-thread, the module would be installed in:

```
$extdir_user/semver/pkglib/semver.(so|dylib|dll|etc)
```

What should be passed to preload/LOAD to load it? A few options:

Option 1
--------

* Specify the module name “semver” in the `LOAD` command or in `*_preload_libraries` (same as in 17 and earlier)
* Teach the preload/LOAD code to search for the module file in `*/pkglib/` under each extension path

Pros:

* Follows the existing module name specification in preload/LOAD

Cons:

* Potentially huge number of directories to search when lots of extensions are installed
* Depending on search order, the wrong module may be loaded if two extensions have a module file with the same name

Option 2
--------

* Specify the module name to include the extension name, perhaps something like `$extension:$module`
* Teach the preload/LOAD code to detect the extension name as part of the command and only look for the DSO in that extension's `pkglibdir`

Pros:

* Searches at most the list of directories in the `extension_path`
* No conflicts with any other module files from other extensions

Cons:

* Overloads the meaning of preload/LOAD strings, which might be confusing to some
* Upgrades might need these values to change from the old to the new syntax

Other Options?
--------------

I kind of like Option 2, as it would allow us to eventually support non-`CREATE EXTENSION` modules as extensions, too. I imagine distributing, say, `auto_explain` in an extension directory of its own, with an `auto_explain.control` file that identifies it as a LOAD-only extension. Then specifying `auto_explain:auto_explain` would work as expected. Or perhaps just `auto_explain:` could load *all* the modules included in the auto_explain "extension".

But then maybe I'm starting to talk myself into arguing that `LOAD` ought to be deprecated, `CREATE EXTENSION` could support non-SQL extensions, and the `*preload*` GUCs would contain a list of extensions rather than module files.

But I digress. Any ideas about other options to address this design challenge?

Thanks,

David
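For concreteness, a sketch of what Option 2 might look like in practice; the `$extension:$module` syntax is only a proposal in this thread, not implemented anywhere:

```
# Hypothetical Option 2 syntax: qualify the module with its extension name so
# the loader searches only that extension's pkglib directory.
echo "shared_preload_libraries = 'semver:semver'" >> "$PGDATA/postgresql.conf"

# Or, for a single session:
psql -c "LOAD 'semver:semver';"
```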
On Oct 31, 2024, at 15:41, David E. Wheeler <david@justatheory.com> wrote:

> Other Options?
> --------------
>
> I kind of like Option 2, as it would allow us to eventually support non-`CREATE EXTENSION` modules as extensions, too. I imagine distributing, say, `auto_explain` in an extension directory of its own, with an `auto_explain.control` file that identifies it as a LOAD-only extension. Then specifying `auto_explain:auto_explain` would work as expected. Or perhaps just `auto_explain:` could load *all* the modules included in the auto_explain "extension".
>
> But then maybe I'm starting to talk myself into arguing that `LOAD` ought to be deprecated, `CREATE EXTENSION` could support non-SQL extensions, and the `*preload*` GUCs would contain a list of extensions rather than module files.
>
> But I digress. Any ideas about other options to address this design challenge?

I just thought of another one:

Option 3
--------

* Add new preload GUCs for extensions: `shared_preload_extensions`, `session_preload_extensions`, etc.
* Specify just extension names for these GUCs, instead of module file names.
* Leave `LOAD`/`*preload_libraries` unchanged.
* Have it search the `extension_path` directories just as `CREATE EXTENSION` does.

Pros:

* The behaviors of `LOAD`/`*preload_libraries` are unchanged
* Provides a more future-looking interface, where we perhaps eventually deprecate `LOAD`/`*preload_libraries` in favor of shipping all modules as extensions

Cons:

* More GUCs!
* The new GUCs load *all* of the modules included in an extension, rather than specific ones. But maybe we can borrow the `$extension:module` syntax from Option 2 to support loading a single module.
* Upgrades from Postgres 17 would still need any extension modules loaded in `*preload_libraries` moved to the new GUCs, since the files will live in a new location.

I kind of like this one…

D
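And a corresponding sketch of Option 3 with the hypothetical extension-oriented preload GUCs (names as proposed above; nothing here exists yet):

```
# Hypothetical Option 3: name extensions rather than module files; the server
# would resolve their modules via extension_path, as CREATE EXTENSION does.
echo "shared_preload_extensions = 'auto_explain, semver'" >> "$PGDATA/postgresql.conf"
echo "session_preload_extensions = 'auto_explain'"        >> "$PGDATA/postgresql.conf"
```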
Hackers,

On Oct 31, 2024, at 16:12, David E. Wheeler <david@justatheory.com> wrote:

> I just thought of another one:

Last week I tried to integrate all the ideas from this thread and the previous one[1] into a single proposal that attempts to work through all the implications and issues. I’ve drafted it as a blog post[2] and plan to publish it next week, following some more feedback. Would appreciate comments, corrections, and any other general feedback:

  https://github.com/theory/justatheory/pull/7

Best,

David

[1]: https://postgr.es/m/E7C7BFFB-8857-48D4-A71F-88B359FADCFD@justatheory.com
[2]: https://github.com/theory/justatheory/pull/7
On Nov 7, 2024, at 10:30, David E. Wheeler <david@justatheory.com> wrote:

> Last week I tried to integrate all the ideas from this thread and the previous one[1] into a single proposal that attempts to work through all the implications and issues. I’ve drafted it as a blog post[2] and plan to publish it next week, following some more feedback. Would appreciate comments, corrections, and any other general feedback:
>
>   https://github.com/theory/justatheory/pull/7

Got some good feedback on this, in particular about how I might be overthinking separate destinations for core vs. package-installed vs. user-installed extensions. The RFC proposes separate directories and variables for core/vendor/site extensions, borrowing the idea from various dynamic language systems.

I came to this thinking that it was important to keep core (contrib, PL) extensions separate from non-core extensions, and, if so, that it’d be useful to have other defaults so that `make install` would go to the right one (site by default).

But maybe it’s not necessary? If there are no extensions by default, perhaps it doesn’t matter? But of course there are some by default. I just ran `make all && make install`, and `share/extension` contains files for plpgsql (and plperl, but I presume it could be separated). Meanwhile, `lib` is full of a _ton_ of files.

Would it not be beneficial to have by-default-empty directories into which extensions and modules that don’t come with core are installed? Or should they *all* be considered one thing? If the latter, perhaps truly core modules should be stored separately?

Best,

David
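For reference, the default locations under discussion can be inspected on any existing installation with pg_config; the exact contents will vary by build and configure options:

```
# Where `make install` currently puts control and SQL files...
ls "$(pg_config --sharedir)/extension"   # e.g. plpgsql.control, plpgsql--1.0.sql, ...

# ...and where the shared modules land, mixed in with everything else in lib.
ls "$(pg_config --pkglibdir)"            # e.g. plpgsql.so, dict_snowball.so, ...
```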