Re: RFC: Extension Packaging & Lookup - Mailing list pgsql-hackers

From David E. Wheeler
Subject Re: RFC: Extension Packaging & Lookup
Date
Msg-id 0D3FE0EA-0E87-42BD-AB15-D3549BFD0F4C@justatheory.com
Whole thread Raw
In response to RFC: Extension Packaging & Lookup  ("David E. Wheeler" <david@justatheory.com>)
Responses Re: RFC: Extension Packaging & Lookup
List pgsql-hackers
Greetings Postgres humans,

There was much discussion of this proposal at PGConf.eu <http://pgconf.eu/> last week, between Gabriele Bartolini,
PeterEisentraut, Christoph Berg, and Andres Freund (all Cc’d here), and me, among others. We agreed, in principle, to
anapproach to this feature. Overall I think the proposal doesn’t need to change, but there are a couple of things to
tweak,and I’ve added a list of use cases I’m aware of below, plus a tangent on the challenges of loading system DOS. 

Quoting a lot and responding inline.

On Oct 10, 2024, at 4:34 PM, David E. Wheeler <david@justatheory.com> wrote:

> I guess I should get off my butt and do it. So let’s do this. Here’s what I propose.
>
> *   When an extension is installed, all of its files should live in a single directory. These include:
>
>    *   The Control file in directory describes extension
>    *   Subdirectories for SQL, shared libraries, docs, binaries
>        (also locales and tsearch dictionaries?)

Just to be clear, these directories correspond to the `pg_config `--*dir` options, excluding include directories:

```
❯ pg_config --help | grep 'dir\b' | grep -v include
  --bindir              show location of user executables
  --docdir              show location of documentation files
  --htmldir             show location of HTML documentation files
  --libdir              show location of object code libraries
  --pkglibdir           show location of dynamically loadable modules
  --localedir           show location of locale support files
  --mandir              show location of manual pages
  --sharedir            show location of architecture-independent support files
  --sysconfdir          show location of system-wide configuration files

```

But perhaps also excluding --sysconfdir?

> *   Next, there should be an extension lookup path. The first item in the path is the compile-time default, and
ideallywould include only core extensions. Subsequent paths would be set by a GUC, similar to dynamic_library_path, but
onlyfor extensions (including their shared libraries). 

Let’s call it extension_path.

I also suggest adding two new pg_config options, for the directory containing core extensions, and a second for system
oruser extensions. Something like: 

  --extension-dir      show location of core extensions
  --extension-dir-user show location of user extensions

The default value for the `extension_path` GUC would be, assuming some new template variables:

    extension_path = '$userextdir,$extdir'

This will allow installers (PGXS) to know where to install non-core extensions without bothering the user about it.

> *   Modify PGXS (or create a new installer CLI used by PGXS?) to install an extension according to this pattern.
Allowthe specification of a prefix. This should differ from the current `PREFIX`, in that the values of `sharedir`,
`pkglibdir`,etc. would not be fully-duplicated under the prefix, but point to a directory used in the extension path.
Forexample, when installing an extension need “pair", something like 
>
>        make install BASE_DIR=/opt/pg/extension
>
>    Would create `/opt/pg/extension/pair`, rather than `/opt/pg/extension/$(pg_config --sharedir)/extension/pair`.
>
> *   Perhaps there could also be an option to symlink binary files or man pages to keep paths simple.
>
> *   For CREATE EXTENSION, Postgres would look for an extension on the file system by directory name in each of the
extensionpaths instead of control file name. It would then find the control file in that directory and the necessary
SQLand shared library files in the `sql` and `lib` subdirectories of that directory. 

In discussion, I think we clarified that it should look for $extension/$extension.control.

> *   Binary-only extensions might also be installed here; the difference is they have no control file. The LOAD
commandand shared_preload_libraries would need to know to look here, too. 

Or perhaps we should require a control file for these, too, but add a “type” key or some such? Maybe such a shared
modulecould be supported by CREATE EXTENSION, as well as, but not include SQL files? 

> The basic idea, then, is three-fold:
>
> 1.  This pattern is more like a packaging pattern than CREATE EXTENSION-specific, since it includes other types of
extensions
>
> 2.  All the files for a given extension live within a single directory, making it easier to reason about what’s
installedand what’s not. 
>
> 3.  These extension packages can live in multiple paths.

For dupes, the first one found in the list of extension_path directories is the one that Postgres will load.

We also discussed including the version in the directory name, so that multiple versions could be installed at once.
Notsure how Postgres would pick the right one, though. 

> Some examples. Core extensions, like citext, would live in, say, $(pg_config --extensiondir)/citext), and have a
structuresuch as: 
>
> ```
> citext
> ├── citext.control
> ├── lib
> │   ├── citext.dylib
> │   └── bitcode
> │   ├── citext
> │   │   └── citext.bc
> │   └── citext.index.bc
> └── sql
>    ├── citext--1.0--1.1.sql
>    ├── citext--1.1--1.2.sql
>    ├── citext--1.2--1.3.sql
>    ├── citext--1.3--1.4.sql
>    ├── citext--1.4--1.5.sql
>    ├── citext--1.4.sql
>    └── citext--1.5--1.6.sql
> ```
>
> Third-party extensions would live in one or more other directories on the file system, unknown at compile time, but
setin the extension path GUC and accessible to/owned by the Postgres system user. Let’s say we set `/opt/pgxn` as one
ofthe paths. Within that directory, we might have a directory for a pure SQL extension in a a directory named “pair”
thatlooks like this: 
>
> ```
> pair
> ├── LICENSE.md
> ├── README.md
> ├── pair.control
> ├── doc
> │   ├── html
> │   │   └── pair.html
> │   └── pair.md
> └── sql
> ├── pair--1.0--1.1.sql
> └── pair--1.1.sql
> ```
>
> A binary application like pg_top would live in the pg_top directory, structured something like:
>
> ```
> pg_top
> ├── HISTORY.rst
> ├── INSTALL.rst
> ├── LICENSE
> ├── README.rst
> ├── bin
> |   └── pg_top
> └── doc
>    └── man
>        └── man3
>            └── pg_top.3
> ```
>
> And a C extension like semver would live in the semver directory and be structured something like:
>
> ```
> semver
> ├── LICENSE
> ├── README.md
> ├── semver.control
> ├── doc
> │   └── semver.md
> ├── lib
> │   ├── semver.dylib
> │   ├── bitcode
> │   └── semver
> │   │   └── semver.bc
> │   └── semver.index.bc
> └── sql
>    ├── semver--1.0--1.1.sql
>    └── semver--1.1.sql
> ```

Another example: a binary-only extension loaded via LOAD (or *_preload_libraries), and not `CREATE EXTENSION`, like
auto_explain:

```
auto_explain
├── auto_explain.control
└── lib
    ├── auto_explain.dylib
    ├── bitcode
    └── auto_explain
    │   └── auto_explain.bc
    └── auto_explain.index.bc
```

I’ve included the control file, as suggested above, as a way to manage *all* extensions, not just `CREATE EXTENSION`
extensions.A non-core extension would be the same, but might include other files like a README, LICENSE, etc. 

One wrinkle: Some extensions, such as pg_hint_plan[2], include both a `CREATE EXTENSION` extension and a `LOAD` module,
andboth have the same name. Not sure how best to adapt for such a case to the proposal to include a control file for
bothtypes of extension --- because both would have the same control file name. My proposal is that names would be
uniquebetween both `CREATE EXTENSION` and `LOAD` extensions, in which case one or the other extension would need to be
renamed(probably the `LOAD` extension). Then perhaps the `CREATE EXTENSION` extension could declare the `LOAD`
extensionas a dependency. 

## Use Cases

Here’s how the proposed file layout and extension_path feature would work for the use cases that have driven it.

### Apt/Yum testing

Rather than patching Postgres to look up pg_config directories under a prefix[3], a packager who wants to run tests and
thereforeneeds to install an extension where Postgres can find it without writing to the installed server would follow
thesesteps: 

*   Set the extension_path GUC to search the package DESTDIR.
*   Install the extension into that directory: `make install BASE_DIR=$DESTDIR`
*   Run `make installcheck`

This should allow Postgres to find and load the extension during the tests. The Postgres installation will not have
beenmodified, only the extension_path will have been changed. 

### Postgres.app

The contents of the macOS Postgres.app bundle must be immutable in order to validate against the signature generated by
anApple-provided certificate. In order to allow extensions to be installed without changing the app bundle, the app
wouldeither: 

1.  Ship with an extension_path pointing to a directory outside the bundle, and then users have to know what this
directoryis when `make install`ing an extension; or 

2.  Ship with the --extension-dir-user pg_config value described above pointing to a directory outside the bundle, into
whichall non-core extensions would be `make install`ed into. 

### Docker/Kubernetes

Like Postgres.app, Docker images are immutable, but unlike Postgres.app, they represent the entire system. The solution
isidentical to that for Postgres.app, except that instead of installing extensions into --extension-dir-user, they
wouldbe mounted as volumes in that directory. 

So, instead of `make install`ing there, a Kubernetes pod can be configured to mount a volume for each extension. Need
toadd a new extension to a Pod? Just mount a volume for it that contains the necessary files, as described in the
examplesabove. 

One wrinkle: Some Kubernetes providers limit the number of volumes that can be mounted[4]. If someone needed more
volumesthan that, one would need to adopt a different pattern. Perhaps there could be one volume for the
extension-dir-user,and an external service could add any and all necessary extensions to it? 

## Challenge: Third Party Dependencies

Some extensions require third-party dependencies, usually provided by the OS. For example, pgsql-http[5] compiles into
aDSO that dynamically requires another DSO, libcurl. Finding and loading of such dependencies is handled by the OS, not
byPostgres. This configuration is an annoyance on most systems, where the user has to figure out what dependencies to
installfrom their OS package manager in order to get it to work. 

However, it presents a greater challenge for immutable conditions, as in Docker and Kubernetes containers. How can
systemdependencies be added without breaking immutability? There are a few options: 

1.  Just include all likely dependencies in the base image. This works today, but it would be preferable not to include
additionaldependencies that may never be used. It also can unnecessarily bloat an image. Also just kinda gross. 

2.  Mount a volume with all of the default base image DSOs, then have an external process add DSOs to it when required
fora new extension. This also might work today, although it requires coding that external process (e.g., a Kubernetes
operator).

3.  Mount a second volume for non-base image DSOs, and again have an external process put them there when needed, but
thenuse some method to tell the OS where to find them, since they won’t be in the default location. More on that below. 

4.  Mount individual DSO files as volumes[6] as needed in the system shared lib directory where the OS can find them.
Thewrinkle here is the mount limitation imposed by some providers, detailed above. 

For solutions that require installing a DSO outside the default directories that the system is aware of, one needs a
wayto tell the system where to find them. There are two basic methods for doing so: 

1.  Set LD_LIBRARY_PATH to the directory in which third-party DSOs are installed. Today, however, this is considered an
insecurepattern[7]. It doesn’t work at all on macOS, for example, unless you disable SIP[8]. Few will do so, nor should
they.Andres Freund reports that it’s on its way out on Linux, too. So perhaps some can do this, but it sounds as if
LD_LIBRARY_PATH’sdays are numbered. 

2.  Use `-rpath` when compiling the DSOs to point to the proper place. This embeds the path in the DSO, so it always
looksin the same place. However, this requires that the DSO be recompiled for every variant of the `-rpath`, which
createschallenges for non-path specific binary packaging --- or if an OS vendor changes the director. But perhaps
Postgresitself could be compiled with an `-rpath` that will be used when loading extensions, so the extension DSOs
themselvesdon’t have to know the path? Then the base image just needs to be compiled with that option. 

## Comments and Corrections

I think I captured most of the issues we discussed at PGConf.eu <http://pgconf.eu/> last week; please correct any
misunderstandings,inaccuracies, and oversights you spot! And are there any other issues you can think of with the
overallapproach? 

Thanks for reading to the end!

Best,

David

[1]: https://www.postgresql.org/message-id/D30A91FA-A6D4-4737-941F-0BBB2984B730%40justatheory.com
[2]: https://github.com/ossc-db/pg_hint_plan/
[3]: https://commitfest.postgresql.org/50/4913/
[4]: https://superuser.com/a/1603150/285886
[5]: https://github.com/pramsey/pgsql-http
[6]: https://stackoverflow.com/a/42260979/79202
[7]: http://xahlee.info/UnixResource_dir/_/ldpath.html
[8]: https://developer.apple.com/documentation/security/disabling-and-enabling-system-integrity-protection




pgsql-hackers by date:

Previous
From: Diego Fronza
Date:
Subject: Re: Conflict Detection and Resolution
Next
From: Peter Smith
Date:
Subject: Re: define pg_structiszero(addr, s, r)