Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation version tracking for macOS
Date
Msg-id CA+hUKGL4VZRpP3CkjYQkv4RQ6pRYkPkSNgKSxFBwciECQ0mEuQ@mail.gmail.com
In response to Re: Collation version tracking for macOS  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Collation version tracking for macOS
List pgsql-hackers
On Wed, Jun 8, 2022 at 8:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Jun 6, 2022 at 5:45 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Earlier I mentioned distinct "providers" but I take that back, that's
> > too complicated.  Reprising an old idea that comes up each time we
> > talk about this, this time with some more straw-man detail: what about
> > teaching our ICU support to understand "libicu18n.so.71:en" to mean
> > that it should dlopen() that library and use its functions?  Or some
> > cleverer, shorter notation.  Then it's the user's problem to make sure
> > the right libraries are installed, and it'll fail if they're not.  For
> > example, on Debian bookworm right now you can install libicu63,
> > libicu67, libicu71, though only the "current" -dev package, but which
> > I'm sure we can cope with.  You're at the mercy of the distro or
> > add-on package repos to keep a lot of versions around, but that seems
> > OK.
>
> Right. Postgres could link to multiple versions of ICU at the same
> time. Right now it doesn't, and right now the ICU C symbol names that
> we use are actually versioned (this isn't immediately apparent because
> the C preprocessor makes it appear that ICU symbol names are generic).

Yeah, it's possible to link against multiple versions in theory and
that might be a way to do it if we were shipping our own N copies of
ICU like DB2 does, but that's hard in practice for shared libraries on
common distros (and vendoring or static linking of such libraries was
said to be against many distros' rules, since it would be a nightmare
if everyone did that, though I don't have a citation for that).  I
suspect it's better to use dlopen() to load them, because (1) I
believe that the major distros only have -dev/-devel packages for the
"current" version, even though they let you install the packages
containing the .so files for multiple versions at the same time, so
that binaries linked against older versions keep working, and (2) I
think it'd be cool if users were free to find more ICU versions in
add-on package repos and be able to use them to get a version that the
packager of PostgreSQL didn't anticipate.
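
To make the dlopen() idea a bit more concrete, here is a rough sketch, not
a patch.  It splits a spec like the one quoted above into library name,
major version and locale, loads the library, and looks up a function by
its versioned symbol name.  Everything named here (icu_lib, parse_icu_spec,
icu_symbol_name, icu_load, icu_lookup) is hypothetical and does not exist
in PostgreSQL.  It also assumes the soname is spelled libicui18n.so.NN and
that ICU was built with its default symbol renaming, so that ucol_open is
exported as ucol_open_NN.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one dlopen()'d ICU version. */
typedef struct icu_lib
{
	void	   *handle;		/* from dlopen(), or NULL if not loaded */
	int			major;		/* ICU major version, e.g. 71 */
} icu_lib;

/*
 * Split "libicui18n.so.71:en" into library name, major version and locale.
 * The major version is read from the trailing ".NN" of the library name
 * (an assumption about naming).  Returns 0 on success, -1 if malformed.
 */
int
parse_icu_spec(const char *spec, char *lib, size_t libsz,
			   int *major, char *locale, size_t locsz)
{
	const char *colon = strchr(spec, ':');
	const char *dot;
	size_t		liblen;

	if (colon == NULL || colon == spec || colon[1] == '\0')
		return -1;
	liblen = (size_t) (colon - spec);
	if (liblen >= libsz || strlen(colon + 1) >= locsz)
		return -1;
	memcpy(lib, spec, liblen);
	lib[liblen] = '\0';
	strcpy(locale, colon + 1);

	dot = strrchr(lib, '.');
	if (dot == NULL || sscanf(dot + 1, "%d", major) != 1)
		return -1;
	return 0;
}

/* Build a versioned symbol name such as "ucol_open_71". */
int
icu_symbol_name(char *buf, size_t bufsz, const char *base, int major)
{
	int			n = snprintf(buf, bufsz, "%s_%d", base, major);

	return (n < 0 || (size_t) n >= bufsz) ? -1 : 0;
}

/* Try to load the library; it is the user's problem to have it installed. */
int
icu_load(icu_lib *lib, const char *libname, int major)
{
	lib->handle = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
	lib->major = major;
	return lib->handle != NULL ? 0 : -1;
}

/* Look up a function by its unversioned name; NULL if unavailable. */
void *
icu_lookup(const icu_lib *lib, const char *base)
{
	char		name[128];

	assert(lib != NULL);
	if (lib->handle == NULL ||
		icu_symbol_name(name, sizeof(name), base, lib->major) != 0)
		return NULL;
	return dlsym(lib->handle, name);
}
```

A caller would parse the spec, call icu_load() with the resulting library
name and major version, and then cast the result of icu_lookup(&lib,
"ucol_open") to the right function pointer type.  If the library is not
installed, icu_load() just fails, which matches the "it'll fail if they're
not" behaviour described above.  On older glibc this needs -ldl at link
time.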

> We could perhaps invent a new indirection that knows about
> multiple ICU versions, each of which is an independent collation
> provider, or maybe a related collation provider that gets used by
> default on REINDEX. ICU is designed for this kind of thing. That
> approach more or less puts packagers on the hook for managing
> collation stability. But now long term collation stability is at least
> feasible -- we at least have a coherent strategy. In the worst case
> the community .deb and .rpm repos might continue to support an older
> ICU version, or lobby for its continued support by the distro (while
> actively discouraging its use in new databases). This isn't the same
> thing as forking ICU. It's a compromise between that extreme, and
> the current situation.

Yeah, I've flip-flopped a couple of times on the question of whether
ICU63 and ICU67 should be different collation providers, or whether
individual collations should somehow specify the library they want to
use (admittedly what I showed above with a raw library name is pretty
ugly and some indirection scheme might be nice).  It would be good to
drill into the pros and cons of those two choices.  As for getting
sane defaults, I don't know if this is a good idea, but it's an idea:
perhaps schemas and search paths could be used, so that you avoid having
to include ugly version strings in the collation identifiers, and the
search path effectively controls the default when you don't want to be
explicit (= most users)?
