Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Collation version tracking for macOS
Date
Msg-id CAH2-Wz=23GAPCdVRwbDUSgP9Ec1X1eDu4Z+4hshUMfK_gSniFw@mail.gmail.com
In response to Re: Collation version tracking for macOS  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Collation version tracking for macOS
List pgsql-hackers
On Tue, Jun 7, 2022 at 3:27 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Yeah, it's possible to link against multiple versions in theory and
> that might be a way to do it if we were shipping our own N copies of
> ICU like DB2 does, but that's hard in practice for shared libraries on
> common distros (and vendoring or static linking of such libraries was
> said to be against many distros' rules, since it would be a nightmare
> if everyone did that, though I don't have a citation for that).

I'm not saying that it's going to be easy, but I can't see why it
should be impossible. I use Debian unstable for most of my work. It
supports multiple versions of LLVM/clang, not just one (though there
is a virtual package with a default version, I believe). What's the
difference, really?

Packaging standards certainly matter, but they're not immutable laws
of the universe. It seems reasonable to suppose that the people who
define these standards would be willing to hear us out -- this is
hardly a trifling matter, or something that only affects a small
minority of *their* users.

We don't need to support a huge number of versions on each OS -- just
enough to make it feasible for everybody to avoid the need to ever
reindex every index on a collatable type (maybe ICU versions that were
the default for the last several major versions of the OS are
available through special packages). We don't necessarily have to have
a hard dependency on every supported version from the point of view of
the package manager. And all of this would ultimately be the
responsibility of each individual packager; they'd need to figure out
how to make it work within the context of the platform that they're
targeting. We'd facilitate that important work, but would defer to
them on the final details. There could be a hands-off approach to the
whole thing, so it wouldn't be a total departure from what we do
today.

> Yeah, I've flip-flopped a couple of times on the question of whether
> ICU63 and ICU67 should be different collation providers, or
> individual collations should somehow specify the library they want to
> use (admittedly what I showed above with a raw library name is pretty
> ugly and some indirection scheme might be nice).  It would be good to
> drill into the pros and cons of those two choices.

I think that there are pretty good technical reasons why each ICU
version is tied to a particular version of CLDR. Implementing CLDR
correctly and efficiently is a rather difficult process, even if we
ignore figuring out what natural language rules make sense. And so
linking to multiple different ICU versions doesn't really seem like
overkill to me. Or if it is then I can easily think of far better
examples of software bloat. Defining "stable behavior for collations"
as "uses exactly the same software artifact over time" is defensive
(compared to always linking to one ICU version that does it all), but
we have plenty that we need to defend against here.
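
For illustration only, here is a minimal sketch of what "link to a
specific ICU major version at run time" could look like. It is not taken
from any patch in this thread. It assumes Linux-style sonames such as
libicui18n.so.67 (other platforms name the file differently) and relies
on ICU's default build appending the major version to exported C symbols
(for example ucol_open_67). The helper names icu_library_name,
icu_symbol_name, icu_try_load and icu_lookup are hypothetical.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a versioned library name such as
 * "libicui18n.so.67". Assumes Linux-style sonames. Returns 1 on
 * success, 0 if the version is invalid or the buffer is too small.
 */
static int
icu_library_name(char *buf, size_t len, int major)
{
    int n;

    if (major <= 0)
        return 0;
    n = snprintf(buf, len, "libicui18n.so.%d", major);
    return n > 0 && (size_t) n < len;
}

/*
 * Hypothetical helper: build a version-suffixed symbol name such as
 * "ucol_open_67". Assumes ICU was built with its default symbol
 * renaming enabled.
 */
static int
icu_symbol_name(char *buf, size_t len, const char *base, int major)
{
    int n;

    if (major <= 0)
        return 0;
    n = snprintf(buf, len, "%s_%d", base, major);
    return n > 0 && (size_t) n < len;
}

/*
 * Try to load one ICU major version. Returns NULL if that version is
 * not installed, so the caller can treat each version as optional
 * rather than as a hard dependency.
 */
static void *
icu_try_load(int major)
{
    char name[64];

    if (!icu_library_name(name, sizeof(name), major))
        return NULL;
    return dlopen(name, RTLD_NOW | RTLD_LOCAL);
}

/* Look up a version-suffixed function in an already loaded library. */
static void *
icu_lookup(void *handle, const char *base, int major)
{
    char sym[128];

    if (handle == NULL || !icu_symbol_name(sym, sizeof(sym), base, major))
        return NULL;
    return dlsym(handle, sym);
}
```

A caller would do something like icu_try_load(67) followed by
icu_lookup(handle, "ucol_open", 67), and report an error for the
affected collations if either step returns NULL. Because every version
is loaded on demand, a missing library only affects the collations
pinned to it.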

-- 
Peter Geoghegan


