Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation version tracking for macOS
Msg-id CA+hUKG+PNqUn5oG6hFgPcy7AyxuSbpNKo-u=Bobe=dn7k8sVZw@mail.gmail.com
In response to Re: Collation version tracking for macOS  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Collation version tracking for macOS
List pgsql-hackers
On Wed, Jun 8, 2022 at 10:59 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Jun 7, 2022 at 3:27 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Yeah, it's possible to link against multiple versions in theory and
> > that might be a way to do it if we were shipping our own N copies of
> > ICU like DB2 does, but that's hard in practice for shared libraries on
> > common distros (and vendoring or static linking of such libraries was
> > said to be against many distros' rules, since it would be a nightmare
> > if everyone did that, though I don't have a citation for that).
>
> I'm not saying that it's going to be easy, but I can't see why it
> should be impossible. I use Debian unstable for most of my work. It
> supports multiple versions of LLVM/clang, not just one (though there
> is a virtual package with a default version, I believe). What's the
> difference, really?

The difference is that Debian has libllvm-{11,12,13,14}-dev packages,
but it does *not* have multiple -dev packages for libicu, just a
single libicu-dev which can be used to compile and link against their
chosen current library version.  They do have multiple packages for
the actual .so and allow them to be installed concurrently.
Therefore, you could install N .sos and dlopen() them, but you *can't*
write a program that compiles and links against N versions at the same
time using their packages (despite IBM's work to make that possible,
perhaps for use in their own databases).
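
To make the dlopen() route a bit more concrete, here is a minimal sketch,
not code from any patch.  It assumes Linux-style library names such as
libicui18n.so.67, and it assumes the packaged ICU keeps ICU's default
symbol renaming, where exported functions carry the major version as a
suffix (ucol_open_67).  The helper names icu_library_name(),
icu_symbol_name() and load_ucol_open() are made up for illustration, and
the ICU types are declared locally as stand-ins precisely because only one
set of ICU headers can be installed.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Local stand-ins for ICU types, so that no ICU headers are needed.
 * Assumption: UErrorCode is an enum in ICU and is treated as int here.
 */
typedef struct UCollator UCollator;
typedef int UErrorCode;

typedef UCollator *(*ucol_open_fn) (const char *locale, UErrorCode *status);

/* Build a name like "libicui18n.so.67"; returns 1 on success, 0 on failure. */
int
icu_library_name(char *buf, size_t size, const char *part, int major)
{
	int			n;

	/* dlopen() treats names containing '/' as paths; keep the normal search. */
	if (strchr(part, '/') != NULL)
		return 0;
	n = snprintf(buf, size, "libicu%s.so.%d", part, major);
	return n > 0 && (size_t) n < size;
}

/* Build a versioned symbol like "ucol_open_67"; returns 1 on success. */
int
icu_symbol_name(char *buf, size_t size, const char *func, int major)
{
	int			n = snprintf(buf, size, "%s_%d", func, major);

	return n > 0 && (size_t) n < size;
}

/*
 * Try to find ucol_open in the given ICU major version.  Returns NULL and
 * leaves *handle_out as NULL if the library or the symbol is not available.
 */
ucol_open_fn
load_ucol_open(int major, void **handle_out)
{
	char		libname[64];
	char		symname[64];
	void	   *handle;
	void	   *sym;

	assert(handle_out != NULL);
	*handle_out = NULL;

	if (!icu_library_name(libname, sizeof(libname), "i18n", major) ||
		!icu_symbol_name(symname, sizeof(symname), "ucol_open", major))
		return NULL;

	/* RTLD_LOCAL keeps each version's symbols out of the global namespace. */
	handle = dlopen(libname, RTLD_NOW | RTLD_LOCAL);
	if (handle == NULL)
		return NULL;

	sym = dlsym(handle, symname);
	if (sym == NULL)
	{
		dlclose(handle);
		return NULL;
	}

	*handle_out = handle;
	return (ucol_open_fn) sym;
}
```

A caller would try load_ucol_open(67, &handle) and, if that returns
non-NULL, call the function pointer with a locale name and a status
variable initialised to 0 (U_ZERO_ERROR).  Every other ICU function
needed would have to be looked up the same way.  On older glibc this
needs -ldl at link time.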

> Packaging standards certainly matter, but they're not immutable laws
> of the universe. It seems reasonable to suppose that the people that
> define these standards would be willing to hear us out -- this is
> hardly a trifling matter, or something that only affects a small
> minority of *their* users.

OK, yeah, I'm thinking within the confines of things we can do easily
right now, only by changing our code, on existing systems as they
currently package software, not "tell Debian to change their packaging
so we can compile and link against N versions".  Even supposing Debian
maintainers (and all the others) agreed, there'd still be something else
in favour of dlopen():  wouldn't it be nice if the users were not
limited by the versions that the packager of PostgreSQL decided to
link against?  What if someone has a good reason to want to use ICU
versions that are older than Debian currently ships, that are easily
available in add-on repos?

> > Yeah, I've flip-flopped a couple of times on the question of whether
> > ICU63 and ICU67 should be different collation providers, or
> > individual collations should somehow specify the library they want to
> > use (admittedly what I showed above with a raw library name is pretty
> > ugly and some indirection scheme might be nice).  It would be good to
> > drill into the pros and cons of those two choices.
>
> I think that there are pretty good technical reasons why each ICU
> version is tied to a particular version of CLDR. Implementing CLDR
> correctly and efficiently is a rather difficult process, even if we
> ignore figuring out what natural language rules make sense. And so
> linking to multiple different ICU versions doesn't really seem like
> overkill to me. Or if it is then I can easily think of far better
> examples of software bloat. Defining "stable behavior for collations"
> as "uses exactly the same software artifact over time" is defensive
> (compared to always linking to one ICU version that does it all), but
> we have plenty that we need to defend against here.

I think we're not understanding each other here: I was talking about
the technical choice of whether we'd model the multiple library
versions in our catalogues as different "collprovider" values, or
somehow encode them into the "collcollate" string, or something else.
I'm with you, I'm already sold on the multi-library concept (and have
been in several previous cycles of this recurring discussion), which
is why I'm trying to move to discussing nuts and bolts and packaging
and linking realities that apparently stopped any prototype from
appearing last time around.


