Re: Collation version tracking for macOS - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Collation version tracking for macOS |
Msg-id | CA+hUKGL4VZRpP3CkjYQkv4RQ6pRYkPkSNgKSxFBwciECQ0mEuQ@mail.gmail.com |
In response to | Re: Collation version tracking for macOS (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: Collation version tracking for macOS |
List | pgsql-hackers |
On Wed, Jun 8, 2022 at 8:16 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Jun 6, 2022 at 5:45 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Earlier I mentioned distinct "providers" but I take that back, that's
> > too complicated. Reprising an old idea that comes up each time we
> > talk about this, this time with some more straw-man detail: what about
> > teaching our ICU support to understand "libicui18n.so.71:en" to mean
> > that it should dlopen() that library and use its functions? Or some
> > cleverer, shorter notation. Then it's the user's problem to make sure
> > the right libraries are installed, and it'll fail if they're not. For
> > example, on Debian bookworm right now you can install libicu63,
> > libicu67, libicu71, though only the "current" -dev package, but which
> > I'm sure we can cope with. You're at the mercy of the distro or
> > add-on package repos to keep a lot of versions around, but that seems
> > OK.
>
> Right. Postgres could link to multiple versions of ICU at the same
> time. Right now it doesn't, and right now the ICU C symbol names that
> we use are actually versioned (this isn't immediately apparent because
> the C preprocessor makes it appear that ICU symbol names are generic).

Yeah, it's possible to link against multiple versions in theory, and
that might be a way to do it if we were shipping our own N copies of
ICU like DB2 does. But that's hard in practice for shared libraries on
common distros (and vendoring or statically linking such libraries was
said to be against many distros' rules, since it would be a nightmare
if everyone did that, though I don't have a citation for that).
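To make the straw-man notation concrete, here is a minimal Python sketch (an illustration only, not PostgreSQL code) of resolving a spec such as `libicui18n.so.71:en`. It assumes a hypothetical `<soname>:<locale>` syntax with a Linux-style soname, uses `ctypes.CDLL` (which calls dlopen() on POSIX systems) to load the library, and looks up `ucol_open` under ICU's default versioned symbol name, e.g. `ucol_open_71`; that suffix is the symbol versioning the C preprocessor normally hides. ICU builds configured with `U_DISABLE_RENAMING` export unsuffixed names instead. The helpers `parse_spec`, `versioned_symbol` and `load_collator_open` are hypothetical names.

```python
import ctypes
import re
from typing import NamedTuple


class IcuCollationSpec(NamedTuple):
    library: str  # shared library to dlopen(), e.g. "libicui18n.so.71"
    major: int    # ICU major version, taken from the soname suffix
    locale: str   # ICU locale name, e.g. "en"


# Hypothetical "<soname>:<locale>" notation from the straw-man proposal.
_SPEC_RE = re.compile(r"^(?P<lib>libicui18n\.so\.(?P<major>\d+)):(?P<locale>[^:]+)$")


def parse_spec(spec: str) -> IcuCollationSpec:
    """Split "libicui18n.so.71:en" into library name, major version, locale."""
    m = _SPEC_RE.match(spec)
    if m is None:
        raise ValueError(f"not a versioned ICU collation spec: {spec!r}")
    return IcuCollationSpec(m["lib"], int(m["major"]), m["locale"])


def versioned_symbol(name: str, major: int) -> str:
    """ICU's default build appends _<major> to its C symbols
    (ucol_open -> ucol_open_71); U_DISABLE_RENAMING builds do not."""
    return f"{name}_{major}"


def load_collator_open(spec: IcuCollationSpec):
    """dlopen() the requested ICU library and look up its ucol_open.

    Returns None when the library or symbol is missing, leaving the error
    report to the caller ("it'll fail if they're not" installed).
    """
    try:
        lib = ctypes.CDLL(spec.library)  # calls dlopen() on POSIX systems
    except OSError:
        return None
    try:
        return getattr(lib, versioned_symbol("ucol_open", spec.major))
    except AttributeError:  # dlsym() found no such symbol
        return None
```

On a machine with ICU 71's runtime package installed, `load_collator_open(parse_spec("libicui18n.so.71:en"))` should return the `ucol_open_71` function; without it, the call returns `None`. Only the runtime .so file is needed here, not the -dev package.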
I suspect it's better to use dlopen() to load them, because (1) I
believe the major distros only have -dev/-devel packages for the
"current" version, even though they let you install the packages
containing the .so files for multiple versions at the same time so that
binaries linked against older versions keep working, and (2) I think
it'd be cool if users were free to find more ICU versions in add-on
package repos and use them to get a version that the packager of
PostgreSQL didn't anticipate.

> We could perhaps invent a new indirection that knows about
> multiple ICU versions, each of which is an independent collation
> provider, or maybe a related collation provider that gets used by
> default on REINDEX. ICU is designed for this kind of thing. That
> approach more or less puts packagers on the hook for managing
> collation stability. But now long term collation stability is at least
> feasible -- we at least have a coherent strategy. In the worst case
> the community .deb and .rpm repos might continue to support an older
> ICU version, or lobby for its continued support by the distro (while
> actively discouraging its use in new databases). This isn't the same
> thing as forking ICU. It's a compromise between that extreme, and
> the current situation.

Yeah, I've flip-flopped a couple of times on the question of whether
ICU63 and ICU67 should be different collation providers, or whether
individual collations should somehow specify the library they want to
use (admittedly, what I showed above with a raw library name is pretty
ugly, and some indirection scheme might be nice). It would be good to
drill into the pros and cons of those two choices.

As for getting sane defaults, I don't know if this is a good idea, but
it's an idea: perhaps schemas and search paths could be used. You'd
avoid having to include ugly version strings in the collation
identifiers, and the search path would effectively control the default
when you don't want to be explicit (= most users)?
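The schema/search-path idea can be sketched as ordinary name resolution. PostgreSQL collations are already schema-qualified objects found through search_path; the Python sketch below (purely illustrative) assumes a hypothetical layout with one schema per ICU version, called `icu63` and `icu71` here, each holding collations under the same unversioned names. A qualified name pins a version explicitly, while an unqualified one takes the first schema on the search path that has it, so the search path acts as the default version. The catalog contents and `resolve_collation` are made up for illustration.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical catalog: one schema per ICU version, each holding collations
# with plain names such as "en-x-icu". Schema names are illustrative only.
CATALOG: Dict[str, Set[str]] = {
    "icu63": {"en-x-icu", "de-x-icu"},
    "icu71": {"en-x-icu", "de-x-icu", "sv-x-icu"},
}


def resolve_collation(name: str, search_path: Iterable[str],
                      catalog: Dict[str, Set[str]] = CATALOG
                      ) -> Optional[Tuple[str, str]]:
    """Return (schema, collation) for a possibly schema-qualified name.

    A qualified name ("icu63.en-x-icu") pins a version explicitly; an
    unqualified one takes the first match along the search path, so the
    search path effectively chooses the default ICU version.
    """
    if "." in name:
        schema, coll = name.split(".", 1)
        return (schema, coll) if coll in catalog.get(schema, set()) else None
    for schema in search_path:
        if name in catalog.get(schema, set()):
            return (schema, name)
    return None  # not found anywhere on the search path
```

With `icu71` ahead of `icu63` on the search path, an unqualified `en-x-icu` resolves to ICU 71, while an explicit `icu63.en-x-icu` still resolves to the older version. Users who never qualify names would get whatever version the search path puts first.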