Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation version tracking for macOS
Msg-id CA+hUKGL36vXMfcaDq+U1ZkoSsdfFnNx7GxhGM7aYzEbKs1W0=Q@mail.gmail.com
In response to Re: Collation version tracking for macOS  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses Re: Collation version tracking for macOS  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
Hi,

Here is a rebase of this experimental patch.  I think the basic
mechanics are promising, but we haven't agreed on a UX.  I hope we can
figure this out.

Restating the choice made in this branch of the experiment:  Here I
try to be just like DB2 (if I understood its manual correctly).
In DB2, you can use names like "en_US" if you don't care about
changes, and names like "CLDR181_en_US" if you do.  It's the user's
choice to use the second kind to avoid "unexpected effects on
applications or database objects" after upgrades.  Translated to
PostgreSQL concepts, you can use a database default ICU locale like
"en-US" if you don't care and "67:en-US" if you do, and for COLLATION
objects it's the same.  The convention I tried in this patch is that
you use either "en-US-x-icu" (which points to "en-US") or
"en-US-x-icu67" (which points to "67:en-US") depending on whether you
care about this problem.
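
To make the "67:en-US" convention concrete, here is a minimal sketch of how such a locale string might be split into a major version and a plain locale name.  The function name parse_icu_locale_spec and the "0 means no prefix" return convention are made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): split "67:en-US" into the
 * major version 67 and the locale name "en-US".  A string without a
 * numeric prefix followed by ':' is returned unchanged with version 0,
 * which this sketch takes to mean "no specific ICU version requested".
 */
int
parse_icu_locale_spec(const char *spec, const char **locale)
{
    const char *p = spec;
    int         major = 0;

    assert(spec != NULL && locale != NULL);

    /* Accept at most 4 leading digits so the accumulator cannot overflow. */
    while (isdigit((unsigned char) *p) && p - spec < 4)
    {
        major = major * 10 + (*p - '0');
        p++;
    }

    /* No digits, or digits not followed by ':': treat as a plain locale. */
    if (p == spec || *p != ':')
    {
        *locale = spec;
        return 0;
    }

    *locale = p + 1;
    return major;
}
```

With this sketch, "67:en-US" yields 67 and "en-US", while plain "en-US" yields 0 and is passed through untouched.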

I recognise that this is a bit cheesy: it's entirely the user's problem
to deal with or ignore.

An alternative mentioned by Peter E was that the locale names
shouldn't carry the prefix, but somehow we should have a list of ICU
versions to search for a matching datcollversion/collversion.  How
would that look?  Perhaps a GUC, icu_library_versions = '63, 67, 71'?
There is currently a natural and smallish range of supported versions,
probably something like 54 ... U_ICU_VERSION_MAJOR_NUM, but it seems a
bit weird to try to dlopen ~25 libraries or whatever it might be...
Do you think we should try to code this up?
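
For illustration only, a GUC value like '63, 67, 71' could be turned into a list of major versions roughly as below.  The GUC itself is just a suggestion at this point, and parse_icu_versions, its signature and the 1..999 range check are assumptions of this sketch, not existing code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical parser for a value such as "63, 67, 71".  Stores up to
 * max_versions major version numbers in versions[] and returns how many
 * were found, or -1 if the string is malformed or has too many entries.
 */
int
parse_icu_versions(const char *value, int *versions, int max_versions)
{
    const char *p = value;
    int         n = 0;

    assert(value != NULL && versions != NULL);

    for (;;)
    {
        char       *end;
        long        v;

        while (isspace((unsigned char) *p))
            p++;
        if (*p == '\0')
            return n;           /* end of list */
        if (!isdigit((unsigned char) *p))
            return -1;          /* expected a number here */

        v = strtol(p, &end, 10);
        if (v <= 0 || v > 999 || n >= max_versions)
            return -1;          /* implausible version or list too long */
        versions[n++] = (int) v;

        p = end;
        while (isspace((unsigned char) *p))
            p++;
        if (*p == ',')
            p++;                /* more entries may follow */
        else if (*p != '\0')
            return -1;          /* junk between entries */
    }
}
```

The caller would then try each listed version in turn and compare the collation version it reports with the stored datcollversion/collversion; that lookup is left out here because it needs the ICU libraries themselves.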

I haven't tried it, but the main usability problem I predict with that
idea is this:  It can cope with a scenario where you created a
database with ICU 63 and started using a default of "en" and maybe
some explicit fr-x-icu or whatever, and then you upgrade to a new
postgres binary using ICU 71, and, as long as you still have ICU 63
installed, it'll just magically keep using 63, now via dlopen().  But it
doesn't provide a way for me to create a new database that uses 63 on
purpose when I know what I'm doing.  There are various reasons I might
want to do that.

Maybe the ideas could be combined?  Perhaps "en" means "create using
binary's linked ICU, open using search-by-collversion", while "67:en"
explicitly says which to use?
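
The combined rule can be written down as a tiny decision function.  This is only a sketch of the idea as stated above; the enum, the function name and the "explicit_major == 0 means no prefix" convention are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the combined naming scheme. */
typedef enum
{
    ICU_USE_LINKED,             /* use the ICU the binary is linked with */
    ICU_SEARCH_BY_COLLVERSION,  /* look for a library matching collversion */
    ICU_USE_EXPLICIT            /* use exactly the version named in the prefix */
} IcuLibraryChoice;

/*
 * explicit_major is the version prefix from the locale string ("67:en"
 * gives 67), or 0 if the string has no prefix ("en").  creating is true
 * when the database or collation is being created, false when an
 * existing one is being opened.
 */
IcuLibraryChoice
choose_icu_library(int explicit_major, bool creating)
{
    assert(explicit_major >= 0);

    if (explicit_major > 0)
        return ICU_USE_EXPLICIT;
    return creating ? ICU_USE_LINKED : ICU_SEARCH_BY_COLLVERSION;
}
```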

Changes since last version:

 * Now it just uses the default dlopen() search path, unless you set
icu_library_path.  Is that a security problem?  It's pretty
convenient, because it means you can just "apt-get install libicu63"
(or local equivalent) and that's all, now 63 is available.

 * To try the idea out, I made it automatically create "*-x-icu67"
alongside the regular "-x-icu" collation objects at initdb time.
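
As a rough sketch of the first point, the name handed to dlopen() might be built like this.  A name without a slash makes dlopen() use the default search path, while a name with a directory part is opened as given.  The function name is made up, the "libicui18n.so.NN" pattern assumes the usual Linux naming, and this is not the code in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the file name for a given ICU major
 * version.  If library_path is NULL or empty, the result is a bare name
 * such as "libicui18n.so.63", which dlopen() resolves via its default
 * search path.  Otherwise the directory is prepended.  Returns 0 on
 * success, -1 if the result does not fit in buf.
 */
int
build_icu_library_name(char *buf, size_t buflen,
                       const char *library_path, int major)
{
    int         len;

    assert(buf != NULL && buflen > 0 && major > 0);

    if (library_path != NULL && library_path[0] != '\0')
        len = snprintf(buf, buflen, "%s/libicui18n.so.%d",
                       library_path, major);
    else
        len = snprintf(buf, buflen, "libicui18n.so.%d", major);

    return (len > 0 && (size_t) len < buflen) ? 0 : -1;
}
```

The resulting string would then be passed to dlopen(), and a NULL result would be treated as "that version is not installed".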
