Hi,
Here is a rebase of this experimental patch. I think the basic
mechanics are promising, but we haven't agreed on a UX. I hope we can
figure this out.
Restating the choice made in this branch of the experiment: Here I
try to be just like DB2 (if I understood its manual correctly).
In DB2, you can use names like "en_US" if you don't care about
changes, and names like "CLDR181_en_US" if you do. It's the user's
choice to use the second kind to avoid "unexpected effects on
applications or database objects" after upgrades. Translated to
PostgreSQL concepts, you can use a database default ICU locale like
"en-US" if you don't care and "67:en-US" if you do, and for COLLATION
objects it's the same. The convention I tried in this patch is that
you use either "en-US-x-icu" (which points to "en-US") or
"en-US-x-icu67" (which points to "67:en-US") depending on whether you
care about this problem.
I recognise that this is a bit cheesy, it's all the user's problem to
deal with or ignore.
An alternative mentioned by Peter E was that the locale names
shouldn't carry the prefix, but somehow we should have a list of ICU
versions to search for a matching datcollversion/collversion. How
would that look? Perhaps a GUC, icu_library_versions = '63, 67, 71'?
There is a currently natural and smallish range of supported versions,
probably something like 54 ... U_ICU_VERSION_MAJOR_NUM, but it seems a
bit weird to try to dlopen ~25 libraries or whatever it might be...
Do you think we should try to code this up?
I haven't tried it, but the main usability problem I predict with that
idea is this: It can cope with a scenario where you created a
database with ICU 63 and started using a default of "en" and maybe
some explicit fr-x-icu or whatever, and then you upgrade to a new
postgres binary using ICU 71, and, as long as you still have ICU 63
installed it'll just magicaly keep using 63, now via dlopen(). But it
doesn't provide a way for me to create a new database that uses 63 on
purpose when I know what I'm doing. There are various reasons I might
want to do that.
Maybe the ideas could be combined? Perhaps "en" means "create using
binary's linked ICU, open using search-by-collversion", while "67:en"
explicitly says which to use?
Changes since last version:
* Now it just uses the default dlopen() search path, unless you set
icu_library_path. Is that a security problem? It's pretty
convenient, because it means you can just "apt-get install libicu63"
(or local equivalent) and that's all, now 63 is available.
* To try the idea out, I made it automatically create "*-x-icu67"
alongside the regular "-x-icu" collation objects at initdb time.