Re: Create collation reporting the ICU locale display name - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Create collation reporting the ICU locale display name
Msg-id CAH2-Wzmo3jt6h0BEBYxDfxMJ+pcg7eCJxR3PNpg0XMsBap+iaQ@mail.gmail.com
In response to Re: Create collation reporting the ICU locale display name  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Sep 14, 2019 at 8:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The advantage of describe_collation(oid) is that we would not be
> building knowledge into the callers about which columns of pg_collation
> matter for this purpose.  I'm not even convinced that the two you posit
> here are sufficient --- the encoding seems relevant, for instance.

+1. It seems like a good idea to treat the ICU display name as
just that -- a display name. It should be considered something dynamic,
not a fixed property of the collation. For one thing, it is subject to
localization (the same locale is described differently depending on the
language it is displayed in), so it isn't fixed even when nothing changes
internally. But there is also the question of external changes.
Internationalization is inherently a squishy business.

I believe that the main goal of BCP 47 (the standard behind the locale
strings that ICU collations accept in CREATE COLLATION) is to fail
gracefully when cultural or political developments occur that change
the expectations of users. BCP 47 is actually an IETF standard -- it's
not from the Unicode Consortium, or from ICU. It is supposed to be highly
forgiving -- this is a feature, not a bug. Of course, many facets of a
locale control things that we don't care about, or at least don't use
ICU for. For example, a locale also determines the default currency
symbol.
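Part of that forgiveness is structural. RFC 5646 distinguishes a "well-formed" tag, in which every subtag has the right shape, from a "valid" one, in which every subtag is also registered. Below is a minimal Python sketch of a well-formedness check. It assumes a simplified grammar with no extlang, private-use, or grandfathered tags. `parse_tag` is a hypothetical helper written for illustration. It is not ICU's or PostgreSQL's parser.

```python
import re

# Subtag shapes from RFC 5646 (BCP 47), simplified: extlang, private-use
# ("x-...") and grandfathered tags are not handled in this sketch.
LANGUAGE = re.compile(r"[a-z]{2,3}|[a-z]{5,8}")
SCRIPT = re.compile(r"[a-z]{4}")
REGION = re.compile(r"[a-z]{2}|[0-9]{3}")
VARIANT = re.compile(r"[a-z0-9]{5,8}|[0-9][a-z0-9]{3}")
SINGLETON = re.compile(r"[a-wyz0-9]")   # any single alphanumeric except "x"
EXT_BODY = re.compile(r"[a-z0-9]{2,8}")


def parse_tag(tag):
    """Check that a tag is well-formed and split it into its parts.

    Only the shape of each subtag is checked, never whether the code is
    registered, so a tag with a deprecated region such as "CS" still parses.
    """
    subtags = tag.lower().split("-")    # subtags are case-insensitive
    parts = {"language": None, "script": None, "region": None,
             "variants": [], "extensions": {}}
    i = 0

    def take(pattern):
        # Consume the next subtag if it fully matches the pattern.
        nonlocal i
        if i < len(subtags) and pattern.fullmatch(subtags[i]):
            i += 1
            return subtags[i - 1]
        return None

    parts["language"] = take(LANGUAGE)
    if parts["language"] is None:
        raise ValueError(f"{tag!r}: bad or missing language subtag")
    script = take(SCRIPT)
    parts["script"] = script.title() if script else None    # e.g. "Latn"
    region = take(REGION)
    parts["region"] = region.upper() if region else None     # e.g. "CS"
    while (variant := take(VARIANT)) is not None:
        parts["variants"].append(variant)
    while (singleton := take(SINGLETON)) is not None:
        if singleton in parts["extensions"]:
            raise ValueError(f"{tag!r}: duplicate extension -{singleton}-")
        body = []
        while (sub := take(EXT_BODY)) is not None:
            body.append(sub)
        if not body:
            raise ValueError(f"{tag!r}: empty extension -{singleton}-")
        parts["extensions"][singleton] = body
    if i != len(subtags):
        raise ValueError(f"{tag!r}: unexpected subtag {subtags[i]!r}")
    return parts
```

For example, `parse_tag("sr-Latn-CS")` yields language "sr", script "Latn" and region "CS", even though CS is no longer an assigned ISO 3166-1 code. `parse_tag("de-u-co-phonebk")` puts the Unicode collation extension in `extensions["u"]`. Whether a recognized tag maps to particular collation rules is a separate question that only the locale data can answer.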

There are pg_upgrade scenarios in which the display string for a
collation will legitimately change due to external changes. For
example, someone who lived in Serbia and Montenegro (a country which
ceased to exist in 2006) could have used a locale string with the
region code "CS" (Serbia and Montenegro's ISO 3166-1 code, as in
"sr-CS"), which has since been deprecated [1]. If memory serves,
there is a 5-year grace period codified by some ISO standard or other,
so recent ICU versions know nothing about Serbia and Montenegro
specifically. But they'll still recognize the Serbian language code,
as well as language codes for minority languages spoken in Serbia and
Montenegro. So, for the most part, the impact of sticking with this
old, somewhat inaccurate locale definition string is minimal.
(Actually, maybe downgrade scenarios are more interesting in
practice.)
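The following minimal Python sketch shows why the display name is better computed on demand than stored. The name tables and `display_name` are hypothetical stand-ins for ICU/CLDR display-name data. Falling back to the raw code for an unknown region is a choice made for this sketch and may not match ICU's actual behavior. The stored locale stays "sr" plus "CS" throughout. Only the display language and the version of the name data vary.

```python
# Hypothetical name tables standing in for ICU/CLDR display-name data,
# keyed first by the language the name is displayed in, then by code.
LANGUAGE_NAMES = {
    "en": {"sr": "Serbian"},
    "de": {"sr": "Serbisch"},
}
# Two hypothetical versions of the region data: before and after the
# CS (Serbia and Montenegro) code was withdrawn.
REGION_NAMES_OLD = {"en": {"CS": "Serbia and Montenegro"},
                    "de": {"CS": "Serbien und Montenegro"}}
REGION_NAMES_NEW = {"en": {"RS": "Serbia", "ME": "Montenegro"},
                    "de": {"RS": "Serbien", "ME": "Montenegro"}}


def display_name(language, region, display_lang, region_names):
    """Build a display name, falling back to raw codes for unknown entries."""
    lang_name = LANGUAGE_NAMES.get(display_lang, {}).get(language, language)
    if region is None:
        return lang_name
    region_name = region_names.get(display_lang, {}).get(region, region)
    return f"{lang_name} ({region_name})"


# The stored locale ("sr" + "CS") never changes; only its rendering does.
for data in (REGION_NAMES_OLD, REGION_NAMES_NEW):
    for display_lang in ("en", "de"):
        print(display_lang, display_name("sr", "CS", display_lang, data))
# en Serbian (Serbia and Montenegro)
# de Serbisch (Serbien und Montenegro)
# en Serbian (CS)
# de Serbisch (CS)
```

Four different strings result from one unchanged locale definition. A stored display name would therefore go stale after an ICU upgrade, and it would be wrong for any user with a different display language.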

[1] https://en.wikipedia.org/wiki/ISO_3166-2:CS#Codes_deleted_in_Newsletter_I-8
--
Peter Geoghegan


