Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Collation version tracking for macOS
Date
Msg-id CAH2-WzkEhSsk8iy0ReQ+3ircHMPfaFtXq56bPyZWM0tQiW67HQ@mail.gmail.com
In response to Re: Collation version tracking for macOS  (Jeremy Schneider <schneider@ardentperf.com>)
List pgsql-hackers
On Tue, Jun 7, 2022 at 2:13 PM Jeremy Schneider
<schneider@ardentperf.com> wrote:
> For my part, gut feeling is that MacOS major releases will be
> similar to any other OS major release, which may contain updates to
> collation algorithms and locales. ISTM like the same thing PG is looking
> for on other OS's to trigger the warning. But it might be good to get an
> official reference on MacOS, if someone knows where to find one?  (I don't.)

I just don't think that we should be relying on a huge entity like
Apple or even glibc for this -- they don't share our priorities, and
there is no reason for this to change. The advantage of ICU versioning
is that ICU is just one library, one that can coexist with others,
including other versions of ICU.

Imagine a world in which we support multiple ICU versions (for Debian
packages, say), some of which are getting quite old. Maybe we can
lobby for the platform to continue to support that old version of the
library -- there ought to be options. Lobbying Debian to stick with an
older version of glibc is another matter entirely. That has precisely
zero chance of ever succeeding, for reasons that are quite
understandable.

Half the problem here is to detect breaking changes, but the other
half is to not break anything in the first place. Or to give the user
plenty of opportunity to transition incrementally, without needing to
reindex everything at the same time. Obviously the only way that's
possible is by supporting multiple versions of ICU at the same time,
in the same database. This requires indirection that distinguishes
between "physical" and "logical" collation versions, where the same
nominal collation can have different implementations across multiple
ICU versions.
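
To make the indirection concrete, here is a minimal sketch of one way it
could look. Everything in it is an assumption made for illustration: the
names Catalog, PhysicalCollation, create_index, resolve and reindex are
hypothetical, and nothing here reflects an existing PostgreSQL catalog or
API. The idea is only that each index remembers which ICU version it was
built with, so indexes can be moved to a newer ICU one at a time.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalCollation:
    """A logical collation name pinned to one ICU major version (hypothetical)."""
    logical_name: str  # nominal name, e.g. a BCP47 tag
    icu_major: int     # ICU version that supplies the implementation


@dataclass
class Catalog:
    """Toy catalog: tracks installed ICU versions and per-index pinning."""
    available_icu: set
    index_collations: dict = field(default_factory=dict)

    def create_index(self, index_name, logical_name):
        # New indexes are pinned to the newest ICU version installed.
        newest = max(self.available_icu)
        self.index_collations[index_name] = PhysicalCollation(logical_name, newest)

    def resolve(self, index_name):
        # Existing indexes keep using the ICU version they were built with.
        phys = self.index_collations[index_name]
        if phys.icu_major not in self.available_icu:
            raise LookupError(
                f"ICU {phys.icu_major} needed by {index_name} is not installed"
            )
        return phys

    def reindex(self, index_name):
        # Rebuilding one index moves only that index to the newest ICU.
        old = self.index_collations[index_name]
        newest = max(self.available_icu)
        self.index_collations[index_name] = PhysicalCollation(old.logical_name, newest)


catalog = Catalog(available_icu={67})
catalog.create_index("idx_a", "de-u-co-phonebk")
catalog.create_index("idx_b", "de-u-co-phonebk")
catalog.available_icu.add(71)   # a newer ICU is installed alongside the old one
catalog.reindex("idx_a")        # only idx_a is rebuilt for now
```

After this runs, idx_a and idx_b share the same logical collation name but
resolve to different physical versions, which is the incremental
transition described above.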

The rules for standards like BCP47 (the system that defines the name
of an ICU/CLDR locale) are deliberately very tolerant of what they
accept in order to ensure forwards and backwards compatibility in
environments where there isn't just one ICU/CLDR version [1] (most
environments in the world of distributed or web applications). So you
can expect the BCP47 name of a collation to more or less work on any
ICU version, perhaps with some loss of functionality (this is
unavoidable when you downgrade ICU to a version that doesn't have
whatever CLDR customization you might have relied on). It's very
intentionally a "best effort" approach, because throwing a "locale not
found" error message usually isn't helpful from the point of view of
the end user. Note that this is a broader standard than ICU or CLDR or
even Unicode.
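
As a rough illustration of what "best effort" can mean, the sketch below
falls back by truncating subtags from the end of a tag until something
known is found, in the spirit of the lookup scheme in RFC 4647. It is a
simplification and not ICU's actual resolution logic; the function name
best_effort_lookup, the set of known locales, and the choice of "und" as
the last-resort result are all assumptions made for the example.

```python
def best_effort_lookup(tag, known_locales):
    """Return the closest known locale for a BCP47-style tag.

    Simplified truncation fallback: drop the last subtag, then drop any
    dangling single-letter subtag (such as the "u" extension marker),
    and retry. Falls back to "und" instead of raising an error.
    """
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in known_locales:
            return candidate
        subtags.pop()
        # A singleton like "u" or "x" is meaningless as the final subtag.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return "und"


# Hypothetical set of locales available in some older library version.
known = {"de", "en", "en-us"}

phonebook = best_effort_lookup("de-u-co-phonebk", known)
american = best_effort_lookup("en-US", known)
unknown = best_effort_lookup("xx-YY", known)
```

With this set of known locales, the phonebook request degrades to plain
"de" (losing the customization but still working), and a completely
unknown tag yields "und" rather than a "locale not found" error.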

[1] https://www.ietf.org/rfc/rfc6067.txt
-- 
Peter Geoghegan