Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Collation version tracking for macOS
Msg-id CA+hUKGL5cYbrf3DXYNLBV78UXBiOaP-59MAzKFvC7dfT+49pTg@mail.gmail.com
In response to Re: Collation version tracking for macOS  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Tue, Nov 29, 2022 at 3:55 PM Jeff Davis <pgsql@j-davis.com> wrote:
> =# select * from pg_icu_collation_versions('en_US') order by
> icu_version;
>  icu_version | uca_version | collator_version
> -------------+-------------+------------------
>  50.2        | 6.2         | 58.0.6.50
>  51.3        | 6.2         | 58.0.6.50
>  52.2        | 6.2         | 58.0.6.50
>  53.2        | 6.3         | 137.51
>  54.2        | 7.0         | 137.56
>  55.2        | 7.0         | 153.56
>  56.2        | 8.0         | 153.64
>  57.2        | 8.0         | 153.64
>  58.3        | 9.0         | 153.72
>  59.2        | 9.0         | 153.72
>  60.3        | 10.0        | 153.80
>  61.2        | 10.0        | 153.80
>  62.2        | 11.0        | 153.88
>  63.2        | 11.0        | 153.88
>  64.2        | 12.1        | 153.97
>  65.1        | 12.1        | 153.97
>  66.1        | 13.0        | 153.14
>  67.1        | 13.0        | 153.14
>  68.2        | 13.0        | 153.14
>  69.1        | 13.0        | 153.14
>  70.1        | 14.0        | 153.112
> (21 rows)
>
> This is good information, because it tells us that major library
> versions change more often than collation versions, empirically-
> speaking.

Wow, nice discovery about 104 -> 14 (ICU 66-69 report 153.14 where the
pattern below predicts 153.104 for UCA 13.0).  Yeah, I imagine we'll
want some kind of band-aid to tolerate that exact screwup and avoid
spurious warnings.
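
A minimal sketch of such a band-aid, assuming it takes the form of an explicit equivalence for the one known-bad string.  The names KNOWN_MISREPORTED, normalize_version and versions_equivalent are hypothetical, not existing PostgreSQL code.  The corrected value 153.104 is inferred from the encoding pattern (UCA 13.0 as 13 * 8 + 0 = 104).

```python
# Hypothetical band-aid: map known-bad collator version strings to the
# spelling the encoding pattern predicts.  Only the ICU 66-69 glitch is
# listed (UCA 13.0 should encode as 13 * 8 + 0 = 104, but "14" is reported).
KNOWN_MISREPORTED = {
    "153.14": "153.104",
}


def normalize_version(version: str) -> str:
    """Return the corrected spelling of a known-bad version string."""
    return KNOWN_MISREPORTED.get(version, version)


def versions_equivalent(stored: str, current: str) -> bool:
    """True if the two strings should count as the same collation version."""
    return normalize_version(stored) == normalize_version(current)


print(versions_equivalent("153.14", "153.104"))  # same UCA 13.0 collation
print(versions_equivalent("153.14", "153.112"))  # UCA 13.0 vs 14.0
```

Keying on the exact string keeps the band-aid narrow, so any other mismatch still warns.  It would need extending if the same glitch also shows up inside three-component versions.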

Bugs aside, that's quite a revealing table in other ways.  We can see:

* The version scheme changed completely in ICU 53.  As far as I can
tell, that corresponds to a major rewrite of the collation code[1].

* The first component seems to be (UCOL_RUNTIME_VERSION << 4) + 9.
UCOL_RUNTIME_VERSION is defined in ICU's uvernum.h.  It is currently 9
and was 8 until it was bumped between ICU 54 and 55 (I see this in
their commit log).  That gives (8 << 4) + 9 = 137 and (9 << 4) + 9 =
153, the only two values we see there.  I don't know where the
trailing + 9 comes from, but it has been stable since the v2 collation
rewrite landed.

* The second component seems to be uca_version_major * 8 +
uca_version_minor, e.g. 6.3 -> 51, 12.1 -> 97, 14.0 -> 112.  That's
the Unicode Collation Algorithm version, which so far has always
matched the Unicode version, visible in the output of the other
function.

* The values you showed for English don't have a third component, but
if you try some other locales like 'zh' you'll see the CLDR major
version in the third position.  So I guess some locales depend on CLDR
data and others don't.
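
The observations above amount to a decoding rule.  The sketch below is a hypothetical helper: the names are made up, and the layout is inferred from the table rather than taken from ICU documentation.  It does three things:

* inverts (UCOL_RUNTIME_VERSION << 4) + 9
* splits the second component with divmod, which is only unambiguous while UCA minor versions stay below 8
* treats an optional third component as the CLDR major version

The three-component input is invented for illustration and is not real 'zh' output.

```python
from typing import NamedTuple, Optional


class CollatorVersion(NamedTuple):
    runtime: int               # UCOL_RUNTIME_VERSION, via (v << 4) + 9
    uca_major: int             # via uca_major * 8 + uca_minor
    uca_minor: int
    cldr_major: Optional[int]  # only present for some locales


def parse_collator_version(version: str) -> CollatorVersion:
    """Decode a post-ICU-53 collator version string (inferred layout)."""
    parts = [int(p) for p in version.split(".")]
    if len(parts) not in (2, 3):
        # Pre-53 strings such as "58.0.6.50" follow an older scheme.
        raise ValueError(f"unexpected number of components in {version!r}")
    rest = parts[0] - 9
    if rest < 0 or rest & 0xF:  # low 4 bits must be clear once 9 is removed
        raise ValueError(f"first component {parts[0]} does not fit (v << 4) + 9")
    uca_major, uca_minor = divmod(parts[1], 8)  # assumes minor < 8
    cldr_major = parts[2] if len(parts) == 3 else None
    return CollatorVersion(rest >> 4, uca_major, uca_minor, cldr_major)


print(parse_collator_version("137.51"))      # ICU 53.2, en_US
print(parse_collator_version("153.112"))     # ICU 70.1, en_US
print(parse_collator_version("153.112.40"))  # hypothetical 3-component string
```

The ICU 66-69 string 153.14 decodes as the nonsensical UCA 1.6.  The known glitch therefore has to be corrected before decoding.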

TL;DR it *looks* like the set of ingredients for the version string is:

* UCOL_RUNTIME_VERSION (rarely changes)
* UCA/Unicode major.minor version
* sometimes CLDR major version, not sure when
* the constant 9 (origin unknown)
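
Going the other way, the ingredients can be assembled into the expected string and compared with every ICU 53+ row of the quoted table.  This is a sketch under the assumptions above: the + 9 constant, the * 8 packing, and an optional trailing CLDR major version.  The function name is hypothetical.

```python
from typing import Optional


def expected_collator_version(runtime: int, uca_major: int, uca_minor: int,
                              cldr_major: Optional[int] = None) -> str:
    """Assemble a version string from the apparent ingredients."""
    parts = [(runtime << 4) + 9, uca_major * 8 + uca_minor]
    if cldr_major is not None:
        parts.append(cldr_major)
    return ".".join(str(p) for p in parts)


# ICU version -> (UCOL_RUNTIME_VERSION, UCA major, UCA minor, reported string),
# from the quoted en_US table; the runtime version went from 8 to 9
# between ICU 54 and 55.
TABLE = {
    "53.2": (8, 6, 3, "137.51"), "54.2": (8, 7, 0, "137.56"),
    "55.2": (9, 7, 0, "153.56"), "56.2": (9, 8, 0, "153.64"),
    "57.2": (9, 8, 0, "153.64"), "58.3": (9, 9, 0, "153.72"),
    "59.2": (9, 9, 0, "153.72"), "60.3": (9, 10, 0, "153.80"),
    "61.2": (9, 10, 0, "153.80"), "62.2": (9, 11, 0, "153.88"),
    "63.2": (9, 11, 0, "153.88"), "64.2": (9, 12, 1, "153.97"),
    "65.1": (9, 12, 1, "153.97"), "66.1": (9, 13, 0, "153.14"),
    "67.1": (9, 13, 0, "153.14"), "68.2": (9, 13, 0, "153.14"),
    "69.1": (9, 13, 0, "153.14"), "70.1": (9, 14, 0, "153.112"),
}

# Report every row where the reconstructed string differs from ICU's.
for icu, (runtime, major, minor, reported) in TABLE.items():
    predicted = expected_collator_version(runtime, major, minor)
    if predicted != reported:
        print(f"ICU {icu}: predicted {predicted}, reported {reported}")
```

Only the ICU 66-69 rows disagree (predicted 153.104, reported 153.14), which is the 104 -> 14 glitch.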

[1] https://icu.unicode.org/design/collation/v2


