Re: Collation version tracking for macOS - Mailing list pgsql-hackers

From Rod Taylor
Subject Re: Collation version tracking for macOS
Msg-id CAHz80e76vuRGm0D2sDO52Wyua57WLNM2ug44d1Lk4Y5-PUHmKA@mail.gmail.com
In response to Re: Collation version tracking for macOS  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Collation version tracking for macOS
List pgsql-hackers


On Mon, Jun 6, 2022 at 8:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jim Nasby <nasbyj@amazon.com> writes:
>>> I think the real problem here is that the underlying software mostly
>>> doesn't take this issue seriously.
>
>> The first step to a solution is admitting that the problem exists.
>> Ignoring broken backups, segfaults and data corruption as a "rant"
>> implies that we simply throw in the towel and tell users to suck it up
>> or switch engines. There are other ways to address this short of the
>> community doing all the work itself. One simple example would be to
>> refuse to start if the collation provider has changed since initdb
>> (which we'd need to allow users to override).
>
> You're conveniently skipping over the hard part, which is to tell
> whether the collation provider has changed behavior (which we'd better
> do with pretty darn high accuracy, if we're going to refuse to start
> on the basis of thinking it has).  Unfortunately, giving a reliable
> indication of collation behavioral changes is *exactly* the thing
> that the providers aren't taking seriously.

Is this more involved than creating a list of all valid Unicode characters (~144 thousand), sorting them, and then running crc32 over the sorted order to create the "version" for the library/collation pair? It's far from free, but few databases use more than a couple of different collations.
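The proposal above can be sketched in a few lines of Python. This is a minimal sketch, not PostgreSQL code, and it rests on stated assumptions: Python's `locale.strxfrm` stands in for the C library's collation, "valid" is taken to mean assigned code points other than surrogates and private-use characters, and `zlib.crc32` provides the checksum. `valid_code_points` and `collation_fingerprint` are hypothetical names.

```python
import locale
import sys
import unicodedata
import zlib
from typing import Callable, Iterable, List


def valid_code_points() -> List[str]:
    """Return every assigned Unicode character as a one-character string.

    Assumption: "valid" means assigned (category != Cn), excluding surrogates
    (Cs) and private-use code points (Co). U+0000 is skipped because
    locale.strxfrm rejects strings containing an embedded null character.
    """
    chars = []
    for cp in range(1, sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) not in ("Cn", "Cs", "Co"):
            chars.append(ch)
    return chars


def collation_fingerprint(chars: Iterable[str],
                          sort_key: Callable[[str], object]) -> int:
    """CRC32 of the characters' order under sort_key (hypothetical helper)."""
    # sorted() is stable, so characters the collation treats as equal
    # (e.g. ignorable ones) keep their input order (code-point order when
    # the input comes from valid_code_points()), making the result
    # deterministic for a given input list.
    ordered = sorted(chars, key=sort_key)
    crc = 0
    for ch in ordered:
        # Feed each code point as 4 big-endian bytes into a running CRC32.
        crc = zlib.crc32(ord(ch).to_bytes(4, "big"), crc)
    return crc


if __name__ == "__main__":
    try:
        # Use the collation named by the environment (LC_ALL / LC_COLLATE / LANG).
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        # Fall back to "C" if the environment names a locale that isn't installed.
        locale.setlocale(locale.LC_COLLATE, "C")
    chars = valid_code_points()
    version = collation_fingerprint(chars, locale.strxfrm)
    print(f"{locale.setlocale(locale.LC_COLLATE)}: {len(chars)} chars, "
          f"version {version:08x}")
```

Running it under two library versions with the same locale and comparing the printed value shows whether the single-character sort order changed. Because only single characters are sorted, a change that affects only multi-character strings, such as a new contraction, would leave the fingerprint unchanged, which bears directly on the accuracy concern raised above.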

--
Rod Taylor
