Re: Update Unicode data to Unicode 16.0.0 - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Update Unicode data to Unicode 16.0.0
Msg-id 145270.1742327420@sss.pgh.pa.us
In response to Re: Update Unicode data to Unicode 16.0.0  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Update Unicode data to Unicode 16.0.0
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Mar 18, 2025 at 2:55 PM Jeff Davis <pgsql@j-davis.com> wrote:
>> Continuing on with Unicode 15.1 and accepting the unassigned code point
>> *cannot* prevent breakage.

> Under your definition, this is true, but I think Jeremy would define
> breakage differently. His primary concern, I expect, is *stability*.
> Breakage means that the same supposedly-stable results return
> different answers on the same data. Under that definition, continuing
> under Unicode 15.1 does prevent breakage.

That approach works only if you sit on Unicode 15.1 *forever*.
The impracticality of that seems obvious to me.  Sooner or later
you will need to update, and then you are going to suffer pain.
(In the running example of this thread, a unique index on LOWER(t)
might not only be corrupt, but might fail reindex due to the
constraint being violated under the newer rules.)  The longer you
wait, the more probable it is that you are going to have problems,
and the more painful it'll be to clean things up.
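
To make the parenthetical concrete, here is a rough Python sketch of how a unique index on LOWER(t) can be valid under one set of case-mapping rules and unbuildable under a later one. Everything in it is an assumption for illustration: the two mapping tables are hand-written stand-ins rather than real Unicode data, the code points are arbitrary placeholders, and build_unique_index() is a hypothetical helper, not anything PostgreSQL provides.

```python
# Hand-written stand-ins for two versions of a case-mapping table.
# These are NOT real Unicode data; the code points are arbitrary placeholders.
UPPER_CP = "\U00010D50"
LOWER_CP = "\U00010D70"

OLD_RULES = {}                     # code point unassigned: no case mapping
NEW_RULES = {UPPER_CP: LOWER_CP}   # a later version assigns a lowercase mapping


def lower(text, rules):
    """Lowercase text using only the given mapping table."""
    return "".join(rules.get(ch, ch) for ch in text)


def build_unique_index(rows, rules):
    """Build a dict keyed on lower(row); raise if two rows share a key."""
    index = {}
    for row in rows:
        key = lower(row, rules)
        if key in index:
            raise ValueError(
                f"duplicate key for rows {index[key]!r} and {row!r}"
            )
        index[key] = row
    return index


# Two rows that were accepted while the code point was still unassigned.
rows = ["a" + UPPER_CP, "a" + LOWER_CP]

# Under the old rules the two keys differ, so the index builds cleanly.
old_index = build_unique_index(rows, OLD_RULES)

# Under the new rules both rows lowercase to the same key, so the
# rebuild ("reindex") fails on the uniqueness check.
try:
    build_unique_index(rows, NEW_RULES)
    reindex_failed = False
except ValueError:
    reindex_failed = True
```

The data itself never changed between the two builds; only the rules did, which is why staying on the old rules merely postpones the failure.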

Now, if you both sit on Unicode 15.1 forever and disallow the
introduction of unassigned-per-15.1 code points, you can escape
that fate, but that approach brings its own kind of pain.

The short answer is that "immutable" = "doesn't change till the heat
death of the universe" is a definition that is not useful when
dealing with this type of data.  Other people determine the reality
that you have to deal with.

            regards, tom lane


