Re: Update Unicode data to Unicode 16.0.0 - Mailing list pgsql-hackers

From: Jeff Davis
Subject: Re: Update Unicode data to Unicode 16.0.0
Msg-id: 4edd59b81a56574fbd18ffa88b12f540fb6713fc.camel@j-davis.com
In response to: Re: Update Unicode data to Unicode 16.0.0 (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
> In fact, on the analogy of timezones, I think we should not only
> adopt newly-published Unicode versions pretty quickly but push
> them into released branches as well.

That approach suggests that we consider something like my previous
STRICT_UNICODE proposal[1]. If Postgres updates Unicode quickly enough,
there's not much reason that users would need to use unassigned code
points, so it would be practical to just reject them (as an option).
That would dramatically reduce the practical problems people would
encounter when we do update Unicode.
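
To illustrate the kind of check that idea implies, here is a minimal
sketch in Python rather than Postgres code. It is not the STRICT_UNICODE
patch; the function names are hypothetical, and "unassigned" is
approximated as general category Cn according to whatever Unicode
version the running Python's unicodedata module bundles (Cn also covers
noncharacters).

```python
import unicodedata


def first_unassigned(text):
    """Return (index, code point) of the first character whose general
    category is Cn (unassigned or noncharacter), or None if there is none.

    Hypothetical helper for illustration only; the answer depends on the
    Unicode version bundled with this Python (unicodedata.unidata_version).
    """
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cn":
            return i, ord(ch)
    return None


def check_strict(text):
    """Return text unchanged, or raise ValueError if it contains a
    code point that is unassigned in the bundled Unicode version."""
    hit = first_unassigned(text)
    if hit is not None:
        i, cp = hit
        raise ValueError(
            f"unassigned code point U+{cp:04X} at index {i} "
            f"(Unicode {unicodedata.unidata_version})"
        )
    return text
```

Under these assumptions, check_strict("a\u0363") returns its input
because U+0363 is assigned, while check_strict("a\u0378") raises because
U+0378 is a reserved code point. A string that passes such a check
contains only assigned code points, so a later Unicode update cannot
newly assign any of them.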

Note that assigned code points can still change behavior in later
versions, but not in ways that would typically cause a problem for
things like indexes. For instance, U+0363 changed from non-Alphabetic
to Alphabetic in Unicode 16, which changes the results of the
expression:

  U&'\0363' ~ '[[:alpha:]]' COLLATE PG_C_UTF8

from false to true, even though U+0363 is assigned in both Unicode
15.1.0 and 16.0.0. That might plausibly matter, but such cases would be
more obscure than case folding.

Regards,
    Jeff Davis

[1] https://commitfest.postgresql.org/patch/4876/



