Re: Update Unicode data to Unicode 16.0.0 - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Update Unicode data to Unicode 16.0.0
Date	March 18 02:15:21
Msg-id	4edd59b81a56574fbd18ffa88b12f540fb6713fc.camel@j-davis.com
In response to	Re: Update Unicode data to Unicode 16.0.0 (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On Sat, 2025-03-15 at 12:15 -0400, Tom Lane wrote:
> In fact, on the analogy of timezones, I think we should not only
> adopt newly-published Unicode versions pretty quickly but push
> them into released branches as well.

That approach suggests that we consider something like my previous
STRICT_UNICODE proposal[1]. If Postgres updates Unicode quickly enough,
there's not much reason that users would need to use unassigned code
points, so it would be practical to just reject them (as an option).
That would dramatically reduce the practical problems people would
encounter when we do update Unicode.
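
As a rough illustration of what rejecting unassigned code points could look like today, here is a minimal sketch. It assumes PostgreSQL 17 or later with a UTF8 server encoding, where the built-in unicode_assigned() and unicode_version() functions are available. The domain and table names are hypothetical, and this is not the STRICT_UNICODE proposal itself, only a per-column approximation of it.

```sql
-- Which Unicode version does this server's built-in data correspond to?
SELECT unicode_version();

-- Hypothetical domain: accept only text made up entirely of code points
-- that are assigned in the server's Unicode version.
CREATE DOMAIN assigned_text AS text
    CHECK (unicode_assigned(VALUE));

-- Hypothetical table using the domain.
CREATE TABLE notes (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    body assigned_text
);

-- Accepted: every character is an assigned code point.
INSERT INTO notes (body) VALUES ('plain ASCII text');

-- Expected to be rejected: U+0378 is an unassigned code point in
-- current Unicode versions, so the CHECK constraint should fail.
INSERT INTO notes (body) VALUES (U&'\0378');
```

The check only runs when a value is written, so it reflects whichever Unicode version the server had at that time.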

Note that assigned code points can still change behavior in later
versions, but not in ways that would typically cause a problem for
things like indexes. For instance, U+0363 changed from non-Alphabetic
to Alphabetic in Unicode 16, which changes the results of the
expression:

  U&'\0363' ~ '[[:alpha:]]' COLLATE PG_C_UTF8

from false to true, even though U+0363 is assigned in both Unicode
15.1.0 and 16.0.0. That might plausibly matter, but such cases would be
more obscure than case folding.
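
One way to observe this on a given server is sketched below, assuming PostgreSQL 17 or later with a UTF8 database (needed for the PG_C_UTF8 collation and unicode_version()). The column aliases are made up for illustration, and the result of the second column depends on the Unicode version the server was built with.

```sql
-- Report the server's built-in Unicode version next to the result of
-- the character-class test, so the two can be compared across upgrades.
SELECT unicode_version()                               AS unicode_ver,
       U&'\0363' ~ '[[:alpha:]]' COLLATE PG_C_UTF8     AS matches_alpha,
       unicode_assigned(U&'\0363')                     AS is_assigned;
```

Running the same query before and after a Unicode data update would show whether an expression index or constraint built on such a test could be affected.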

Regards,
    Jeff Davis

[1] https://commitfest.postgresql.org/patch/4876/
