Re: Update Unicode data to Unicode 16.0.0 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Update Unicode data to Unicode 16.0.0 |
Msg-id | CA+TgmoZ7riCiacKzQmq=82Fu7B74A9MAAKqUmuv8BEeWHZMhTA@mail.gmail.com |
In response to | Re: Update Unicode data to Unicode 16.0.0 (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Update Unicode data to Unicode 16.0.0 |
List | pgsql-hackers |
On Wed, Mar 19, 2025 at 5:47 PM Jeff Davis <pgsql@j-davis.com> wrote:
> Do you have a sketch of what the ideal Unicode version management
> experience might look like? Very high level, like "this is what happens
> by default during an upgrade" and "this is how a user discovers that
> they might want to update Unicode", etc.
>
> What ways can/should we nudge users to update more quickly, if at all,
> so that they are less likely to have problems with newly-assigned code
> points?
>
> And, if possible, how we might extend this user experience to libc or
> ICU updates?

As I think you know, I don't consider myself an expert in this area, just somebody who has seen a decent amount of user pain (although I am sure that even there some other people have seen more). That said, for me the ideal would probably include the following things:

* When the collation/ctype/whatever definitions upon which you are relying change, you can either decide to switch to the new ones without rebuilding your indexes and risk wrong results until you reindex, or you can decide to create new indexes using the new definitions and drop the old ones.

* You're never forced to adopt new definitions during a SPECIFIC major or minor release upgrade or when making some other big change to the system. It's fine, IMHO, if we eventually remove support for old stuff, but there should be a multi-year window of overlap. For example, if PostgreSQL 42 adds support for Unicode 95.0.0, we'd keep that support for, I don't know, at least the next four or five major versions. So upgrading PG can eventually force you to upgrade collation defs, but you don't get into a situation where PG 41 supports only Unicode < 95 and PG 42 supports only Unicode >= 95.

* In an absolutely perfect world, we'd have strong versioning of every type of collation from every provider. This is probably very difficult to achieve in practice, so a somewhat more realistic goal might be to get to a point where most users, most of the time, are relying on collations with strong versioning. For glibc, this seems relatively hopeless unless upstream changes their policy in a big way. For ICU, loading multiple library versions seems like a possible path forward. Relying more on built-in collations seems like another possible approach, but I think that would require us to have more than just a code-point sort: we'd need to have built-in collations for users of various languages. That sounds like it would be a lot of work to develop, but even worse, it sounds like it would be a tremendous amount of work to maintain. I expect Tom will opine that this is an absolutely terrible idea that we should never do under any circumstances, and I understand the sentiment, but I think it might be worth considering if we're confident we will have people to do the maintenance over the long term.

* I would imagine pg_upgrade either keeping the behavior unchanged for any strongly-versioned collation, or failing. I don't see a strong need to try to notify users about the availability of new versions otherwise. People who want to stay current will probably figure out how to do that, and people who don't will ignore any warnings we give them. I'm not completely opposed to some other form of notification, but I think it's OK if "we finally removed support for your extremely old ICU version" is the driving force that makes people upgrade.

-- 
Robert Haas
EDB: http://www.enterprisedb.com