Re: Update Unicode data to Unicode 16.0.0 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Update Unicode data to Unicode 16.0.0 |
Date | |
Msg-id | CA+Tgmoa7m6umcjnode1YO09gEHno1D_-V-+3VWmKyjLnXV7JDQ@mail.gmail.com |
In response to | Re: Update Unicode data to Unicode 16.0.0 (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Update Unicode data to Unicode 16.0.0 |
List | pgsql-hackers |
On Tue, Mar 18, 2025 at 10:33 PM Jeff Davis <pgsql@j-davis.com> wrote:
> If we compare the following two problems:
>
> A. With glibc or ICU, every text index, including primary keys, are
> highly vulnerable to inconsistencies after an OS upgrade, even if
> there's no Postgres upgrade; vs.
>
> B. With the builtin provider, only expression indexes and a few other
> things are vulnerable, only during a major version upgrade, and mostly
> (but not entirely) when using recently-assigned Cased letters.
>
> To me, problem A seems about 100 times worse than B almost any way I
> can imagine measuring it: number of objects vulnerable, severity of the
> problem when it does happen, likelihood of a vulnerable object having
> an actual problem, etc. If you disagree, I'd like to hear more.

I see your point, but most people don't use the builtin collation provider. Granted, we could change the default and then more people would use it, but I'm not sure people would be happy with the resulting behavior: a lot of people probably want "a" to sort near "á" even if they don't have strong preferences about the exact details in every corner case.

Also, and I think rather importantly, many people are less sensitive to whether anything is actually broken than to whether anything hypothetically could be broken. When an EDB customer asks "if I do X, will anything break," it's often the case that answering "maybe" is the same as answering "yes". The DBA doesn't necessarily know or care what the application does or know or care what data is in the database. They want a hard guarantee that the behavior will not change. From that point of view, your statement that nothing will change in minor releases when the builtin provider is used is quite powerful (and a good argument against back-patching Unicode updates as Tom proposes). But people will still need to use other collation providers and they will still need to do major release upgrades and they also want those things to be guaranteed not to break.

Again, I'm not trying to oblige you to deliver that behavior and I confess to ignorance on how we could realistically get there. But I do think it's what people want: to be forced to endure collation updates infrequently, and to be able to choose the timing of the update when they absolutely must happen, and to be able to easily know exactly what they need to reindex.

And from that point of view -- and again, I'm not volunteering to implement it and I'm not telling you to do it either -- Joe's proposal of supporting multiple versions sounds fantastic. Because then, I can do a major version upgrade using pg_upgrade and keep everything pinned to the old Unicode version or, perhaps even the old ICU version if we had multi-version libicu support. I may be able to go through several major version upgrades without ever needing to survive a collation change. Eventually my hand will be forced, because PostgreSQL will remove support for the Unicode version I care about or that old version of libicu won't compile any more or will have security vulnerabilities or something, but I will have the option to deal with that collation change before or after any PostgreSQL version changes that I'm doing. I'll be able to change the collation version at a time when I'm not changing anything else and deal with JUST that fallout on its own.

--
Robert Haas
EDB: http://www.enterprisedb.com