Re: Built-in CTYPE provider - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Built-in CTYPE provider |
Date | |
Msg-id | 20240704212641.c4.nmisch@google.com |
In response to | Re: Built-in CTYPE provider (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Built-in CTYPE provider |
List | pgsql-hackers |
On Wed, Jul 03, 2024 at 02:19:07PM -0700, Jeff Davis wrote:
> * Unless I made a mistake, the last three releases of Unicode (14.0,
> 15.0, and 15.1) all have the exact same behavior for UPPER() and
> LOWER() -- even for unassigned code points. It would be silly to
> promise to stay with 15.1 and then realize that moving to 16.0 doesn't
> create any actual problem.

I think you're saying that if some Unicode update changes the results of a STABLE function but does not change the result of any IMMUTABLE function, we may as well import that update. Is that about right? If so, I agree.

In addition to the options I listed earlier (error in pg_upgrade or document that IMMUTABLE stands) I would be okay with a third option. Decide here that we'll not adopt a Unicode update in a way that changes a v17 IMMUTABLE function result of the new provider. We don't need to write that in the documentation, since it's implicit in IMMUTABLE. Delete the "stable within a <productname>Postgres</productname> major version" documentation text.

> * While someone can pin libc+ICU to particular versions, it's
> impossible when using the official packages, and additionally requires
> using something like [1], which just became available last year. I
> don't think it's reasonable to put it forth as a matter-of-fact
> solution.
>
> * Let's keep some perspective: we've lived for a long time with ALL
> text indexes at serious risk of breakage. In contrast, the concerns you
> are raising now are about certain kinds of expression indexes over data
> containing certain unassigned code points. I am not dismissing that
> concern, but the builtin provider moves us in the right direction and
> let's not lose sight of that.

I see you're trying to help users get less breakage, and that's a good goal. I agree $SUBJECT eliminates libc+ICU breakage, and libc+ICU breakage has hurt plenty. However, you proposed to update Unicode data and give REINDEX as the solution to breakage this causes. Unlike libc+ICU breakage, the packager has no escape from that. That's a different kind of breakage proposition, and no new PostgreSQL feature should do that. It's on a different axis from helping users avoid libc+ICU breakage, and a feature doesn't get to credit helping on one axis against a regression on the other axis. What am I missing here?

> Given that no code changes for v17 are proposed, I suggest that we
> refrain from making any declarations until the next version of Unicode
> is released. If the pattern holds, that will be around September, which
> still leaves time to make reasonable decisions for v18.

Soon enough, a Unicode release will add one character to regexp [[:alpha:]]. PostgreSQL will then need to decide what IMMUTABLE is going to mean. How does that get easier in September?

Thanks,
nm

> [1] https://github.com/awslabs/compat-collation-for-glibc
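
Editorial illustration (not part of the original message): the breakage discussed above can be sketched with a toy model of an expression index. This is only a simulation under stated assumptions, not PostgreSQL code. The names `lower_old`, `lower_new`, `build_index`, and `index_lookup` are hypothetical, and the stand-in characters U+0378 and U+0379 were chosen arbitrarily to play the role of a code point that gains a case mapping in a later Unicode release.

```python
import bisect
import string

# Stand-in characters (assumption): HYP has no case mapping in the "old"
# data and gains one in the "new" data.
HYP, HYP_LOWER = "\u0378", "\u0379"

# "Old" lowercase function: ASCII mappings only.
OLD_TABLE = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
# "New" lowercase function: same mappings plus one newly added mapping.
NEW_TABLE = dict(OLD_TABLE)
NEW_TABLE[ord(HYP)] = HYP_LOWER


def lower_old(s):
    return s.translate(OLD_TABLE)


def lower_new(s):
    return s.translate(NEW_TABLE)


def build_index(rows, fn):
    """Store sorted (fn(value), row_id) pairs, like an expression index."""
    return sorted((fn(value), row_id) for row_id, value in enumerate(rows))


def index_lookup(index, fn, value):
    """Binary-search the stored keys for fn(value); return matching row ids."""
    key = fn(value)
    pos = bisect.bisect_left(index, (key, -1))
    hits = []
    while pos < len(index) and index[pos][0] == key:
        hits.append(index[pos][1])
        pos += 1
    return hits


rows = ["abc", "A" + HYP, "zzz"]
index = build_index(rows, lower_old)            # built before the update

found_before = index_lookup(index, lower_old, "A" + HYP)
found_after = index_lookup(index, lower_new, "A" + HYP)   # stale index
found_rebuilt = index_lookup(build_index(rows, lower_new), lower_new, "A" + HYP)
```

In this toy model the row is found before the function changes, silently missed once the function's result changes while the stored keys do not, and found again only after the index is rebuilt, which is the role REINDEX plays in the discussion.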