Re: Update Unicode data to Unicode 16.0.0 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Update Unicode data to Unicode 16.0.0 |
Msg-id | CA+TgmoYmT90FNueeedVJhm9MO_XH-HPgtyZwvQFzhTSHfmTSTQ@mail.gmail.com |
In response to | Re: Update Unicode data to Unicode 16.0.0 (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Update Unicode data to Unicode 16.0.0 |
List | pgsql-hackers |
On Wed, Mar 19, 2025 at 1:39 PM Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2025-03-19 at 08:46 -0400, Robert Haas wrote:
> > I see your point, but most people don't use the builtin collation
> > provider.
>
> The other providers aren't affected by us updating Unicode, so I think
> we got off track somehow. I suppose what I meant was:
>
> "If you are concerned about inconsistencies, and you move to the
> builtin provider, then 99% of the inconsistency problem is gone. We can
> remove the last 1% of the problem if we do all the work listed above."

All right. I'm not sure I totally buy the 99% number, but I take your point.

> > When an EDB customer asks "if I do X,
> > will anything break," it's often the case that answering "maybe" is
> > the same as answering "yes".
>
> That's a good point. However, note that "doesn't break primary keys" is
> a nice guarantee, even if there's still some remaining doubts about
> expression indexes, etc.

No argument.

> > They want a hard guarantee that the behavior will not
> > change.
>
> My understanding of this thread so far was that we were mostly
> concerned about internal inconsistencies of stored structures; e.g.
> indexes that could return different results than a seqscan.

I think that is true, but inconsistent indexes can be the worst problem without being the only one.

> Not changing query results at all between major versions is a valid
> concern, but a fairly strict one that doesn't seem limited to immutable
> functions or collation issues. Surely, at least the results of "SELECT
> version()" should change from release to release ;-)

Maybe we should stop doing releases, and then users won't have to worry about our releases breaking things!

Slightly more seriously, the use of UPPER() and LOWER() in expression indexes is not that uncommon. Sometimes, the index exists specifically to enforce a unique constraint.
Yes, plain indexes on columns are more common, and it makes sense to target that case first, but we shouldn't be too quick to hand-wave away the use of case-folding functions as a thing that doesn't happen.

> I certainly don't oppose giving users that choice. But I view it as a
> burden we are placing on the users -- better than breakage, but not
> really great, either. So if we do put in a ton of work, I'd like it if
> we could arrive at a better destination.
>
> If we actually want the BEST user experience possible, they'd not even
> really know that their index was ever inconsistent. Autovacuum would
> come along and just find the few entries in the index that need fixing,
> and reindex just those few tuples. In theory, it should be possible:
> there are a finite number of codepoints that change each Unicode
> version, and we can just search for them in the data and fix up derived
> structures.

I have to disagree with this. I think this is a case where fixing something automatically is clearly worse.

First, it could never fix it instantly, so you would be stuck with some window where queries might return wrong results -- or if you prevent that by not using the indexes any more until they're fixed, then it would instead cause huge query performance regressions that could easily take down the whole system.

Second, one of the things people like least about autovacuum is when it unexpectedly does a lot of work all at once. Today, that's usually a vacuum for wrap-around, but suddenly trying to fix all my indexes when I wasn't expecting that to happen could easily be just as bad.

I strongly believe users want to control what happens, not have the system try to fix it for them automatically without their knowledge.

--
Robert Haas
EDB: http://www.enterprisedb.com