Re: Built-in CTYPE provider - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Built-in CTYPE provider |
Date | |
Msg-id | 20240709010545.8c.nmisch@google.com |
In response to | Re: Built-in CTYPE provider (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Built-in CTYPE provider
Re: Built-in CTYPE provider Re: Built-in CTYPE provider |
List | pgsql-hackers |
On Sat, Jul 06, 2024 at 04:19:21PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > As a released feature, NORMALIZE() has a different set of remedies to choose
> > from, and I'm not proposing one. I may have sidetracked this thread by
> > talking about remedies without an agreement that pg_c_utf8 has a problem. My
> > question for the PostgreSQL maintainers is this:
>
> > textregexeq(... COLLATE pg_c_utf8, '[[:alpha:]]') and lower(), despite being
> > IMMUTABLE, will change behavior in some major releases. pg_upgrade does not
> > have a concept of IMMUTABLE functions changing, so index scans will return
> > wrong query results after upgrade. Is it okay for v17 to release a
> > pg_c_utf8 planned to behave that way when upgrading v17 to v18+?
>
> I do not think it is realistic to define "IMMUTABLE" as meaning that
> the function will never change behavior until the heat death of the
> universe. As a counterexample, we've not worried about applying
> bug fixes or algorithm improvements that change the behavior of
> "immutable" numeric computations.

True. There's a continuum from "releases can change any IMMUTABLE function" to
"index integrity always wins, even if a function is as wrong as 1+1=3". I'm
less concerned about the recent "Incorrect results from numeric round" thread,
even though it's proposing to back-patch. I'm thinking about these aggravating
factors for $SUBJECT:

- $SUBJECT is planning an annual cadence of this kind of change.

- We already have ICU providing collation support for the same functions.
  Unlike $SUBJECT, ICU integration gives packagers control over when to accept
  corruption at pg_upgrade time.

- SQL Server, DB2 and Oracle do their Unicode updates in a non-corrupting way.
  (See Jeremy Schneider's reply concerning DB2 and Oracle.)

- lower() and regexp are more popular in index expressions than
  high-digit-count numeric calculations.

> I'd say a realistic policy is "immutable means we don't intend to
> change it within a major release". If we do change the behavior,
> either as a bug fix or a major-release improvement, that should
> be release-noted so that people know they have to rebuild dependent
> indexes and matviews.

It sounds like you're very comfortable with $SUBJECT proceeding in its current
form. Is that right?
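
For readers following the thread, the failure mode under discussion can be sketched in a few lines of Python. This is only an illustrative model, not PostgreSQL code: old_lower and new_lower are hypothetical stand-ins for lower() before and after a Unicode update, and a plain dict stands in for an expression index (a real btree differs, but goes stale in the same way). The sample rows are made up.

```python
# Hypothetical stand-ins for lower() in two major releases (assumption:
# the "old" release knows only ASCII case mappings, the "new" one knows more).
def old_lower(s: str) -> str:
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def new_lower(s: str) -> str:
    return s.lower()


# Made-up table contents.
rows = ["École", "Zebra", "apple"]


def build_index(table, fn):
    """Model an expression index on fn(value): key -> list of row ids."""
    index = {}
    for rowid, value in enumerate(table):
        index.setdefault(fn(value), []).append(rowid)
    return index


def index_scan(index, table, key):
    """Look the key up in the stored index; fn is NOT re-applied to the rows."""
    return [table[i] for i in index.get(key, [])]


def seq_scan(table, fn, key):
    """Evaluate fn on every row with the currently installed function."""
    return [v for v in table if fn(v) == key]


# Index built before the upgrade, carried over unchanged (as pg_upgrade would).
stale_index = build_index(rows, old_lower)

# After the upgrade, the query computes its search key with the new function.
key = new_lower("École")

stale_result = index_scan(stale_index, rows, key)    # misses the row
correct_result = seq_scan(rows, new_lower, key)      # finds the row

# Rebuilding the index (the REINDEX remedy) restores agreement.
rebuilt_index = build_index(rows, new_lower)
rebuilt_result = index_scan(rebuilt_index, rows, key)

print(stale_result, correct_result, rebuilt_result)
```

In this model the index scan and the sequential scan disagree until the index is rebuilt, which is the "wrong query results after upgrade" case quoted above. Rows whose keys did not change (here "Zebra" and "apple") are still found through the stale index, so the corruption is silent and partial.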