Re: Built-in CTYPE provider - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Built-in CTYPE provider
Msg-id 20240709010545.8c.nmisch@google.com
In response to Re: Built-in CTYPE provider  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Jul 06, 2024 at 04:19:21PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > As a released feature, NORMALIZE() has a different set of remedies to choose
> > from, and I'm not proposing one.  I may have sidetracked this thread by
> > talking about remedies without an agreement that pg_c_utf8 has a problem.  My
> > question for the PostgreSQL maintainers is this:
> 
> >   textregexeq(... COLLATE pg_c_utf8, '[[:alpha:]]') and lower(), despite being
> >   IMMUTABLE, will change behavior in some major releases.  pg_upgrade does not
> >   have a concept of IMMUTABLE functions changing, so index scans will return
> >   wrong query results after upgrade.  Is it okay for v17 to release a
> >   pg_c_utf8 planned to behave that way when upgrading v17 to v18+?
> 
> I do not think it is realistic to define "IMMUTABLE" as meaning that
> the function will never change behavior until the heat death of the
> universe.  As a counterexample, we've not worried about applying
> bug fixes or algorithm improvements that change the behavior of
> "immutable" numeric computations.

True.  There's a continuum from "releases can change any IMMUTABLE function"
to "index integrity always wins, even if a function is as wrong as 1+1=3".
I'm less concerned about the recent "Incorrect results from numeric round"
thread, even though it's proposing to back-patch.  I'm thinking about these
aggravating factors for $SUBJECT:

- $SUBJECT is planning an annual cadence of this kind of change.

- We already have ICU providing collation support for the same functions.
  Unlike $SUBJECT, ICU integration gives packagers control over when to accept
  corruption at pg_upgrade time.

- SQL Server, DB2 and Oracle do their Unicode updates in a non-corrupting way.
  (See Jeremy Schneider's reply concerning DB2 and Oracle.)

- lower() and regexp are more popular in index expressions than
  high-digit-count numeric calculations.
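
To make the failure mode concrete, here is a minimal sketch of how an
expression index built under one definition of lower() can miss rows once
lower() changes. It is a toy model, not PostgreSQL code: lower_v1, lower_v2,
build_index, index_scan and seq_scan are hypothetical names, and the
ASCII-only versus full case mapping merely stands in for two Unicode versions.

```python
import bisect

def lower_v1(s):
    # Hypothetical "old release": only ASCII letters have case mappings.
    return "".join(c.lower() if c.isascii() else c for c in s)

def lower_v2(s):
    # Hypothetical "new release": non-ASCII letters gain case mappings too.
    return s.lower()

# Table contents; the row id is the list position.
rows = ["Alpha", "\u00c9COLE", "beta"]

def build_index(fn):
    # Model an expression index on fn(value) as sorted (key, row_id) pairs.
    return sorted((fn(v), i) for i, v in enumerate(rows))

def index_scan(index, key):
    # Binary-search for the key, roughly as a btree descent would.
    pos = bisect.bisect_left(index, (key, -1))
    found = []
    while pos < len(index) and index[pos][0] == key:
        found.append(index[pos][1])
        pos += 1
    return found

def seq_scan(fn, key):
    # Recompute fn for every row; never consults the stored keys.
    return [i for i, v in enumerate(rows) if fn(v) == key]

old_index = build_index(lower_v1)      # built before the upgrade
key = lower_v2("\u00c9cole")           # query key computed after the upgrade

stale = index_scan(old_index, key)                 # uses old stored keys
truth = seq_scan(lower_v2, key)                    # what the query should return
fresh = index_scan(build_index(lower_v2), key)     # after rebuilding the index
```

In this model the stale index scan comes back empty while the sequential scan
finds the row, and rebuilding the index under the new function makes the two
agree again.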

> I'd say a realistic policy is "immutable means we don't intend to
> change it within a major release".  If we do change the behavior,
> either as a bug fix or a major-release improvement, that should
> be release-noted so that people know they have to rebuild dependent
> indexes and matviews.

It sounds like you're very comfortable with $SUBJECT proceeding in its current
form.  Is that right?
