Re: [18] Policy on IMMUTABLE functions and Unicode updates - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: [18] Policy on IMMUTABLE functions and Unicode updates
Msg-id e753e0e3-dc99-44f6-8ad7-100597cc6e7e@eisentraut.org
In response to Re: [18] Policy on IMMUTABLE functions and Unicode updates  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 24.07.24 14:20, Robert Haas wrote:
> On Wed, Jul 24, 2024 at 12:42 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>> Fair enough.  My argument was, that topic is distinct from the topic of
>> this thread.
> 
> OK, that's fair. But I think the solutions are the same: we complain
> all the time about glibc and ICU shipping collations and not
> versioning them. We shouldn't make the same kinds of mistakes. Even if
> ctype is less likely to break things than collations, it still can,
> and we should move in the direction of letting people keep the v17
> behavior for the foreseeable future while at the same time having a
> way that they can also get the new behavior if they want it (and the
> new behavior should be the default).

Versioning is possibly part of the answer, but I think it would be a 
different kind of versioning from collation versioning.

The collation versions are in principle designed to change rarely.  Some 
languages' rules might change once in twenty years, some never.  Maybe 
you have a database mostly in English and a few tables in, I don't know, 
Swedish (unverified examples).  Most of the time nothing happens during 
upgrades, but one time in many years you need to reindex the Swedish 
tables, and the system starts warning you about that as soon as you 
access the Swedish tables.  (Conversely, if you never actually access 
the Swedish tables, then you never get warned about them.)

If we wanted a similar versioning system for the Unicode updates, it 
would be separate.  We'd record the Unicode version that was current when 
the system catalogs were initialized in, say, a pg_database column. 
Then at run time, when someone calls, say, the normalize() function or 
uses a regular expression character classification, we'd check the 
version of the compiled-in Unicode tables and issue a warning if the two 
differ.
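A minimal C sketch of that check, purely for illustration: DatabaseInfo, its unicode_version field (standing in for the hypothetical pg_database column), COMPILED_UNICODE_VERSION and check_unicode_version() are all made-up names, not existing PostgreSQL code, and the version string is an illustrative value.  The check runs lazily, on first use of a Unicode-dependent function, and warns at most once, similar to how collation version warnings only show up when the affected objects are actually accessed.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: Unicode version of the tables compiled into this build
 * (illustrative value only). */
#define COMPILED_UNICODE_VERSION "15.1"

/* Hypothetical per-database state.  unicode_version stands in for a
 * pg_database column written when the catalogs were initialized. */
typedef struct DatabaseInfo
{
    const char *name;
    const char *unicode_version;
    bool        warned;         /* warn at most once */
} DatabaseInfo;

/*
 * Would be called lazily, e.g. from normalize() or regex character
 * classification.  Returns true if the recorded version matches the
 * compiled-in one; otherwise warns (once) and returns false.
 */
static bool
check_unicode_version(DatabaseInfo *db)
{
    if (strcmp(db->unicode_version, COMPILED_UNICODE_VERSION) == 0)
        return true;

    if (!db->warned)
    {
        fprintf(stderr,
                "WARNING: database \"%s\" was initialized with Unicode %s, "
                "but this server uses Unicode %s\n",
                db->name, db->unicode_version, COMPILED_UNICODE_VERSION);
        db->warned = true;
    }
    return false;
}
```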

A possible problem is that the Unicode version changes in practice with 
every major PostgreSQL release, so this approach would end up warning 
users after every upgrade.  To avoid that, we'd probably need to keep 
support for multiple Unicode versions around, as has been suggested in 
this thread already.
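Keeping several versions around could look roughly like the following sketch, again with hypothetical names (UnicodeTables, supported_tables, lookup_unicode_tables(), latest_unicode_tables()) and illustrative version strings; nothing here is existing PostgreSQL code.  A database keeps using the tables matching its recorded version, so behavior stays stable across upgrades, and only falls back to the warning path when that version is no longer shipped.  New databases would get the newest tables by default.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handle for one set of compiled-in Unicode tables
 * (normalization, character classes, case mapping, ...). */
typedef struct UnicodeTables
{
    const char *version;
    /* ... pointers to the generated tables for this version ... */
} UnicodeTables;

/* Hypothetical: every Unicode version this build still ships, newest
 * first.  The version strings are illustrative only. */
static const UnicodeTables supported_tables[] = {
    {"15.1"},
    {"15.0"},
    {"14.0"},
};

#define N_SUPPORTED (sizeof(supported_tables) / sizeof(supported_tables[0]))

/*
 * Pick the tables matching the version recorded for a database.
 * Returns NULL if that version is no longer shipped, in which case the
 * caller would have to warn, as in the single-version scheme.
 */
static const UnicodeTables *
lookup_unicode_tables(const char *recorded_version)
{
    for (size_t i = 0; i < N_SUPPORTED; i++)
    {
        if (strcmp(supported_tables[i].version, recorded_version) == 0)
            return &supported_tables[i];
    }
    return NULL;
}

/* Newest tables: the default for new databases, or an explicit opt-in. */
static const UnicodeTables *
latest_unicode_tables(void)
{
    return &supported_tables[0];
}
```

The cost of this design is binary size and maintenance: every retained version means another copy of the generated tables.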



