Re: [18] Policy on IMMUTABLE functions and Unicode updates - Mailing list pgsql-hackers

From Laurenz Albe
Subject Re: [18] Policy on IMMUTABLE functions and Unicode updates
Msg-id 486d71991a3f80ec1c47e1bd7931e2ef3627b6b3.camel@cybertec.at
In response to Re: [18] Policy on IMMUTABLE functions and Unicode updates  (Peter Eisentraut <peter@eisentraut.org>)
List pgsql-hackers
On Mon, 2024-07-22 at 16:26 +0200, Peter Eisentraut wrote:
> > I propose that, going forward, we take more care with Unicode updates:
> > assess the impact, provide time for comments, and consider possible
> > mitigations. In other words, it would be reviewed like any other
> > change.
>
> I disagree with that.  We should put ourselves into the position to
> adopt new Unicode versions without fear.  Similar to updates to time
> zones, snowball, etc.
>
> We can't be discussing the merits of the Unicode update every year.
> That would be madness.  How would we weigh each change against the
> others?  Some new character is introduced because it's the new currency
> of some country; seems important.  Some mobile phone platforms jumped
> the gun and already use the character for the same purpose before it was
> assigned; now the character is in databases but some function results
> will change with the upgrade.  How do we proceed?
>
> Moreover, if we were to decide to not take a particular Unicode update,
> that would then stop that process forever, because whatever the issue
> was wouldn't go away with the next Unicode version.

I understand the difficulty (madness) of discussing every Unicode
change.  If that's unworkable, my preference would be to stick with some
Unicode version and never modify it, ever.

In that case, users could choose between two options:

a) use the built-in provider, don't get proper support for new code
   points, but never again worry about corrupted indexes after an
   upgrade

b) use ICU collations, be up to date with Unicode, but reindex whenever
   you upgrade to a new ICU version

> Unless I missed something here, all the problem examples involve
> unassigned code points that were later assigned.  (Assigned code points
> already have compatibility mechanisms, such as collation versions.)  So
> I would focus on that issue.  We already have a mechanism to disallow
> unassigned code points.  So there is a tradeoff that users can make:
> Disallow unassigned code points and avoid upgrade issues resulting from
> them.  Maybe that just needs to be documented more prominently.

Are you proposing a switch that would make PostgreSQL error out if
somebody wants to use an unassigned code point?  That would be an option.
If what you mean is just adding documentation that tells people not
to use unassigned code points if they want to avoid a reindex, I'd say
that is not enough.
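
To illustrate what "error out on unassigned code points" could look like,
here is a minimal sketch in Python. It is not PostgreSQL's mechanism; the
function names are hypothetical, and the check relies on the Unicode tables
bundled with the Python interpreter, so the result depends on that Unicode
version, which is exactly the kind of version dependence under discussion.

```python
import unicodedata


def find_unassigned(text):
    """Return (index, 'U+XXXX') for every code point with category 'Cn'.

    'Cn' is the Unicode general category for unassigned code points
    (it also covers noncharacters). The answer depends on the Unicode
    version of this Python build (unicodedata.unidata_version).
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cn"
    ]


def reject_unassigned(text):
    """Hypothetical strict mode: raise instead of storing the string."""
    bad = find_unassigned(text)
    if bad:
        raise ValueError(f"unassigned code points not allowed: {bad}")
    return text


if __name__ == "__main__":
    print(unicodedata.unidata_version)
    print(find_unassigned("abc"))       # no unassigned code points
    print(find_unassigned("a\u0378"))   # U+0378 is reserved in the Greek block
```

A strict mode like this trades flexibility for stability: input is rejected
up front, so a later Unicode update cannot change the behavior of data
that is already stored.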

Yours,
Laurenz Albe


