Re: [18] Policy on IMMUTABLE functions and Unicode updates - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: [18] Policy on IMMUTABLE functions and Unicode updates
Msg-id ed6cc199-cfb6-4feb-9439-4451a4ee0520@eisentraut.org
In response to Re: [18] Policy on IMMUTABLE functions and Unicode updates  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [18] Policy on IMMUTABLE functions and Unicode updates
List pgsql-hackers
On 22.07.24 19:55, Robert Haas wrote:
> Every other piece of software in the world has to deal with changes as
> a result of the addition of new code points, and probably less
> commonly, revisions to existing code points. Presumably, their stuff
> breaks too, from time to time. I mean, I find it a bit difficult to
> believe that web browsers or messaging applications on phones only
> ever display emoji, and never try to do any sort of string sorting.

The sorting isn't the problem.  We have a versioning mechanism for 
collations.  What we do with the version information is clearly not 
perfect yet, but the mechanism exists and you can hack together queries
that answer the question "did anything change here that would affect my
indexes?"  And you could build more tooling around that and so on.
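
As a rough illustration of the kind of query meant here, the sketch below compares the version recorded in pg_collation.collversion with what pg_collation_actual_version() reports now. Only that catalog column and that SQL function come from PostgreSQL; the Python wrapper, the name changed_collations, and the sample rows (including the version strings) are assumptions made up for the example.

```python
# SQL a client could run (with any driver) to fetch, for each versioned
# collation, the version recorded at creation time and the version the
# provider reports now.
COLLATION_VERSION_QUERY = """
SELECT collname,
       collversion AS recorded_version,
       pg_collation_actual_version(oid) AS actual_version
FROM pg_collation
WHERE collversion IS NOT NULL
"""


def changed_collations(rows):
    """Return names of collations whose recorded and actual versions differ.

    `rows` is an iterable of (collname, recorded_version, actual_version)
    tuples, i.e. the shape of the result of the query above.
    """
    return [name for name, recorded, actual in rows if recorded != actual]


# Hypothetical sample rows standing in for a real query result.
sample_rows = [
    ("en-x-icu", "153.14", "153.120"),
    ("de-x-icu", "153.120", "153.120"),
]
print(changed_collations(sample_rows))
```

A mismatch only says that the collation definition may have changed; deciding which indexes depend on the affected collations would need further catalog queries.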

The problem being considered here is updates to Unicode itself, as
distinct from the collation tables.  A Unicode update can impact at
least two things:

- Code points that were previously unassigned are now assigned.  That's 
obviously a very common thing with every Unicode update.  The new 
character will have new properties attached to it, so the result of 
various functions that use such properties (upper(), lower(), 
normalize(), etc.) could change, because previously the code point had 
no properties, and so those functions would not do anything interesting 
with the character.

- Certain properties of an existing character can change.  Like, a 
character used to be a letter and now it's a digit.  (This is an 
example; I'm not sure if that particular change would be allowed.)  In 
the extreme case, this could have the same impact as the above, but in 
practice the kinds of changes that are allowed wouldn't affect typical 
indexes.
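
The first case above can be sketched outside PostgreSQL. Python happens to ship two Unicode databases, the current one and a frozen 3.2.0 copy, so a code point assigned after 3.2.0 can stand in for "previously unassigned". The helper lower_as_of is hypothetical and only models a lower() that leaves unassigned code points alone; it is not how PostgreSQL implements anything.

```python
import unicodedata

# The current Unicode database and the frozen 3.2.0 copy bundled with Python.
OLD_UCD = unicodedata.ucd_3_2_0
NEW_UCD = unicodedata


def is_assigned(ucd, ch):
    """True if the given Unicode database has this code point assigned."""
    return ucd.category(ch) != "Cn"


def lower_as_of(ucd, text):
    """Hypothetical lower(): map only code points that `ucd` knows about.

    Code points unassigned in `ucd` pass through unchanged, because that
    version of Unicode attaches no properties to them.
    """
    return "".join(ch.lower() if is_assigned(ucd, ch) else ch for ch in text)


# U+1E9E LATIN CAPITAL LETTER SHARP S was added in Unicode 5.1,
# so it is unassigned as far as the 3.2.0 database is concerned.
word = "STRA\u1e9eE"
old_key = lower_as_of(OLD_UCD, word)
new_key = lower_as_of(NEW_UCD, word)

print(ascii(old_key), ascii(new_key), old_key == new_key)
```

The two keys differ, so an expression index built on the old result would no longer agree with what the function computes for the same input after the update.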

I don't think this has anything in particular to do with the new builtin 
collation provider.  That is just one new consumer of this.


