On Thu, Nov 11, 2021 at 2:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > ... but we want
> > collation definitions that *actually don't change*.
>
> Um ... how would that work? Unicode is a moving target. Even without
> their continual addition of stuff, I'm not convinced that social rules
> about how to sort are engraved on stone tablets. The need for collation
> updates may not be as predictable as the need for timezone updates,
> but I doubt that we can just freeze the data forever.

I don't know, but I think the social rules that actually matter change
extremely slowly. To my knowledge, the alphabet song has not changed
since I was in kindergarten. Now I agree that in some countries it
probably has ... but I doubt those events are super-common, because if a
country does change its definition of alphabetical order, there's a
heck of a lot more updating to do than just reindexing your PostgreSQL
databases. The signs saying A-L go to the left and M-Z go to the right
will need revision if we decide M comes before L. I feel like it has
to be the case that most of the updates that are being made involve
things like how obscure characters compare to other obscure
characters, or what to do in corner-case situations involving multiple
diacritical marks. I know I've seen collation changes on Macs that
changed the order in which en_US.UTF8 strings sorted. But it wasn't
that the rules about English sorting have actually changed. It was
that somebody somewhere decided that the algorithm should be more or
less case-sensitive, or that we ought to ignore the amount of
whitespace between words instead of not ignoring it, or something like
that; I don't know exactly, but nothing that people universally agree on. Tinkering
with obscure rules that actual human beings wouldn't agree on and
prioritizing that over a stable algorithm is, IMHO, ridiculous.

If the Unicode Consortium introduces a new emoji for "annoyed
PostgreSQL hacker," I really do not care whether that collates before
or after the existing symbol for "floral heart bullet, reversed
rotated." I care much more about whether it collates the same way
after the next minor release as it does the day it's released. And I
seriously doubt that I am alone in that.
--
Robert Haas
EDB: http://www.enterprisedb.com