Re: pg_collation.collversion for C.UTF-8 - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: pg_collation.collversion for C.UTF-8
Msg-id CA+hUKGLALgS3bFStFrv26mV9JahZzAbAVyk3+03QZVpJDrrFvg@mail.gmail.com
In response to pg_collation.collversion for C.UTF-8  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: pg_collation.collversion for C.UTF-8  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Wed, Apr 19, 2023 at 12:36 AM Daniel Verite <daniel@manitou-mail.org> wrote:
> This seems to be based on the idea that C.* collations provide an
> immutable sort like "C", but it appears that it's not the case.

Hmm.  It seems I added that exemption initially for FreeBSD only in
ca051d8b101, and then merged the cases for several OSes in
beb4480c853.

It's extremely surprising to me that the sort order changed.  I
expected the sort order to be code point order:

https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
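One rough way to check whether a particular system's C.UTF-8 actually sorts in code point order is to compare strcoll() results against plain code point comparison. The following is only a sketch in Python and does not come from PostgreSQL. The sample strings and the helper name matches_code_point_order are hypothetical. A pass on a small sample does not prove that the whole collation is code-point based; it can only catch a disagreement.

```python
import locale

# Hypothetical sample strings: ASCII, Latin-1 range, other BMP and astral code points.
SAMPLES = ["a", "A", "Z", "_", "AB", "A\u1234",
           "\u00e9", "\u0100", "\u1234", "\uffff", "\U0001f600"]


def matches_code_point_order(locale_name, samples=SAMPLES):
    """Return True if strcoll() under locale_name agrees with code point
    order for every pair in samples, False on the first disagreement,
    or None if the locale is not available on this system."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        try:
            locale.setlocale(locale.LC_COLLATE, locale_name)
        except locale.Error:
            return None  # locale not installed here
        for a in samples:
            for b in samples:
                # Python's built-in str comparison is code point order.
                expected = (a > b) - (a < b)
                got = locale.strcoll(a, b)
                if (got > 0) - (got < 0) != expected:
                    return False
        return True
    finally:
        # Restore whatever LC_COLLATE was before the check.
        locale.setlocale(locale.LC_COLLATE, saved)
```

For example, matches_code_point_order("C.UTF-8") can be run on an older and a newer system to see whether they disagree on pairs such as "A" versus a four-digit code point.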

One interesting thing is that it seems that it might have been
independently invented by Debian (?) and then harmonised with glibc
2.35:

https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1871363.html

Was the earlier Debian version buggy, or did it simply have a
different idea of what the sort order should be, intentionally?  Ugh.
From your examples, we can see that the older Debian system did not
have A < [some 4 digit code point], while the later version did (as
expected).  If so, then it might be tempting to *not* do what you're
suggesting, since the stated goal of the thing is to be stable from
now on.  But it changed once in the early years of its existence!
Annoying.

Many OSes have a locale with this name.  I don't know the history
(who did it first, etc.), but now I am wondering whether they all took
the "obvious" interpretation, that it should be code-point based,
extrapolating from "C" (really memcmp order):

https://unix.stackexchange.com/questions/597962/how-widespread-is-the-c-utf-8-locale
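Code point order is the natural extrapolation from memcmp because UTF-8 was designed so that bytewise comparison of encoded strings gives the same ordering as comparing their code point sequences. Below is a small illustrative check in Python, where comparing bytes objects stands in for memcmp-style ordering with shorter prefixes sorting first. The sample strings, which deliberately straddle the 1-, 2-, 3- and 4-byte encoding boundaries, and the helper name are hypothetical.

```python
import itertools

# Hypothetical samples around each UTF-8 length boundary.
SAMPLES = ["", "A", "AB", "Z", "a", "\u007f", "\u0080", "\u00e9", "\u07ff",
           "\u0800", "\u1234", "\uffff", "\U00010000", "\U0010ffff", "A\u1234"]


def byte_order_matches_code_point_order(samples=SAMPLES):
    """Check that comparing UTF-8 bytes (memcmp-like) gives the same result
    as comparing code points for every ordered pair of samples."""
    for a, b in itertools.product(samples, repeat=2):
        by_code_point = (a > b) - (a < b)          # str comparison: code points
        ea, eb = a.encode("utf-8"), b.encode("utf-8")
        by_bytes = (ea > eb) - (ea < eb)           # bytes comparison: unsigned bytes
        if by_code_point != by_bytes:
            return False
    return True
```

Sorting the samples with key=lambda s: s.encode("utf-8") therefore yields the same list as sorting them as strings. That equivalence is what makes a code-point-based C.UTF-8 look like the obvious choice.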


