Re: pg_collation.collversion for C.UTF-8 - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: pg_collation.collversion for C.UTF-8
Date	April 18, 2023 22:48:05
Msg-id	CA+hUKGLALgS3bFStFrv26mV9JahZzAbAVyk3+03QZVpJDrrFvg@mail.gmail.com
In response to	pg_collation.collversion for C.UTF-8 ("Daniel Verite" <daniel@manitou-mail.org>)
Responses	Re: pg_collation.collversion for C.UTF-8
List	pgsql-hackers

On Wed, Apr 19, 2023 at 12:36 AM Daniel Verite <daniel@manitou-mail.org> wrote:
> This seems to be based on the idea that C.* collations provide an
> immutable sort like "C", but it appears that it's not the case.

Hmm.  It seems I added that exemption initially for FreeBSD only in
ca051d8b101, and then merged the cases for several OSes in
beb4480c853.

It's extremely surprising to me that the sort order changed.  I
expected the sort order to be code point order:

https://sourceware.org/glibc/wiki/Proposals/C.UTF-8

One interesting thing is that it seems that it might have been
independently invented by Debian (?) and then harmonised with glibc
2.35:

https://www.mail-archive.com/debian-bugs-dist@lists.debian.org/msg1871363.html

Was the earlier Debian version buggy, or did it simply have a
different idea of what the sort order should be, intentionally?  Ugh.
From your examples, we can see that the older Debian system did not
have A < [some 4 digit code point], while the later version did (as
expected).  If so then it might be tempting to *not* do what you're
suggesting, since the stated goal of the thing is to be stable from
now on.  But it changed once in the early years of its existence!
Annoying.

Many OSes have a locale with this name.  I don't know the history
(who did it first, etc.), but now I am wondering whether they all took
the "obvious" interpretation, that it should be code-point based,
extrapolating from "C" (really memcmp order):

https://unix.stackexchange.com/questions/597962/how-widespread-is-the-c-utf-8-locale
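
To illustrate why the code-point interpretation is the "obvious" extrapolation from "C", here is a minimal Python sketch.  It does not call glibc or any OS locale code; it only checks, on a few hand-picked sample strings, that comparing UTF-8 bytes the way memcmp does gives the same order as comparing code points.  The helper names (codepoint_key, memcmp_key) and the sample data are made up for illustration.

```python
# Sketch: byte-wise (memcmp-style) order of UTF-8 strings vs. code point order.
# Assumption: this models the *expected* C.UTF-8 behaviour discussed above,
# not what any particular glibc or Debian version actually implements.

def codepoint_key(s: str) -> list[int]:
    """Sort key that compares strings code point by code point."""
    return [ord(ch) for ch in s]


def memcmp_key(s: str) -> bytes:
    """Sort key that compares the UTF-8 encoding byte by byte (unsigned),
    which is how memcmp() orders buffers (a shorter prefix sorts first)."""
    return s.encode("utf-8")


# Hypothetical sample: ASCII letters, a 2-byte character, a 4-hex-digit
# code point (U+1000) and a character outside the BMP (U+1F600).
words = ["a", "A", "\u00e9", "Z", "\u1000", "\U0001F600", "B"]

by_codepoint = sorted(words, key=codepoint_key)
by_bytes = sorted(words, key=memcmp_key)

# UTF-8 is designed so that these two orders coincide.
print(by_codepoint == by_bytes)
print([f"U+{ord(w):04X}" for w in by_bytes])
```

In this model "A" sorts before the 4-digit code point U+1000, which matches what the later system in Daniel's examples does.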