Re: Built-in CTYPE provider - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Built-in CTYPE provider
Date
Msg-id CA+TgmoYCNEm4tG0xgszXpyaM_-fu3onmHf2TDM1j5ru8ZmiCcQ@mail.gmail.com
In response to Re: Built-in CTYPE provider  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Wed, Dec 20, 2023 at 2:13 PM Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2023-12-20 at 13:49 +0100, Daniel Verite wrote:
> > If the Postgres default was bytewise sorting+locale-agnostic
> > ctype functions directly derived from Unicode data files,
> > as opposed to libc/$LANG at initdb time, the main
> > annoyance would be that "ORDER BY textcol" would no
> > longer be the human-favored sort.
> > For the presentation layer, we would have to write for instance
> >  ORDER BY textcol COLLATE "unicode" for the root collation
> > or a specific region-country if needed.
> > But all the rest seems better, especially cross-OS compatibility,
> > truly immutable and faster indexes for fields that
> > don't require linguistic ordering, alignment between Unicode
> > updates and Postgres updates.
>
> Thank you, that summarizes exactly the compromise that I'm trying to
> reach.

This makes sense to me, too, but it feels like it might work out
better for speakers of English than for speakers of other languages.
Right now, I tend to get databases that default to en_US.utf8, and if
the default changed to C.utf8, then the case-comparison behavior might
be different but the letters would still sort in the right order. For
someone who is currently defaulting to es_ES.utf8 or fr_FR.utf8, a
change to C.utf8 would be a much bigger problem, I would think. Their
alphabet isn't in code point order, and so things would be
alphabetized wrongly. That might be OK if they don't care about
ordering for any purpose other than equality lookups, but otherwise
it's going to force them to change the default, where today they don't
have to do that.
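
To make the concern concrete, here is a rough sketch in Python rather than SQL. It assumes that Python's default string comparison, which goes by code point, is a fair stand-in for how a C.utf8-style collation would order UTF-8 text; it does not call into PostgreSQL at all. The helper name and the word lists are made up for illustration.

```python
# Illustration only: Python compares str values code point by code point,
# which is used here as a stand-in for a C/C.utf8-style collation.
# code_point_sort and the sample words are hypothetical.

def code_point_sort(words):
    """Sort strings by Unicode code point (Python's default str ordering)."""
    return sorted(words)

english = ["cherry", "apple", "banana"]
spanish = ["zorro", "oso", "ñandú", "nube"]

# Lowercase a-z are already in code point order, so this matches the
# order an English speaker expects.
print(code_point_sort(english))

# U+00F1 (ñ) is greater than U+007A (z), so "ñandú" lands after "zorro"
# instead of between "nube" and "oso".
print(code_point_sort(spanish))

# For UTF-8, sorting the encoded bytes gives the same order as sorting
# by code point.
bytewise = sorted(spanish, key=lambda w: w.encode("utf-8"))
print(bytewise == code_point_sort(spanish))
```

The English list comes out alphabetized, while the Spanish list puts the ñ word last, which is the kind of result that would push those users to override the default.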

--
Robert Haas
EDB: http://www.enterprisedb.com


