Re: [18] Policy on IMMUTABLE functions and Unicode updates - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [18] Policy on IMMUTABLE functions and Unicode updates
Date
Msg-id CA+TgmoZ+gaj6J=xAtiHeFA9v8pk79FbcOEiHKTMoJbzjZv4qwQ@mail.gmail.com
In response to Re: [18] Policy on IMMUTABLE functions and Unicode updates  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: [18] Policy on IMMUTABLE functions and Unicode updates
List pgsql-hackers
On Tue, Jul 23, 2024 at 1:03 PM Jeff Davis <pgsql@j-davis.com> wrote:
> One of my strongest motivations for PG_C_UTF8 was that there was still
> a use case for libc in PG16: the "C.UTF-8" locale, which is not
> supported at all in ICU. Daniel Vérité made me aware of the importance
> of this locale, which offers code point order collation combined with
> Unicode ctype semantics.
>
> With PG17, between ICU and the builtin provider, there's little
> remaining reason to use libc (aside from legacy).

I was really interested to read Jeremy Schneider's slide deck, to
which he linked earlier, wherein he explained that other major
databases default to something more like C.UTF-8. Maybe we need to
relitigate the debate about what our default should be in light of
those findings (but, if so, on another thread with a clear subject
line). But even if we were to decide to change the default, there are
lots and lots of existing databases out there that are using libc
collations. I'm not in a good position to guess how many of those
people truly care about language-specific collations. I'm
positive it's not zero, but I can't really guess how much more than
zero it is. Even if it were zero, though, the fact that so many
upgrades are done using pg_upgrade means that this problem would still
be around in a decade even if we changed the default tomorrow.

(I do understand that you wrote "aside from legacy" so I'm not
accusing you of ignoring the upgrade issues, just taking the
opportunity to be more explicit about my own view.)

Also, Noah has pointed out that C.UTF-8 introduces some
forward-compatibility hazards of its own, at least with respect to
ctype semantics. I don't have a clear view of what ought to be done
about that, but if we just replace a dependency on an unstable set of
libc definitions with a dependency on an equally unstable set of
PostgreSQL definitions, we're not really winning. Do we need to
version the new ctype provider?
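
To make the versioning question concrete, here is a minimal sketch in
Python, not PostgreSQL code. It assumes the general idea of recording
the Unicode version next to any stored result that depends on ctype
semantics, so that a later mismatch can be detected. The names
ctype_fingerprint and is_stale are hypothetical, and Python's
unicodedata module only stands in for whatever tables a provider
would ship.

```python
import unicodedata


def ctype_fingerprint(text: str) -> dict:
    """Capture ctype-dependent results plus the Unicode version that produced them."""
    return {
        # Version of the Unicode database bundled with this interpreter.
        "unicode_version": unicodedata.unidata_version,
        # Case mapping and general categories can change between Unicode versions,
        # for example when a previously unassigned code point gets assigned.
        "lower": text.lower(),
        "categories": [unicodedata.category(ch) for ch in text],
    }


def is_stale(stored: dict) -> bool:
    """Return True if the stored result was computed under a different Unicode version."""
    return stored["unicode_version"] != unicodedata.unidata_version


if __name__ == "__main__":
    fp = ctype_fingerprint("ÀB")
    print(fp["lower"], fp["categories"], is_stale(fp))
```

A check like is_stale only reports that the definitions changed; it does
not say whether any particular stored value is actually affected, which
is the harder part of the question.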

--
Robert Haas
EDB: http://www.enterprisedb.com


