Re: Order changes in PG16 since ICU introduction - Mailing list pgsql-hackers

From Daniel Verite
Subject Re: Order changes in PG16 since ICU introduction
Msg-id cb448574-aa7c-4969-b2dd-c9eb221d7e06@manitou-mail.org
In response to Re: Order changes in PG16 since ICU introduction  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Order changes in PG16 since ICU introduction
List pgsql-hackers

Jeff Davis wrote:

> If we special case locale=C, but do nothing for locale=fr_FR, then I'm
> not sure we've solved the problem. Andrew Gierth raised the issue here,
> which he called "maximally confusing":
>
> https://postgr.es/m/874jp9f5jo.fsf@news-spur.riddles.org.uk
>
> That's why I feel that we need to make locale apply to whatever the
> provider is, not just when it happens to be C.

While I agree that the LOCALE option in CREATE DATABASE is
counter-intuitive, I find it questionable that blending ICU
and libc locales into it helps that much with the user experience.

Trying the latest v6-* patches applied on top of 722541ead1
(before the pgindent run), here are a few examples where I
don't think it goes well.

The OS is Ubuntu 22.04 (glibc 2.35, ICU 70.1)

initdb:

  Using default ICU locale "fr".
  Using language tag "fr" for ICU locale "fr".
  The database cluster will be initialized with this locale configuration:
    provider:    icu
    ICU locale:  fr
    LC_COLLATE:  fr_FR.UTF-8
    LC_CTYPE:    fr_FR.UTF-8
    LC_MESSAGES: fr_FR.UTF-8
    LC_MONETARY: fr_FR.UTF-8
    LC_NUMERIC:  fr_FR.UTF-8
    LC_TIME:     fr_FR.UTF-8
  The default database encoding has accordingly been set to "UTF8".


#1

postgres=# create database test1 locale='fr_FR.UTF-8';
NOTICE:  using standard form "fr-FR" for ICU locale "fr_FR.UTF-8"
ERROR:  new ICU locale (fr-FR) is incompatible with the ICU locale of the
template database (fr)
HINT:  Use the same ICU locale as in the template database, or use template0
as template.


That looks like a fairly generic case that doesn't work seamlessly.


#2

postgres=# create database test2 locale='C.UTF-8' template='template0';
NOTICE:  using standard form "en-US-u-va-posix" for ICU locale "C.UTF-8"
CREATE DATABASE


en-US-u-va-posix does not sort like C.UTF-8 in glibc 2.35, so
this interpretation is arguably not what a user would expect.

I would expect the ICU warning or error (icu_validation_level) to kick
in instead of that transliteration.
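
For anyone wanting to check the ordering difference themselves, here is a
minimal sketch. The collation names posix_icu and c_utf8_libc and the sample
strings are made up for illustration; it assumes a UTF8 database and that the
OS provides the C.UTF-8 locale, as on the system described above.

```sql
-- Hypothetical collation names; the locale strings are the ones from example #2.
CREATE COLLATION posix_icu   (provider = icu,  locale = 'en-US-u-va-posix');
CREATE COLLATION c_utf8_libc (provider = libc, locale = 'C.UTF-8');

-- Sort the same sample strings under each collation and put the two
-- orderings side by side. Any difference between the columns shows that
-- the ICU interpretation does not match the libc locale it stands in for.
SELECT string_agg(s, ' ' ORDER BY s COLLATE posix_icu)   AS icu_order,
       string_agg(s, ' ' ORDER BY s COLLATE c_utf8_libc) AS libc_order
FROM (VALUES ('a'), ('B'), ('_'), ('é'), ('Z'), ('1')) AS t(s);
```

The sample values are arbitrary; a larger set mixing punctuation, digits,
upper and lower case, and non-ASCII letters would give a better picture.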


#3

$ grep french /etc/locale.alias
french        fr_FR.ISO-8859-1

postgres=# create database test3 locale='french' template='template0'
encoding='LATIN1';
WARNING:  ICU locale "french" has unknown language "french"
HINT:  To disable ICU locale validation, set parameter icu_validation_level
to DISABLED.
CREATE DATABASE


In practice we're probably getting the "und" ICU locale whereas "fr" would
be appropriate.
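
What actually got recorded for these databases can be inspected in the
catalog. This is only a sketch, assuming the PG16 column names in pg_database
(daticulocale in particular) and the database names from the examples above.

```sql
-- Show the provider and the locale strings stored for the test databases.
-- datlocprovider is 'i' for ICU and 'c' for libc.
SELECT datname, datlocprovider, daticulocale, datcollate, datctype
FROM pg_database
WHERE datname IN ('test2', 'test3')
ORDER BY datname;
```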


I assume that we would find more cases like that if testing on many
operating systems.


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite


