Re: Order changes in PG16 since ICU introduction - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Order changes in PG16 since ICU introduction
Date
Msg-id f83f089ee1e9acd5dbbbf3353294d24e1f196e95.camel@j-davis.com
Whole thread Raw
In response to Re: Order changes in PG16 since ICU introduction  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Order changes in PG16 since ICU introduction  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Order changes in PG16 since ICU introduction  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, 2023-04-21 at 14:23 -0400, Tom Lane wrote:
> postgres=# CREATE DATABASE test1 TEMPLATE=template0 ENCODING = 'UTF8'
> LOCALE = 'C';

...

>  test1     | postgres | UTF8     | icu             | C          |
> C          | en-US      |           |
> (4 rows)
>
> Looks like the "pick en-US even when told not to" problem exists here
> too.

Both provider (ICU) and the icu locale (en-US) are inherited from
template0. The LOCALE parameter to CREATE DATABASE doesn't affect
either of those things, because there's a separate parameter
ICU_LOCALE.

This happens the same way in v15, and although it matches the
documentation technically, it is not a great user experience.

I have a couple ideas:

1. Introduce a "none" provider to separate the concept of C/POSIX
locales from the libc provider. It's not really using a provider
anyway, it's just using memcmp(), and I think it causes confusion to
combine them. Saying "LOCALE_PROVIDER=none" is less error-prone than
"LOCALE_PROVIDER=libc LOCALE='C'".

2. Change the CREATE DATABASE syntax to catch these errors better at
the possible expense of backwards compatibility.

I am also having second thoughts about accepting "C" or "POSIX" as an
ICU locale and transforming it to "en-US-u-va-posix" in v16. It's not
terribly useful (why not just use memcmp()?), it's not fast in my
measurements (en-US is faster), so maybe it's better to just throw an
error and tell the user to use C (or provider=none as I suggest
above)? 

Obviously the user could manually type "en-US-u-va-posix" if that's the
locale they want. Throwing an error would be a backwards-compatibility
issue, but in v15 an ICU locale of "C" just gives the root locale
anyway, which is probably not what they want.

Regards,
    Jeff Davis




pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: Memory leak from ExecutorState context?
Next
From: David Zhang
Date:
Subject: Re: Add missing copyright for pg_upgrade/t/* files