Re: Order changes in PG16 since ICU introduction - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Order changes in PG16 since ICU introduction
Date
Msg-id c840107b-4cb9-c8e9-abb7-1d8c5e0d51df@enterprisedb.com
Whole thread Raw
In response to Re: Order changes in PG16 since ICU introduction  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Order changes in PG16 since ICU introduction
List pgsql-hackers
On 11.05.23 23:29, Jeff Davis wrote:
> New patch series attached.
> 
> === 0001: fix bug that allows creating hidden collations
> 
> Bug:
> https://www.postgresql.org/message-id/051c9395cf880307865ee8b17acdbf7f838c1e39.camel@j-davis.com

This is still being debated in the other thread.  Not really related to 
this thread, so I suggest dropping it from this patch series.


> === 0002: handle some kinds of libc-stlye locale strings
> 
> ICU used to handle libc locale strings like 'fr_FR@euro', but doesn't
> in later versions. Handle them in postgres for consistency.

I tend to agree with ICU that these variants are obsolete, and we don't 
need to support them anymore.  If this were a tiny patch, then maybe ok, 
but the way it's presented here the whole code is duplicated between 
pg_locale.c and initdb.c, which is not great.


> === 0003: reduce icu_validation_level to WARNING
> 
> Given that we've seen some inconsistency in which locale names are
> accepted in different ICU versions, it seems best not to be too strict.
> Peter Eisentraut suggested that it be set to ERROR originally, but a
> WARNING should be sufficient to see problems without introducing risks
> migrating to version 16.

I'm not sure why this is the conclusion.  Presumably, the detection 
capabilities of ICU improve over time, so we want to take advantage of 
that?  What are some example scenarios where this change would help?


> === 0004-0006:
> 
> To solve the issues that have come up in this thread, we need CREATE
> DATABASE (and createdb and initdb) to use LOCALE to mean the collation
> locale regardless of which provider is in use (which is what 0006
> does).
> 
> 0006 depends on ICU handling libc locale names. It already does a good
> job for most libc locale names (though patch 0002 fixes a few cases
> where it doesn't). There may be more cases, but for the most part libc
> names are interpreted in a reasonable way. But one important case is
> missing: ICU does not handle the "C" locale as we expect (that is,
> using memcmp()).
> 
> We've already allowed users to create ICU collations with the C locale
> in the past, which uses the root collation (not memcmp()), and we need
> to keep supporting that for upgraded clusters.

I'm not sure I agree that we need to keep supporting that.  The only way 
you could get that in past releases is if you specify explicitly, "give 
me provider ICU and locale C", and then it wouldn't actually even work 
correctly.  So nobody should be using that in practice, and nobody 
should have stumbled into that combination of settings by accident.


>    3. Introduce collation provider "none", which is always memcmp-based
> (patch 0004). It's equivalent to the libc locale=C, but it allows
> specifying the LC_COLLATE and LC_CTYPE independently. A command like
> CREATE DATABASE ... LOCALE_PROVIDER='icu' ICU_LOCALE='C'
> LC_COLLATE='en_US' would get changed (with a NOTICE) to provider "none"
> (patch 0005), so you'd have datlocprovider=none, datcollate=en_US. For
> the database default collation, that would always use memcmp(), but the
> server environment LC_COLLATE would be set to en_US as the user
> specified.

This seems most attractive, but I think it's quite invasive at this 
point, especially given the dubious premise (see above).


> === 0007: Add a GUC to control the default collation provider
> 
> Having a GUC would make it easier to migrate to ICU without surprises.
> This only affects the default for CREATE COLLATION, not CREATE DATABASE
> (and obviously not initdb).

It's not clear to me why we would want that.  Also not clear why it 
should only affect CREATE COLLATION.




pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Missing update of all_hasnulls in BRIN opclasses
Next
From: Alvaro Herrera
Date:
Subject: Re: Using make_ctags leaves tags files in git