Re: Order changes in PG16 since ICU introduction - Mailing list pgsql-hackers
From | Peter Eisentraut |
---|---|
Subject | Re: Order changes in PG16 since ICU introduction |
Date | |
Msg-id | c840107b-4cb9-c8e9-abb7-1d8c5e0d51df@enterprisedb.com Whole thread Raw |
In response to | Re: Order changes in PG16 since ICU introduction (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Order changes in PG16 since ICU introduction
|
List | pgsql-hackers |
On 11.05.23 23:29, Jeff Davis wrote: > New patch series attached. > > === 0001: fix bug that allows creating hidden collations > > Bug: > https://www.postgresql.org/message-id/051c9395cf880307865ee8b17acdbf7f838c1e39.camel@j-davis.com This is still being debated in the other thread. Not really related to this thread, so I suggest dropping it from this patch series. > === 0002: handle some kinds of libc-stlye locale strings > > ICU used to handle libc locale strings like 'fr_FR@euro', but doesn't > in later versions. Handle them in postgres for consistency. I tend to agree with ICU that these variants are obsolete, and we don't need to support them anymore. If this were a tiny patch, then maybe ok, but the way it's presented here the whole code is duplicated between pg_locale.c and initdb.c, which is not great. > === 0003: reduce icu_validation_level to WARNING > > Given that we've seen some inconsistency in which locale names are > accepted in different ICU versions, it seems best not to be too strict. > Peter Eisentraut suggested that it be set to ERROR originally, but a > WARNING should be sufficient to see problems without introducing risks > migrating to version 16. I'm not sure why this is the conclusion. Presumably, the detection capabilities of ICU improve over time, so we want to take advantage of that? What are some example scenarios where this change would help? > === 0004-0006: > > To solve the issues that have come up in this thread, we need CREATE > DATABASE (and createdb and initdb) to use LOCALE to mean the collation > locale regardless of which provider is in use (which is what 0006 > does). > > 0006 depends on ICU handling libc locale names. It already does a good > job for most libc locale names (though patch 0002 fixes a few cases > where it doesn't). There may be more cases, but for the most part libc > names are interpreted in a reasonable way. But one important case is > missing: ICU does not handle the "C" locale as we expect (that is, > using memcmp()). > > We've already allowed users to create ICU collations with the C locale > in the past, which uses the root collation (not memcmp()), and we need > to keep supporting that for upgraded clusters. I'm not sure I agree that we need to keep supporting that. The only way you could get that in past releases is if you specify explicitly, "give me provider ICU and locale C", and then it wouldn't actually even work correctly. So nobody should be using that in practice, and nobody should have stumbled into that combination of settings by accident. > 3. Introduce collation provider "none", which is always memcmp-based > (patch 0004). It's equivalent to the libc locale=C, but it allows > specifying the LC_COLLATE and LC_CTYPE independently. A command like > CREATE DATABASE ... LOCALE_PROVIDER='icu' ICU_LOCALE='C' > LC_COLLATE='en_US' would get changed (with a NOTICE) to provider "none" > (patch 0005), so you'd have datlocprovider=none, datcollate=en_US. For > the database default collation, that would always use memcmp(), but the > server environment LC_COLLATE would be set to en_US as the user > specified. This seems most attractive, but I think it's quite invasive at this point, especially given the dubious premise (see above). > === 0007: Add a GUC to control the default collation provider > > Having a GUC would make it easier to migrate to ICU without surprises. > This only affects the default for CREATE COLLATION, not CREATE DATABASE > (and obviously not initdb). It's not clear to me why we would want that. Also not clear why it should only affect CREATE COLLATION.
pgsql-hackers by date: