Re: ICU for global collation - Mailing list pgsql-hackers
From | Peter Eisentraut |
---|---|
Subject | Re: ICU for global collation |
Date | |
Msg-id | 7c9270a4-db00-004b-df1a-fede4697a779@enterprisedb.com Whole thread Raw |
In response to | Re: ICU for global collation (Julien Rouhaud <rjuju123@gmail.com>) |
Responses |
RE: ICU for global collation
|
List | pgsql-hackers |
On 15.03.22 08:41, Julien Rouhaud wrote: >>>> The locale object in ICU is an identifier that specifies a particular locale >>>> and has fields for language, country, and an optional code to specify further >>>> variants or subdivisions. These fields also can be represented as a string >>>> with the fields separated by an underscore. >> >> I think the Localization chapter needs to be reorganized a bit, but I'll >> leave that for a separate patch. > > WFM. I ended up writing a bit of content for that chapter. >>> While on that topic, the doc should probably mention that default ICU >>> collations can only be deterministic. >> >> Well, there is no option to do otherwise, so I'm not sure where/how to >> mention that. We usually don't document options that don't exist. ;-) > > Sure, but I'm afraid that users may still be tempted to use ICU locales like > und-u-ks-level2 from the case_insensitive example in the doc and hope that it > will work accordingly. Or maybe it's just me that still sees ICU as dark magic > and want to be extra cautious. I have added this to the CREATE DATABASE ref page. > Unrelated, but I just realized that we have PGC_INTERNAL gucs for lc_ctype and > lc_collate. Should we add one for icu_locale too? I'm not sure. I think the existing ones are more for backward compatibility with the time before it was settable per database. >>> in AlterCollation(), pg_collation_actual_version(), AlterDatabaseRefreshColl() >>> and pg_database_collation_actual_version(): >>> >>> - datum = SysCacheGetAttr(COLLOID, tup, Anum_pg_collation_collcollate, &isnull); >>> - Assert(!isnull); >>> - newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum)); >>> + datum = SysCacheGetAttr(COLLOID, tup, collForm->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale: Anum_pg_collation_collcollate, &isnull); >>> + if (!isnull) >>> + newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum)); >>> + else >>> + newversion = NULL; >>> >>> The columns are now nullable, but can you actually end up with a null locale >>> name in the expected field without manual DML on the catalog, corruption or >>> similar? I think it should be a plain error explaining the inconsistency >>> rather then silently assuming there's no version. Note that at least >>> pg_newlocale_from_collation() asserts that the specific libc/icu field it's >>> interested in isn't null. >> >> This is required because the default collations's fields are null now. > > Yes I saw that, but that's a specific exception. Detecting whether it's the > DEFAULT_COLLATION_OID or not and raise an error when a null value isn't > expected seems like it could be worthwhile. I have fixed that as you suggest. > So apart from the few details mentioned above I'm happy with this patch! committed!
pgsql-hackers by date: