RE: ICU for global collation - Mailing list pgsql-hackers
From | Shinoda, Noriyoshi (PN Japan FSIP) |
---|---|
Subject | RE: ICU for global collation |
Date | |
Msg-id | PH7PR84MB1885646854837E4E7EC339BFEE129@PH7PR84MB1885.NAMPRD84.PROD.OUTLOOK.COM Whole thread Raw |
In response to | Re: ICU for global collation (Peter Eisentraut <peter.eisentraut@enterprisedb.com>) |
Responses |
Re: ICU for global collation
|
List | pgsql-hackers |
Hi, Thank you to all the developers. I found that the description of the pg_database.daticulocale column was not written in the documentation. The attached small patch adds a description of the daticulocale column to catalogs.sgml. Regards, Noriyoshi Shinoda -----Original Message----- From: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Sent: Thursday, March 17, 2022 7:29 PM To: Julien Rouhaud <rjuju123@gmail.com> Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; Daniel Verite <daniel@manitou-mail.org> Subject: Re: ICU for global collation On 15.03.22 08:41, Julien Rouhaud wrote: >>>> The locale object in ICU is an identifier that specifies a >>>> particular locale and has fields for language, country, and an >>>> optional code to specify further variants or subdivisions. These >>>> fields also can be represented as a string with the fields separated by an underscore. >> >> I think the Localization chapter needs to be reorganized a bit, but >> I'll leave that for a separate patch. > > WFM. I ended up writing a bit of content for that chapter. >>> While on that topic, the doc should probably mention that default >>> ICU collations can only be deterministic. >> >> Well, there is no option to do otherwise, so I'm not sure where/how >> to mention that. We usually don't document options that don't exist. >> ;-) > > Sure, but I'm afraid that users may still be tempted to use ICU > locales like > und-u-ks-level2 from the case_insensitive example in the doc and hope > that it will work accordingly. Or maybe it's just me that still sees > ICU as dark magic and want to be extra cautious. I have added this to the CREATE DATABASE ref page. > Unrelated, but I just realized that we have PGC_INTERNAL gucs for > lc_ctype and lc_collate. Should we add one for icu_locale too? I'm not sure. I think the existing ones are more for backward compatibility with the time before it was settable per database. >>> in AlterCollation(), pg_collation_actual_version(), >>> AlterDatabaseRefreshColl() and pg_database_collation_actual_version(): >>> >>> - datum = SysCacheGetAttr(COLLOID, tup, Anum_pg_collation_collcollate, &isnull); >>> - Assert(!isnull); >>> - newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum)); >>> + datum = SysCacheGetAttr(COLLOID, tup, collForm->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale: Anum_pg_collation_collcollate, &isnull); >>> + if (!isnull) >>> + newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum)); >>> + else >>> + newversion = NULL; >>> >>> The columns are now nullable, but can you actually end up with a >>> null locale name in the expected field without manual DML on the >>> catalog, corruption or similar? I think it should be a plain error >>> explaining the inconsistency rather then silently assuming there's >>> no version. Note that at least >>> pg_newlocale_from_collation() asserts that the specific libc/icu >>> field it's interested in isn't null. >> >> This is required because the default collations's fields are null now. > > Yes I saw that, but that's a specific exception. Detecting whether > it's the DEFAULT_COLLATION_OID or not and raise an error when a null > value isn't expected seems like it could be worthwhile. I have fixed that as you suggest. > So apart from the few details mentioned above I'm happy with this patch! committed!
Attachment
pgsql-hackers by date: