RE: ICU for global collation - Mailing list pgsql-hackers

From Shinoda, Noriyoshi (PN Japan FSIP)
Subject RE: ICU for global collation
Date
Msg-id PH7PR84MB1885646854837E4E7EC339BFEE129@PH7PR84MB1885.NAMPRD84.PROD.OUTLOOK.COM
Whole thread Raw
In response to Re: ICU for global collation  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Responses Re: ICU for global collation
List pgsql-hackers
Hi, 
Thank you to all the developers.
I found that the description of the pg_database.daticulocale column was not written in the documentation. 
The attached small patch adds a description of the daticulocale column to catalogs.sgml.

Regards,
Noriyoshi Shinoda
-----Original Message-----
From: Peter Eisentraut <peter.eisentraut@enterprisedb.com> 
Sent: Thursday, March 17, 2022 7:29 PM
To: Julien Rouhaud <rjuju123@gmail.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; Daniel Verite <daniel@manitou-mail.org>
Subject: Re: ICU for global collation

On 15.03.22 08:41, Julien Rouhaud wrote:
>>>> The locale object in ICU is an identifier that specifies a 
>>>> particular locale and has fields for language, country, and an 
>>>> optional code to specify further variants or subdivisions. These 
>>>> fields also can be represented as a string with the fields separated by an underscore.
>>
>> I think the Localization chapter needs to be reorganized a bit, but 
>> I'll leave that for a separate patch.
> 
> WFM.

I ended up writing a bit of content for that chapter.

>>> While on that topic, the doc should probably mention that default 
>>> ICU collations can only be deterministic.
>>
>> Well, there is no option to do otherwise, so I'm not sure where/how 
>> to mention that.  We usually don't document options that don't exist. 
>> ;-)
> 
> Sure, but I'm afraid that users may still be tempted to use ICU 
> locales like
> und-u-ks-level2 from the case_insensitive example in the doc and hope 
> that it will work accordingly.  Or maybe it's just me that still sees 
> ICU as dark magic and want to be extra cautious.

I have added this to the CREATE DATABASE ref page.

> Unrelated, but I just realized that we have PGC_INTERNAL gucs for 
> lc_ctype and lc_collate.  Should we add one for icu_locale too?

I'm not sure.  I think the existing ones are more for backward compatibility with the time before it was settable per
database.

>>> in AlterCollation(), pg_collation_actual_version(), 
>>> AlterDatabaseRefreshColl() and pg_database_collation_actual_version():
>>>
>>> -   datum = SysCacheGetAttr(COLLOID, tup, Anum_pg_collation_collcollate, &isnull);
>>> -   Assert(!isnull);
>>> -   newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum));
>>> +   datum = SysCacheGetAttr(COLLOID, tup, collForm->collprovider == COLLPROVIDER_ICU ?
Anum_pg_collation_colliculocale: Anum_pg_collation_collcollate, &isnull);
 
>>> +   if (!isnull)
>>> +       newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum));
>>> +   else
>>> +       newversion = NULL;
>>>
>>> The columns are now nullable, but can you actually end up with a 
>>> null locale name in the expected field without manual DML on the 
>>> catalog, corruption or similar?  I think it should be a plain error 
>>> explaining the inconsistency rather then silently assuming there's 
>>> no version.  Note that at least
>>> pg_newlocale_from_collation() asserts that the specific libc/icu 
>>> field it's interested in isn't null.
>>
>> This is required because the default collations's fields are null now.
> 
> Yes I saw that, but that's a specific exception.  Detecting whether 
> it's the DEFAULT_COLLATION_OID or not and raise an error when a null 
> value isn't expected seems like it could be worthwhile.

I have fixed that as you suggest.

> So apart from the few details mentioned above I'm happy with this patch!

committed!



Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Logical replication timeout problem
Next
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: Skipping logical replication transactions on subscriber side