Re: ICU for global collation - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: ICU for global collation
Date
Msg-id 07878ad1-d94d-5a92-565f-c0dfdea8b61b@enterprisedb.com
In response to Re: ICU for global collation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 15.03.22 18:28, Robert Haas wrote:
> On Tue, Mar 15, 2022 at 12:58 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>> On 14.03.22 19:57, Robert Haas wrote:
>>> 1. What will happen if I set the ICU collation to something that
>>> doesn't match the libc collation? How bad are the consequences?
>>
>> These are unrelated, so there are no consequences.
> 
> Can you please elaborate on this?

The code that is aware of ICU generally works like this:

if (locale_provider == ICU)
   result = call ICU code
else
   result = call libc code
return result
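
To make the dispatch pattern above concrete, here is a minimal runnable sketch in Python.  It is not PostgreSQL code: LocaleProvider, compare(), libc_compare() and icu_compare() are hypothetical names, and icu_compare() is only a stand-in (a case-folded comparison) so that the example runs without an ICU library installed.  Only locale.strcoll(), which wraps libc's strcoll(), is a real library call.

```python
import locale
from enum import Enum


class LocaleProvider(Enum):
    LIBC = "libc"
    ICU = "icu"


def icu_compare(a: str, b: str) -> int:
    # Hypothetical stand-in for a real ICU collator call; it only
    # compares case-folded strings so the sketch runs without ICU.
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


def libc_compare(a: str, b: str) -> int:
    # locale.strcoll() wraps libc strcoll() and honors LC_COLLATE.
    return locale.strcoll(a, b)


def compare(provider: LocaleProvider, a: str, b: str) -> int:
    # Same shape as the pseudocode: pick the code path by provider.
    if provider is LocaleProvider.ICU:
        result = icu_compare(a, b)
    else:
        result = libc_compare(a, b)
    return result
```

With LC_COLLATE set to "C", compare(LocaleProvider.LIBC, "B", "a") is negative (plain code point order), while the stand-in ICU path returns a positive value for the same pair.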

However, there is code out there, both within PostgreSQL itself and in
extensions, that does not do that yet.  Ideally, we would eventually
change all that over, but it's not happening now.  So we ought to
preserve the ability to set the libc locale so that this legacy code
keeps working for now.

This legacy code by definition doesn't know about ICU, so it doesn't 
care whether the ICU setting "matches" the libc setting or anything like 
that.  It will just do its thing depending on its own setting.

The only consequence of settings that don't match is that the different
pieces of code behave semantically inconsistently (e.g., one routine
thinks the data is Greek and another thinks the data is French).  But
that's up to the user to set correctly.  And the scenarios where you
can actually do anything semantically relevant this way are very
limited.
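
As a toy illustration of that kind of inconsistency, the following Python sketch sorts the same words with two routines that follow different rules.  Both routines and their rules are made up for the example: provider_aware_sort() stands in for code that consults the ICU setting (here, a simplistic accent-insensitive ordering), and legacy_sort() stands in for legacy code that only follows its own setting (here, plain code point order).  Neither reflects the real ICU or libc collation algorithms.

```python
import unicodedata


def strip_accents(s: str) -> str:
    # Toy rule: decompose characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def provider_aware_sort(words):
    # Hypothetical: stands in for code that consults the ICU setting.
    return sorted(words, key=lambda w: strip_accents(w).casefold())


def legacy_sort(words):
    # Hypothetical: stands in for legacy code that only knows its own
    # setting; here that is plain code point order.
    return sorted(words)


# "\u00e9cole" is "ecole" with an acute accent on the first letter.
words = ["zebra", "\u00e9cole", "eclair"]
print(provider_aware_sort(words))
print(legacy_sort(words))
```

The first routine places the accented word between "eclair" and "zebra"; the second places it last, because U+00E9 sorts after "z" in code point order.  Each result is consistent on its own, but the two disagree about the same data.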

A second point is that the LC_CTYPE setting tells other parts of libc 
what the current encoding is.  This affects gettext for example.  So you 
need to set this to something sensible even if you don't use libc locale 
routines otherwise.
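
The link between LC_CTYPE and the encoding that libc assumes can be observed from Python on a POSIX system, as sketched below.  The helper name codeset_for() is hypothetical; locale.setlocale() and locale.nl_langinfo() are standard library calls, but nl_langinfo() is not available on every platform (notably not on Windows), and the string it returns depends on the platform and on which locales are installed.

```python
import locale


def codeset_for(ctype_locale: str) -> str:
    # Set only LC_CTYPE, then ask libc which character encoding it
    # now assumes for the process.
    locale.setlocale(locale.LC_CTYPE, ctype_locale)
    return locale.nl_langinfo(locale.CODESET)


# The "C" locale always exists; other names such as "en_US.UTF-8"
# work only if that locale is installed on the system.
print(codeset_for("C"))
```

Other libc facilities that need to know the encoding pick up the same setting, which is why it has to be sensible even when collation itself goes through another provider.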

>>> 2. If I want to avoid a mismatch between the two, then I will need a
>>> way to figure out which libc collation corresponds to a given ICU
>>> collation. How do I do that?
>>
>> You can specify the same name for both.
> 
> Hmm. If every name were valid in both systems, I don't think you'd be
> proposing two fields.

Earlier versions of this patch and its predecessor patches did indeed
have common fields.  In fact, though, the two systems accept different
values once you delve into the advanced features.  For basic usage,
something like "en_US" will work for both.


