Re: ICU for global collation - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: ICU for global collation
Msg-id 525ef44f-52bf-505f-a491-07835d039424@enterprisedb.com
In response to Re: ICU for global collation  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
There were a few inquiries about this topic recently, so I dug up the 
old thread and patch.  What we got stuck on last time was that we can't 
just swap out all locale support in a database for ICU.  We still need 
to set the usual locale environment, otherwise some things that are not 
ICU aware will break or degrade.  I had initially anticipated fixing 
that by converting everything that uses libc locales to ICU.  But that 
turned out to be tedious and ultimately not very useful as far as the 
user-facing result is concerned, so I gave up.

So this is a different approach: If you choose ICU as the default locale 
for a database, you still need to specify lc_ctype and lc_collate 
settings, as before.  Unlike in the previous patch, where the ICU 
collation name was written in datcollate, there is now a third column 
(daticucoll), so we can store all three values.  This fixes the 
described problem.  Other than that, once you get all the initial 
settings right, it basically just works: The places that already have 
ICU support will use a database-wide ICU collation if appropriate, and 
the places that don't have ICU support continue to use the global libc 
locale settings.
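
To illustrate the lookup rule described above, here is a minimal sketch in Python. It is not code from the patch: the class DatabaseLocale, the function default_collation, and the icu_aware flag are hypothetical names made up for this example, and the locale strings are sample values. Only the three column names (datcollate, datctype, daticucoll) come from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DatabaseLocale:
    """Toy stand-in for the three locale columns (hypothetical class)."""
    datcollate: str            # libc LC_COLLATE setting, always specified
    datctype: str              # libc LC_CTYPE setting, always specified
    daticucoll: Optional[str]  # ICU collation name, None when not applicable


def default_collation(db: DatabaseLocale, icu_aware: bool) -> Tuple[str, str]:
    """Return (provider, locale) for a code path using the database default.

    icu_aware is an assumption of this sketch: it marks whether the calling
    code path has ICU support at all.
    """
    if icu_aware and db.daticucoll is not None:
        # ICU-capable code uses the database-wide ICU collation.
        return ("icu", db.daticucoll)
    # Everything else keeps using the global libc locale settings.
    return ("libc", db.datcollate)


# Sample values, chosen only for illustration.
icu_db = DatabaseLocale("en_US.UTF-8", "en_US.UTF-8", "en-US")
libc_db = DatabaseLocale("en_US.UTF-8", "en_US.UTF-8", None)

print(default_collation(icu_db, icu_aware=True))
print(default_collation(icu_db, icu_aware=False))
print(default_collation(libc_db, icu_aware=True))
```

The point of the sketch is that the libc settings are always present, so code that is not ICU-aware never sees a missing value.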

I changed the datcollate, datctype, and the new daticucoll fields to 
type text (from name).  That way, the daticucoll field can be set to 
null if it's not applicable.  Also, the 63-character limit of the name 
type can actually be a problem if you want to use some combination of 
the options that ICU locales offer.  And for less extreme uses, having 
variable-length fields will save some storage, since typical locale 
names are much shorter.
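
As a rough illustration of the length problem, the following Python sketch checks whether a locale string would fit into a name-sized field. It assumes the default build setting NAMEDATALEN = 64, which leaves 63 usable bytes. The helper fits_in_name and the long locale string are made up for this example; the string is deliberately contrived, combining many collation option keys in one language tag.

```python
# Assumption: default PostgreSQL build, where NAMEDATALEN is 64 and a
# "name" value can therefore hold at most 63 bytes.
NAMEDATALEN = 64


def fits_in_name(locale: str) -> bool:
    """Hypothetical helper: True if the string fits in a name-typed column."""
    return len(locale.encode("utf-8")) <= NAMEDATALEN - 1


# A typical short locale name.
short_locale = "en-US"

# A contrived ICU language tag that sets many collation options at once.
long_locale = (
    "und-u-ka-shifted-kb-true-kc-true-kf-upper-kk-true-kn-true"
    "-kr-latn-digit-ks-level1-kv-punct"
)

for loc in (short_locale, long_locale):
    print(len(loc), fits_in_name(loc), loc)
```

With a text column neither case is a problem, and the short name takes only as much space as it needs.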

For the same reasons and to keep things consistent, I also changed the 
analogous pg_collation fields like that.  This also removes some weird 
code that checks that collcollate and collctype are the same for ICU, 
so it's overall cleaner.