Re: ICU integration - Mailing list pgsql-hackers

From Doug Doole
Subject Re: ICU integration
Date
Msg-id 51759C7D-BE9E-4173-B5F9-EFD7B9594C2B@salesforce.com
Whole thread Raw
In response to Re: ICU integration  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: ICU integration  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-hackers
Hi all. I’m new to the PostgreSQL code and the mailing list, but I’ve had a lot of experience with using ICU in a
differentdatabase product. So while I’m not up to speed on the code yet, I can offer some insights on using ICU. 

> On Aug 30, 2016, at 9:12 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>> How stable are the UCU locales? Most importantly, does ICU offer any
>> way to "pin" a locale version, so we can say "we want de_DE as it was
>> in ICU 4.6" and get consistent behaviour when the user sets up a
>> replica on some other system with ICU 4.8? Even if the German
>> government has changed its mind (again) about some details of the
>> language and 4.8 knows about the changes but 4.4 doesn’t?

ICU explicitly does not provide stability in their locales and collations. We pushed them hard to provide this, but
betweenchanges to the CLDR data and changes to the ICU code it just wasn’t feasible for them to provide version to
versionstability. 

What they do offer is a compile option when building ICU to version all their APIs. So instead of calling icu_foo()
you’dcall icu_foo46(). (Or something like this - it’s been a few years since I actually worked with the ICU code.) This
ultimatelyallows you to load multiple versions of the ICU library into a single program and provide stability by
callingthe appropriate version of the library. (Unfortunately, the OS - at least my Linux box - only provides the
genericversion of ICU and not the version annotated APIs, which means a separate compile of ICU is needed.) 

The catch with this is that it means you likely want to expose the version information. In another note it was
suggestedto use something like fr_FR%icu. If you want to pin it to a specific version of ICU, you’ll likely need
somethinglike fr_FR%icu46. (There’s nothing wrong with supporting fr_FR%icu to give users an easy way of saying “give
methe latest and greatest”, but you’d probably want to harden it to a specific ICU version internally.) 

> I forgot to mention this, but the patch adds a collversion column that
> stores the collation version (provided by ICU).  And then when you
> upgrade ICU to something incompatible you get
>
> +           if (numversion != collform->collversion)
> +               ereport(WARNING,
> +                       (errmsg("ICU collator version mismatch"),
> +                        errdetail("The database was created using
> version 0x%08X, the library provides version 0x%08X.",
> +                                  (uint32) collform->collversion,
> (uint32) numversion),
> +                        errhint("Rebuild affected indexes, or build
> PostgreSQL with the right version of ICU.")));
>
> So you still need to manage this carefully, but at least you have a
> chance to learn about it.

Indexes are the obvious place where collation comes into play, and are relatively easy to address. But consider all the
placeswhere string comparisons can be done. For example, check constraints and referential constraints can depend on
stringcomparisons. If the collation rules change because of a new version of ICU, the database can become inconsistent
andwill need a lot more work than an index rebuild. 

> Suggestions for refining this are welcome.
>
> --
> Peter Eisentraut              http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Doug Doole
Salesforce




pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Patch: Write Amplification Reduction Method (WARM)
Next
From: Kevin Grittner
Date:
Subject: Re: delta relations in AFTER triggers