Re: ICU for global collation - Mailing list pgsql-hackers

From	Kyotaro Horiguchi
Subject	Re: ICU for global collation
Date	September 16, 2022 07:55:19
Msg-id	20220916.135519.1552320805811493586.horikyota.ntt@gmail.com
In response to	Re: ICU for global collation (Marina Polyakova <m.polyakova@postgrespro.ru>)
Responses	Re: ICU for global collation
List	pgsql-hackers

At Thu, 15 Sep 2022 18:41:31 +0300, Marina Polyakova <m.polyakova@postgrespro.ru> wrote in 
> P.S. While working on the patch, I discovered that UTF8 encoding is
> always used for the ICU provider in initdb unless it is explicitly
> specified by the user:
> 
> if (!encoding && locale_provider == COLLPROVIDER_ICU)
>     encodingid = PG_UTF8;
> 
> IMO this creates additional errors for locales with other encodings:
> 
> $ initdb --locale de_DE.iso885915@euro --locale-provider icu
> --icu-locale de-DE
> ...
> initdb: error: encoding mismatch
> initdb: detail: The encoding you selected (UTF8) and the encoding that
> the selected locale uses (LATIN9) do not match. This would lead to
> misbehavior in various character string processing functions.
> initdb: hint: Rerun initdb and either do not specify an encoding
> explicitly, or choose a matching combination.
> 
> And ICU supports many encodings, see the contents of pg_enc2icu_tbl in
> encnames.c...

It seems to me that UTF8 is the best default, since it fits almost all
cases that use ICU locales.

So, in a case like yours, we need to specify the encoding explicitly.

$ initdb --encoding iso-8859-15 --locale de_DE.iso885915@euro --locale-provider icu --icu-locale de-DE

However, I think this is hard to understand from the documentation.

(I checked this using euc-jp [1], so I might be wrong.)

[1] initdb --encoding euc-jp --locale ja_JP.eucjp --locale-provider icu --icu-locale ja-x-icu
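
To spell out the behavior discussed above, here is a minimal sketch in C.
It is not the actual initdb code: all names (SketchEncoding,
SketchProvider, sketch_choose_encoding, sketch_encoding_matches_locale)
are made up for illustration. It assumes that, without --encoding and
with the libc provider, the encoding is taken from the locale, and it
ignores special cases such as SQL_ASCII.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the real initdb identifiers. */
typedef enum
{
    ENC_UNSET = -1,             /* user gave no --encoding */
    ENC_UTF8,
    ENC_LATIN9,
    ENC_EUC_JP
} SketchEncoding;

typedef enum
{
    PROVIDER_LIBC,
    PROVIDER_ICU
} SketchProvider;

/*
 * Pick the database encoding:
 *   1. an explicitly requested encoding always wins;
 *   2. otherwise the ICU provider defaults to UTF8 (the quoted snippet);
 *   3. otherwise (assumed) the encoding of the libc locale is used.
 */
SketchEncoding
sketch_choose_encoding(SketchEncoding requested,
                       SketchProvider provider,
                       SketchEncoding locale_enc)
{
    if (requested != ENC_UNSET)
        return requested;
    if (provider == PROVIDER_ICU)
        return ENC_UTF8;
    return locale_enc;
}

/*
 * Simplified version of the later consistency check: the chosen encoding
 * has to agree with the encoding implied by --locale, otherwise initdb
 * stops with "encoding mismatch".
 */
bool
sketch_encoding_matches_locale(SketchEncoding chosen,
                               SketchEncoding locale_enc)
{
    return chosen == locale_enc;
}
```

With ENC_UNSET, PROVIDER_ICU and a LATIN9 locale, the first function
returns ENC_UTF8 and the check fails, which corresponds to the reported
error. Passing ENC_LATIN9 explicitly makes the check pass, which
corresponds to the command with --encoding above.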

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
