Re: ICU for global collation - Mailing list pgsql-hackers
From | Marina Polyakova |
---|---|
Subject | Re: ICU for global collation |
Date | |
Msg-id | 79f410460c4fc9534000785adb8bf39a@postgrespro.ru |
In response to | Re: ICU for global collation (Peter Eisentraut <peter.eisentraut@enterprisedb.com>) |
Responses | Re: ICU for global collation |
List | pgsql-hackers |
On 2022-10-01 15:07, Peter Eisentraut wrote:
> On 22.09.22 20:06, Marina Polyakova wrote:
>> On 2022-09-21 17:53, Peter Eisentraut wrote:
>>> Committed with that test, thanks. I think that covers all the ICU
>>> issues you reported for PG15 for now?
>>
>> I thought about the order of the ICU checks - if it is ok to check
>> that the selected encoding is supported by ICU after printing all the
>> locale & encoding information, why not to move almost all the ICU
>> checks here?..
>
> It's possible that we can do better, but I'm not going to add things
> like that to PG 15 at this point unless it fixes a faulty behavior.

Will PG 15 always have this order of ICU checks? Is the current behaviour correct enough? On the other hand, there may be a better fix for PG 16+, and not all changes can be backported...

On 2022-09-16 10:56, Peter Eisentraut wrote:
> On 15.09.22 17:41, Marina Polyakova wrote:
>> I agree with you. Here's another version of the patch. The
>> locale/encoding checks and reports in initdb have been reordered,
>> because now the encoding is set first and only then the ICU locale is
>> checked.
>
> I committed something based on the first version of your patch. This
> reordering of the messages here was a little too much surgery for me
> at this point. For instance, there are also messages in #ifdef WIN32
> code that would need to be reordered as well. I kept the overall
> structure of the code the same and just inserted the additional
> proposed checks.
>
> If you want to pursue the reordering of the checks and messages
> overall, a patch for the master branch could be considered.

I've worked on this again (see attached patch), but I'm not sure whether the encoding mismatch messages are clear enough without the full locale information.
For

$ initdb -D data --icu-locale en --locale-provider icu

compare the outputs:

The database cluster will be initialized with this locale configuration:
  provider:    icu
  ICU locale:  en
  LC_COLLATE:  de_DE.iso885915@euro
  LC_CTYPE:    de_DE.iso885915@euro
  LC_MESSAGES: en_US.utf8
  LC_MONETARY: de_DE.iso885915@euro
  LC_NUMERIC:  de_DE.iso885915@euro
  LC_TIME:     de_DE.iso885915@euro
The default database encoding has been set to "UTF8".
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (UTF8) and the encoding that the selected locale uses (LATIN9) do not match. This would lead to misbehavior in various character string processing functions.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

and

Encoding "UTF8" implied by locale will be set as the default database encoding.
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (UTF8) and the encoding that the selected locale uses (LATIN9) do not match. This would lead to misbehavior in various character string processing functions.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

The same without ICU, e.g. for

$ initdb -D data

the output with locale information:

The database cluster will be initialized with this locale configuration:
  provider:    libc
  LC_COLLATE:  en_US.utf8
  LC_CTYPE:    de_DE.iso885915@euro
  LC_MESSAGES: en_US.utf8
  LC_MONETARY: de_DE.iso885915@euro
  LC_NUMERIC:  de_DE.iso885915@euro
  LC_TIME:     de_DE.iso885915@euro
The default database encoding has accordingly been set to "LATIN9".
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (LATIN9) and the encoding that the selected locale uses (UTF8) do not match. This would lead to misbehavior in various character string processing functions.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.
and the "shorter" version:

Encoding "LATIN9" implied by locale will be set as the default database encoding.
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (LATIN9) and the encoding that the selected locale uses (UTF8) do not match. This would lead to misbehavior in various character string processing functions.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

BTW, what did you mean by "there are also messages in #ifdef WIN32 code that would need to be reordered as well"?..

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
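
To make the two report orderings in the message easier to compare, here is a rough Python model of the libc case. It is only an illustrative sketch, not code from initdb or from the attached patch. The names `CODESET_TO_ENCODING`, `locale_encoding`, `check_locale_encoding` and `run_checks` are hypothetical, and the sketch assumes the default encoding is derived from LC_CTYPE and then compared against LC_CTYPE and LC_COLLATE.

```python
from typing import Dict, List

# Hypothetical lookup used only in this sketch; a real program would ask the
# operating system for a locale's codeset rather than parse the locale name.
CODESET_TO_ENCODING = {"utf8": "UTF8", "iso885915@euro": "LATIN9"}


def locale_encoding(locale_name: str) -> str:
    """Return the encoding implied by a locale name such as 'en_US.utf8'."""
    codeset = locale_name.split(".", 1)[1]
    return CODESET_TO_ENCODING[codeset]


def check_locale_encoding(locale_name: str, selected: str) -> List[str]:
    """Return the error lines for a mismatch, or an empty list if all is well."""
    implied = locale_encoding(locale_name)
    if implied == selected:
        return []
    return [
        "initdb: error: encoding mismatch",
        f"initdb: detail: The encoding you selected ({selected}) and the "
        f"encoding that the selected locale uses ({implied}) do not match.",
        "initdb: hint: Rerun initdb and either do not specify an encoding "
        "explicitly, or choose a matching combination.",
    ]


def run_checks(settings: Dict[str, str], show_locale_info: bool) -> List[str]:
    """Model both variants: full locale report first, or the short report."""
    # Assumption of this sketch: the default encoding follows LC_CTYPE.
    encoding = locale_encoding(settings["LC_CTYPE"])
    out: List[str] = []
    if show_locale_info:
        out.append("The database cluster will be initialized with this "
                   "locale configuration:")
        out.extend(f"  {name}: {value}" for name, value in settings.items())
        out.append("The default database encoding has accordingly been set "
                   f'to "{encoding}".')
    else:
        out.append(f'Encoding "{encoding}" implied by locale will be set as '
                   "the default database encoding.")
    # Stop at the first category whose locale disagrees with the encoding.
    for category in ("LC_CTYPE", "LC_COLLATE"):
        errors = check_locale_encoding(settings[category], encoding)
        if errors:
            out.extend(errors)
            break
    return out


if __name__ == "__main__":
    mixed = {"LC_COLLATE": "en_US.utf8", "LC_CTYPE": "de_DE.iso885915@euro"}
    print("\n".join(run_checks(mixed, show_locale_info=False)))
```

In the short variant the reader sees only the chosen encoding and the conflicting one, without the locale setting that caused the conflict, which is the clarity concern raised above.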