Re: ICU for global collation - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: ICU for global collation
Date
Msg-id 20220916.171116.507864395269358149.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: ICU for global collation  (Marina Polyakova <m.polyakova@postgrespro.ru>)
Responses Re: ICU for global collation
List pgsql-hackers
At Fri, 16 Sep 2022 09:49:28 +0300, Marina Polyakova <m.polyakova@postgrespro.ru> wrote in 
> On 2022-09-15 09:52, Kyotaro Horiguchi wrote:
> > However, when I reran the command, it complains about incompatible
> > encoding this time.  I think it's more user-friendly to check for the
> > encoding compatibility before the check for missing --icu-locale
> > option.
> > regards.
> 
> In continuation of options check: AFAICS the following checks in
> initdb
> 
>     if (locale_provider == COLLPROVIDER_ICU)
>     {
>         if (!icu_locale)
>             pg_fatal("ICU locale must be specified");
> 
>         /*
>          * In supported builds, the ICU locale ID will be checked by the
>          * backend during post-bootstrap initialization.
>          */
> #ifndef USE_ICU
>         pg_fatal("ICU is not supported in this build");
> #endif
>     }
> 
> are executed approximately when they are executed in create database
> after getting all the necessary data from the template database:

initdb doesn't work that way, but anyway, I realized that I am
proposing to move that code in setlocales() to the caller function as
the result. I don't think setlocales() is the place for the code
because icu locale has no business with what the function does.  That
being said there's no obvious reason we *need* to move the code out to
its caller.

> if (dblocprovider == COLLPROVIDER_ICU)
> {
>     /*
>      * This would happen if template0 uses the libc provider but the new
>      * database uses icu.
>      */
>     if (!dbiculocale)
>         ereport(ERROR,
>                 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                  errmsg("ICU locale must be specified")));
> }
> 
> if (dblocprovider == COLLPROVIDER_ICU)
>     check_icu_locale(dbiculocale);
> 
> But perhaps the check that --icu-locale cannot be specified unless
> locale provider icu is chosen should also be moved here? So all these
> checks will be in one place and it will use the provider from the
> template database (which could be icu):
> 
> $ initdb --locale-provider icu --icu-locale en-US -D data &&
> pg_ctl -D data -l logfile start &&
> createdb --icu-locale ru-RU --template template0 mydb
> ...
> createdb: error: database creation failed: ERROR: ICU locale cannot be
> specified unless locale provider is ICU

And, I realized that this causes bigger churn than I thought. So, I'm
sorry but I withdraw the comment.

Thus the first proposed patch will be more or less the direction we
would go. And the patch looks good to me as a whole.

+                     errmsg("encoding \"%s\" is not supported with ICU provider",

+        pg_log_error("encoding \"%s\" is not supported with ICU provider",
+                     pg_encoding_to_char(encodingid));

I might be wrong, but the messages look wrong to me.  The alternatives
below might work.

"encoding \"%s\" is not supported by ICU"
"encoding \"%s\" cannot be used for/with ICU locales"

+        pg_log_error_hint("Rerun %s and choose a matching combination.",
+                          progname);

This doesn't seem to provide users with useful information.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: ICU for global collation
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: ICU for global collation