Re: ICU for global collation - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: ICU for global collation
Date
Msg-id 20220817165359.mcco6ufe2vchokqo@jrouhaud
Whole thread Raw
In response to Re: ICU for global collation  (Marina Polyakova <m.polyakova@postgrespro.ru>)
Responses Re: ICU for global collation
List pgsql-hackers
Hi,

On Mon, Aug 15, 2022 at 03:06:32PM +0300, Marina Polyakova wrote:
>
> 1.1) It looks like there's a bug in the function get_db_infos
> (src/bin/pg_upgrade/info.c), where the version of the old cluster is always
> checked:
>
> if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
>     snprintf(query + strlen(query), sizeof(query) - strlen(query),
>              "'c' AS datlocprovider, NULL AS daticulocale, ");
> else
>     snprintf(query + strlen(query), sizeof(query) - strlen(query),
>              "datlocprovider, daticulocale, ");
>
> With the simple patch
>
> diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
> index df374ce4b362b4c6c87fc1fd0e476e5d6d353d9e..53ea348e211d3ac38334292bc16cb814bc13bb87
> 100644
> --- a/src/bin/pg_upgrade/info.c
> +++ b/src/bin/pg_upgrade/info.c
> @@ -319,7 +319,7 @@ get_db_infos(ClusterInfo *cluster)
>
>      snprintf(query, sizeof(query),
>               "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, ");
> -    if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
> +    if (GET_MAJOR_VERSION(cluster->major_version) < 1500)
>          snprintf(query + strlen(query), sizeof(query) - strlen(query),
>                   "'c' AS datlocprovider, NULL AS daticulocale, ");
>      else
>
> I got the expected error during the upgrade:
>
> locale providers for database "template1" do not match:  old "libc", new
> "icu"
> Failure, exiting

Good catch.  There's unfortunately not a lot of regression tests for
ICU-initialized clusters.  I'm wondering if the build-farm client could be
taught about the locale provider rather than assuming libc.  Clearly that
wouldn't have caught this issue, but it should still increase the coverage a
bit (I'm thinking of the recent problem with the abbreviated keys).

> 1.2) It looks like the mentioned asserion in dbcommands.c conflicts with the
> following lines earlier:
>
> if (dbiculocale == NULL)
>     dbiculocale = src_iculocale;
>
> The following patch works for me:
>
> diff --git a/src/backend/commands/dbcommands.c
> b/src/backend/commands/dbcommands.c
> index b31a30550b025d48ba3cc250dc4c15f41f9a80be..17a2942341e528c01182fb6d4878580f2706bec9
> 100644
> --- a/src/backend/commands/dbcommands.c
> +++ b/src/backend/commands/dbcommands.c
> @@ -1048,6 +1048,8 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
>                      (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                       errmsg("ICU locale must be specified")));
>      }
> +    else
> +        dbiculocale = NULL;
>
>      if (dblocprovider == COLLPROVIDER_ICU)
>          check_icu_locale(dbiculocale);

I think it would be better to do that in the various variable initialization.
Maybe switch the dblocprovider and dbiculocale initialization, and only
initialize dbiculocale if dblocprovider is COLLPROVIDER_ICU.

> 2) CREATE DATABASE does not always require the icu locale unlike initdb and
> createdb:
>
> $ createdb mydb --locale en_US.UTF-8 --template template0 --locale-provider
> icu
> createdb: error: database creation failed: ERROR:  ICU locale must be
> specified
>
> $ psql -c "CREATE DATABASE mydb LOCALE \"en_US.UTF-8\" TEMPLATE template0
> LOCALE_PROVIDER icu" postgres
> CREATE DATABASE
>
> I'm wondering if this is not a fully-supported feature (because createdb
> creates an SQL command with LC_COLLATE and LC_CTYPE options instead of
> LOCALE option) or is it a bug in CREATE DATABASE?.. From
> src/backend/commands/dbcommands.c:
>
> if (dblocprovider == COLLPROVIDER_ICU && !dbiculocale)
> {
>     if (dlocale && dlocale->arg)
>         dbiculocale = defGetString(dlocale);
> }

This discrepancy between createdb and CREATE DATABASE looks like an oversight,
as createdb indeed interprets --locale as:

    if (locale)
    {
        if (lc_ctype)
            pg_fatal("only one of --locale and --lc-ctype can be specified");
        if (lc_collate)
            pg_fatal("only one of --locale and --lc-collate can be specified");
        lc_ctype = locale;
        lc_collate = locale;
    }

AFAIK the fallback in the CREATE DATABASE case is expected as POSIX locale
names should be accepted by icu, so this should work for createdb too.



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: pg_walinspect: ReadNextXLogRecord's first_record argument
Next
From: Robert Haas
Date:
Subject: Re: pg_walinspect: ReadNextXLogRecord's first_record argument