Thread: Re: pgsql: Add option to use ICU as global locale provider

Re: pgsql: Add option to use ICU as global locale provider

From
Christoph Berg
Date:
Re: Peter Eisentraut
> Since some (legacy) code still uses the libc locale facilities
> directly, we still need to set the libc global locale settings even if
> ICU is otherwise selected.  So pg_database now has three
> locale-related fields: the existing datcollate and datctype, which are
> always set, and a new daticulocale, which is only set if ICU is
> selected.  A similar change is made in pg_collation for consistency,
> but in that case, only the libc-related fields or the ICU-related
> field is set, never both.

Since the intended usage seems to be that databases should either be
using libc, or the ICU locales, but probably not both at the same
time, does it make sense to clutter the already very wide `psql -l`
output with two new extra columns?

This hardly fits in normal-size terminals:

=# \l
                                             List of databases
   Name    │ Owner │ Encoding │  Collate   │   Ctype    │ ICU Locale │ Locale Provider │ Access privileges
───────────┼───────┼──────────┼────────────┼────────────┼────────────┼─────────────────┼───────────────────
 postgres  │ myon  │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │            │ libc            │
 template0 │ myon  │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │            │ libc            │ =c/myon          ↵
           │       │          │            │            │            │                 │ myon=CTc/myon
 template1 │ myon  │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │            │ libc            │ =c/myon          ↵
           │       │          │            │            │            │                 │ myon=CTc/myon
(3 rows)

(Even longer if the username is "postgres")

It also makes \l+ even harder to read when the most often only
relevant new column, the database size, is even more to the far right.

Couldn't that be a single "Locale" column, possibly extended by more
info in parentheses if the values differ?

 Locale
 de_DE.utf8
 de-x-icu-whatever
 de_DE.utf8 (Ctype: C.UTF-8)
 SQL_ASCII (ICU Locale: en-x-something)

Christoph



Re: pgsql: Add option to use ICU as global locale provider

From
Peter Eisentraut
Date:
On 19.03.22 18:53, Christoph Berg wrote:
> Re: Peter Eisentraut
>> Since some (legacy) code still uses the libc locale facilities
>> directly, we still need to set the libc global locale settings even if
>> ICU is otherwise selected.  So pg_database now has three
>> locale-related fields: the existing datcollate and datctype, which are
>> always set, and a new daticulocale, which is only set if ICU is
>> selected.  A similar change is made in pg_collation for consistency,
>> but in that case, only the libc-related fields or the ICU-related
>> field is set, never both.
> 
> Since the intended usage seems to be that databases should either be
> using libc, or the ICU locales, but probably not both at the same
> time, does it make sense to clutter the already very wide `psql -l`
> output with two new extra columns?

Good point, let me think about that.



Re: pgsql: Add option to use ICU as global locale provider

From
Christoph Berg
Date:
Re: Peter Eisentraut
> > Since the intended usage seems to be that databases should either be
> > using libc, or the ICU locales, but probably not both at the same
> > time, does it make sense to clutter the already very wide `psql -l`
> > output with two new extra columns?
> 
> Good point, let me think about that.

A possible solution might be to rip out all the locale columns except
"Encoding" from \l, and leave them in place for \l+.

For \l+, I'd suggest moving the database size and the tablespace to
the front, after owner.

Christoph



Re: pgsql: Add option to use ICU as global locale provider

From
Tom Lane
Date:
Christoph Berg <myon@debian.org> writes:
> A possible solution might be to rip out all the locale columns except
> "Encoding" from \l, and leave them in place for \l+.

I'd rather see a single column summarizing the locale situation.
Perhaps it could be COALESCE(daticulocale, datcollate), or
something using a CASE on datlocprovider?
Then \l+ could replace that with all the underlying columns.

> For \l+, I'd suggest moving the database size and the tablespace to
> the front, after owner.

I think it's confusing if the + and non-+ versions of a command
present their columns in inconsistent orders.  I'm not dead set
against this, but -0.5 or so.

            regards, tom lane



Inconsistent "ICU Locale" output on older server versions

From
Christoph Berg
Date:
Re: To Peter Eisentraut
> This hardly fits in normal-size terminals:
> 
> =# \l
>                                              List of databases
>    Name    │ Owner │ Encoding │  Collate   │   Ctype    │ ICU Locale │ Locale Provider │ Access privileges
> ───────────┼───────┼──────────┼────────────┼────────────┼────────────┼─────────────────┼───────────────────
>  postgres  │ myon  │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │            │ libc            │
>  template0 │ myon  │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │            │ libc            │ =c/myon          ↵
>            │       │          │            │            │            │                 │ myon=CTc/myon
>  template1 │ myon  │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │            │ libc            │ =c/myon          ↵
>            │       │          │            │            │            │                 │ myon=CTc/myon
> (3 rows)

Another gripe here: The above is the output when run against a PG15
cluster, created without an ICU locale set.

When running psql 15 against PG 14, the output is this:

$ psql -l
                                                List of databases
   Name    │  Owner   │ Encoding │  Collate   │   Ctype    │ ICU Locale │ Locale Provider │   Access privileges
───────────┼──────────┼──────────┼────────────┼────────────┼────────────┼─────────────────┼───────────────────────
 postgres  │ postgres │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │ de_DE.utf8 │ libc            │
 template0 │ postgres │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │ de_DE.utf8 │ libc            │ =c/postgres          ↵
           │          │          │            │            │            │                 │ postgres=CTc/postgres
 template1 │ postgres │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │ de_DE.utf8 │ libc            │ =c/postgres          ↵
           │          │          │            │            │            │                 │ postgres=CTc/postgres
(3 rows)

The "ICU Locale" column is now populated, that seems wrong.

The problem is in the else branch in src/bin/psql/describe.c around
line 900:

+   if (pset.sversion >= 150000)
+       appendPQExpBuffer(&buf,
+                         "       d.daticulocale as \"%s\",\n"
+                         "       CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\
+                         gettext_noop("ICU Locale"),
+                         gettext_noop("Locale Provider"));
+   else
+       appendPQExpBuffer(&buf,
+                         "       d.datcollate as \"%s\",\n"  <--- there
+                         "       'libc' AS \"%s\",\n",
+                         gettext_noop("ICU Locale"),
+                         gettext_noop("Locale Provider"));

I'd think this should rather be

+                         "       '' as \"%s\",\n"

Christoph



Re: Inconsistent "ICU Locale" output on older server versions

From
"Euler Taveira"
Date:
On Fri, Apr 15, 2022, at 11:58 AM, Christoph Berg wrote:
When running psql 15 against PG 14, the output is this:

$ psql -l
                                                List of databases
   Name    │  Owner   │ Encoding │  Collate   │   Ctype    │ ICU Locale │ Locale Provider │   Access privileges
───────────┼──────────┼──────────┼────────────┼────────────┼────────────┼─────────────────┼───────────────────────
postgres  │ postgres │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │ de_DE.utf8 │ libc            │
template0 │ postgres │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │ de_DE.utf8 │ libc            │ =c/postgres          ↵
           │          │          │            │            │            │                 │ postgres=CTc/postgres
template1 │ postgres │ UTF8     │ de_DE.utf8 │ de_DE.utf8 │ de_DE.utf8 │ libc            │ =c/postgres          ↵
           │          │          │            │            │            │                 │ postgres=CTc/postgres
(3 rows)

The "ICU Locale" column is now populated, that seems wrong.
Good catch!

The problem is in the else branch in src/bin/psql/describe.c around
line 900:

+   if (pset.sversion >= 150000)
+       appendPQExpBuffer(&buf,
+                         "       d.daticulocale as \"%s\",\n"
+                         "       CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\
+                         gettext_noop("ICU Locale"),
+                         gettext_noop("Locale Provider"));
+   else
+       appendPQExpBuffer(&buf,
+                         "       d.datcollate as \"%s\",\n"  <--- there
+                         "       'libc' AS \"%s\",\n",
+                         gettext_noop("ICU Locale"),
+                         gettext_noop("Locale Provider"));

I'd think this should rather be

+                         "       '' as \"%s\",\n"
Since dataiculocale allows NULL, my suggestion is to use NULL instead of an
empty string. It is consistent with a cluster whose locale provider is libc.


--
Euler Taveira

Re: Inconsistent "ICU Locale" output on older server versions

From
Tom Lane
Date:
"Euler Taveira" <euler@eulerto.com> writes:
> On Fri, Apr 15, 2022, at 11:58 AM, Christoph Berg wrote:
>> When running psql 15 against PG 14, the output is this:
>> The "ICU Locale" column is now populated, that seems wrong.

> Good catch!

Indeed.

> Since dataiculocale allows NULL, my suggestion is to use NULL instead of an
> empty string. It is consistent with a cluster whose locale provider is libc.

Yeah, I agree.  We should make the pre-v15 output match what you'd see
if looking at a non-ICU v15 database.

            regards, tom lane



psql -l and locales (Re: pgsql: Add option to use ICU as global locale provider)

From
Christoph Berg
Date:
Re: Tom Lane
> Christoph Berg <myon@debian.org> writes:
> > A possible solution might be to rip out all the locale columns except
> > "Encoding" from \l, and leave them in place for \l+.
> 
> I'd rather see a single column summarizing the locale situation.
> Perhaps it could be COALESCE(daticulocale, datcollate), or
> something using a CASE on datlocprovider?
> Then \l+ could replace that with all the underlying columns.

Fwiw I still think the default psql -l output should be more concise.
Any chance to have that happen for PG15?

I can try creating a patch if it has chances of getting through.

Christoph