Thread: Names of encodings, lc_collate, lc_ctype

Names of encodings, lc_collate, lc_ctype

From
Holger Jakobs
Date:
Hello,

When I run the following command against PostgreSQL 9.6 on a SLES Linux
machine

   pg_dump -h machine1 -C

the output contains this line

   CREATE DATABASE db1 WITH TEMPLATE = template0 ENCODING = 'UTF8' 
LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';

which causes trouble on a PostgreSQL 10 or 11 on an Ubuntu 18.04 machine

   ungültiger Locale-Name: »en_US.UTF-8«  (meaning 'illegal locale name')

The command

   select * from pg_collation;

shows (among many others of course)

   en_US.utf8

But even removing the hyphen in 'en_US.UTF-8' and converting 'UTF' to 
lower case doesn't remove the error.

How come there are encodings/collations/locales with and without hyphen? 
Why does the Ubuntu machine not accept a locale which is present in 
lc_collation?

Best Regards,

Holger

-- 
Holger Jakobs, 51469 Bergisch Gladbach
+49 178 9759012



Re: Names of encodings, lc_collate, lc_ctype

From
Tom Lane
Date:
Holger Jakobs <holger@jakobs.com> writes:
>    CREATE DATABASE db1 WITH TEMPLATE = template0 ENCODING = 'UTF8' 
> LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';
> which causes trouble on a PostgreSQL 10 or 11 on an Ubuntu 18.04 machine
>    ungültiger Locale-Name: »en_US.UTF-8«  (meaning 'illegal locale name')

Hmm, does "locale -a" show that you have en_US installed?

It's basically on the platform's libc to say whether the values for
LC_COLLATE and LC_CTYPE are valid.  In my experience, glibc is quite
forgiving about how the encoding suffix is spelled, so I'm wondering
if your destination machine is simply lacking the locale definition.
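
What "on the platform's libc" means can be sketched in a few lines. This is a minimal illustration, not PostgreSQL's actual code: it asks the C library, through Python's standard `locale` module, whether it will accept a given name for a category, and then restores the previous setting. The helper name `libc_accepts` is made up for this example.

```python
import locale


def libc_accepts(name: str, category: int = locale.LC_COLLATE) -> bool:
    """Return True if the C library accepts `name` for `category`.

    Hypothetical helper for illustration only.
    """
    # Calling setlocale() without a value only queries the current setting.
    saved = locale.setlocale(category)
    try:
        # Python raises locale.Error when the underlying setlocale() call fails.
        locale.setlocale(category, name)
        return True
    except locale.Error:
        return False
    finally:
        # Always put the process back the way we found it.
        locale.setlocale(category, saved)


if __name__ == "__main__":
    for candidate in ("en_US.UTF-8", "en_US.utf8", "C"):
        print(candidate, libc_accepts(candidate))
```

The result depends entirely on which locales are installed on the machine that runs it, so it has to be run on the destination server (and ideally as the same OS user as the PostgreSQL server) to say anything useful.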

> The command
>    select * from pg_collation;
> shows (among many others of course)
>    en_US.utf8

This doesn't have anything to do with what CREATE DATABASE accepts,
IIRC.  It does show that when initdb ran, it saw en_US.utf8 reported
by "locale -a"; but maybe that was in a different environment.

> How come there are encodings/collations/locales with and without hyphen? 
> Why does the Ubuntu machine not accept a locale which is present in 
> lc_collation?

Interesting questions, but you need a glibc expert not a Postgres
expert.

            regards, tom lane
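
On the hyphen question, one likely explanation is that glibc normalizes the codeset part of a locale name before looking it up: characters that are not letters or digits are dropped and the rest is lowercased, so 'UTF-8' and 'utf8' refer to the same thing, and "locale -a" prints the short spelling. The sketch below imitates that rule in Python. It is an approximation for illustration, the function names are invented, and it does not consult the system at all.

```python
def normalize_codeset(codeset: str) -> str:
    """Approximate glibc's codeset normalization (an assumption, not its API)."""
    # Keep only ASCII letters and digits, lowercased: "UTF-8" -> "utf8".
    kept = [ch.lower() for ch in codeset if ch.isascii() and ch.isalnum()]
    # glibc prefixes purely numeric codesets with "iso": "8859-1" -> "iso88591".
    only_digits = bool(kept) and all(ch.isdigit() for ch in kept)
    return ("iso" if only_digits else "") + "".join(kept)


def normalize_locale_name(name: str) -> str:
    """Normalize the codeset in a name shaped like language_TERRITORY.codeset@modifier."""
    base, dot, rest = name.partition(".")
    if not dot:
        return name  # no codeset part, e.g. "C" or "de_DE@euro"
    codeset, at, modifier = rest.partition("@")
    return base + "." + normalize_codeset(codeset) + at + modifier


def same_locale(a: str, b: str) -> bool:
    """True if two spellings differ only in how the codeset is written."""
    return normalize_locale_name(a) == normalize_locale_name(b)


if __name__ == "__main__":
    print(same_locale("en_US.UTF-8", "en_US.utf8"))  # True
```

Under that assumption, the spelling of the suffix is not what makes a name invalid; what matters is whether a matching locale definition is installed.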



Re: Names of encodings, lc_collate, lc_ctype

From
Ron
Date:
On 7/10/19 8:26 AM, Tom Lane wrote:
> Holger Jakobs <holger@jakobs.com> writes:
>>     CREATE DATABASE db1 WITH TEMPLATE = template0 ENCODING = 'UTF8'
>> LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';
>> which causes trouble on a PostgreSQL 10 or 11 on an Ubuntu 18.04 machine
>>     ungültiger Locale-Name: »en_US.UTF-8«  (meaning 'illegal locale name')
> Hmm, does "locale -a" show that you have en_US installed?
>
> It's basically on the platform's libc to say whether the values for
> LC_COLLATE and LC_CTYPE are valid.  In my experience, glibc is quite
> forgiving about how the encoding suffix is spelled, so I'm wondering
> if your destination machine is simply lacking the locale definition.

My Ubuntu 18.04 system (upgraded from 16.04) has these:
C
C.UTF-8
en_US.utf8
POSIX


-- 
Angular momentum makes the world go 'round.



Re: Names of encodings, lc_collate, lc_ctype

From
Holger Jakobs
Date:

Dear Tom,

After creating the locale with

  sudo locale-gen en_US.UTF-8

and a restart of the PostgreSQL server, it worked.

Thank you.

Holger Jakobs
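
To catch this before a restore, the locale names can be pulled out of the CREATE DATABASE line that "pg_dump -C" emits and checked on the target machine first. The following is only a sketch: the function names are invented, the regular expression assumes the single-quoted `LC_COLLATE = '...'` form shown in this thread, and the `accepts` callback is a stand-in for whatever check is used on the destination (for example a setlocale() call or a lookup in the "locale -a" output).

```python
import re
from typing import Callable, Dict, List

# Matches LC_COLLATE = '...' and LC_CTYPE = '...' as they appear in the dump.
_LOCALE_OPTION = re.compile(r"\b(LC_COLLATE|LC_CTYPE)\s*=\s*'([^']*)'", re.IGNORECASE)


def locales_in_create_database(sql: str) -> Dict[str, str]:
    """Map each locale option found in `sql` to its quoted value."""
    return {option.upper(): value for option, value in _LOCALE_OPTION.findall(sql)}


def missing_locales(sql: str, accepts: Callable[[str], bool]) -> List[str]:
    """List the locale names in `sql` that the `accepts` callback rejects."""
    wanted = locales_in_create_database(sql)
    return sorted({value for value in wanted.values() if not accepts(value)})


# The statement from this thread, joined back into one string.
stmt = (
    "CREATE DATABASE db1 WITH TEMPLATE = template0 ENCODING = 'UTF8' "
    "LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';"
)

# Stand-in for a machine where en_US has not been generated yet (made-up data).
pretend_installed = {"C", "POSIX"}
print(missing_locales(stmt, lambda name: name in pretend_installed))
```

Anything this reports as missing would need to be generated first, as was done here with locale-gen, before the CREATE DATABASE statement can succeed.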

On 10.07.19 at 15:26, Tom Lane wrote:
> Holger Jakobs <holger@jakobs.com> writes:
>>    CREATE DATABASE db1 WITH TEMPLATE = template0 ENCODING = 'UTF8'
>> LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';
>> which causes trouble on a PostgreSQL 10 or 11 on an Ubuntu 18.04 machine
>>    ungültiger Locale-Name: »en_US.UTF-8«  (meaning 'illegal locale name')
>
> Hmm, does "locale -a" show that you have en_US installed?
>
> It's basically on the platform's libc to say whether the values for
> LC_COLLATE and LC_CTYPE are valid.  In my experience, glibc is quite
> forgiving about how the encoding suffix is spelled, so I'm wondering
> if your destination machine is simply lacking the locale definition.
>
>> The command
>>    select * from pg_collation;
>> shows (among many others of course)
>>    en_US.utf8
>
> This doesn't have anything to do with what CREATE DATABASE accepts,
> IIRC.  It does show that when initdb ran, it saw en_US.utf8 reported
> by "locale -a"; but maybe that was in a different environment.
>
>> How come there are encodings/collations/locales with and without hyphen?
>> Why does the Ubuntu machine not accept a locale which is present in
>> lc_collation?
>
> Interesting questions, but you need a glibc expert not a Postgres
> expert.
>
>             regards, tom lane
--

Holger Jakobs, Bergisch Gladbach
instant messaging: xmpp:holger@jakobs.com
+49 178 9759012 or +49 2202 817157