Re: utf8 vs UTF-8 - Mailing list pgsql-general

From Adrian Klaver
Subject Re: utf8 vs UTF-8
Date
Msg-id f510e041-7e9b-4745-847b-06b9dcce6281@aklaver.com
In response to Re: utf8 vs UTF-8  (Troels Arvin <troels@arvin.dk>)
Responses Re: utf8 vs UTF-8
List pgsql-general
On 5/18/24 07:48, Troels Arvin wrote:
> Hello,
> 
> Tom Lane wrote:
>  >>  test1  | loc_test | UTF8   | libc     | en_US.UTF-8 | en_US.UTF-8
>  >>  test3  | troels   | UTF8   | libc     | en_US.utf8  | en_US.utf8
>  >
>  > On most if not all platforms, both those spellings of the locale names
>  > will be taken as valid.  You might try running "locale -a" to get an
>  > idea of which one is preferred according to your current libc
>  > installation
> 
> "locale -a" on the Ubuntu system outputs this:
> 
>    C
>    C.utf8
>    en_US.utf8
>    POSIX

If you expand that to "locale -v -a" you get:

locale: en_US.utf8      archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
     title | English locale for the USA
    source | Free Software Foundation, Inc.
   address | https://www.gnu.org/software/libc/
     email | bug-glibc-locales@gnu.org
  language | American English
 territory | United States
  revision | 1.0
      date | 2000-06-24
   codeset | UTF-8
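
The output above lists the locale as en_US.utf8 while reporting its codeset as UTF-8, which is consistent with the C library folding the codeset spelling before it looks a locale up. What follows is a minimal sketch of that folding, not glibc's actual code: it assumes the rule is roughly "keep only letters and digits, lowercase the letters, and prefix an all-digit name with iso". The function names normalize_codeset and normalize_locale are hypothetical.

```python
def normalize_codeset(codeset: str) -> str:
    """Approximate the assumed folding rule for a codeset name.

    Assumption: only ASCII letters and digits are kept, letters are
    lowercased, and a name made only of digits gets an "iso" prefix.
    """
    kept = "".join(ch.lower() for ch in codeset if ch.isascii() and ch.isalnum())
    if kept and kept.isdigit():
        kept = "iso" + kept
    return kept


def normalize_locale(name: str) -> str:
    """Fold only the codeset part of a name like language_TERRITORY.codeset@modifier."""
    base, dot, rest = name.partition(".")
    if not dot:
        # No codeset part (for example "C" or "POSIX"): leave the name unchanged.
        return name
    codeset, at, modifier = rest.partition("@")
    return base + dot + normalize_codeset(codeset) + at + modifier


if __name__ == "__main__":
    for spelling in ("en_US.UTF-8", "en_US.utf8", "C", "POSIX"):
        print(spelling, "->", normalize_locale(spelling))
```

Under that assumption both en_US.UTF-8 and en_US.utf8 fold to the same name, which would explain why either spelling is accepted.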



> So at first, I thought en_US.utf8 would be the most correct locale 
> identifier. However, when I look at Postgres' own databases, they have 
> the slightly different locale string:
> 
>    psql --list | grep -E 'postgres|template'
>    postgres  | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>    template0 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
>    template1 | postgres | UTF8 | libc | en_US.UTF-8 | en_US.UTF-8 | ...
> 
> Also, when I try to create a database with "en_US.utf8" as locale 
> without specifying a template:
> 
> troels=# create database test4 locale 'en_US.utf8';
> ERROR:  new collation (en_US.utf8) is incompatible with the collation of 
> the template database (en_US.UTF-8)
> HINT:  Use the same collation as in the template database, or use 
> template0 as template.

I'm going to say that is Postgres being exact to a fault: the two names differ only in how the codeset is spelled.
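
To illustrate the difference, here is a small sketch contrasting an exact name comparison with a spelling-tolerant one. It assumes, based only on the wording of the error message, that the check behaves like a plain string comparison of the two locale names; it is not taken from the Postgres source. The helper names codeset_key, exact_check and lenient_check are hypothetical.

```python
def codeset_key(locale_name: str) -> str:
    """Hypothetical helper: fold the codeset part so 'UTF-8' and 'utf8' compare equal."""
    base, dot, codeset = locale_name.partition(".")
    folded = "".join(ch.lower() for ch in codeset if ch.isalnum())
    return base + dot + folded


def exact_check(new_locale: str, template_locale: str) -> bool:
    """Assumed current behaviour: the names must match character for character."""
    return new_locale == template_locale


def lenient_check(new_locale: str, template_locale: str) -> bool:
    """Spelling-tolerant alternative: compare the names after folding the codeset."""
    return codeset_key(new_locale) == codeset_key(template_locale)


if __name__ == "__main__":
    new, template = "en_US.utf8", "en_US.UTF-8"
    print("exact:  ", exact_check(new, template))
    print("lenient:", lenient_check(new, template))
```

With the two names from the error message, the exact comparison reports a mismatch while the folded comparison treats them as the same locale. The HINT's own remedy, using template0 as the template, avoids the mismatch without changing the spelling.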

> 
> Given the locale of Postgres' own databases and Postgres' error message, 
> I'm leaning to en_US.UTF-8 being the most correct locale to use. Because 
> why would Postgres care about it, if utf8/UTF-8 doesn't matter?
> 
> 
>> but TBH, I doubt it's worth worrying about.
> 
> But couldn't there be an issue, if for example the client's locale and 
> the server's locale aren't exactly the same? I'm thinking maybe the 
> client library has to perform unneeded translation of the stream of data 
> to/from the database?



-- 
Adrian Klaver
adrian.klaver@aklaver.com



