Thread: Problems with a custom LOCALE

Problems with a custom LOCALE

From
"James B. Byrne"
Date:
Having got the 9.2 instance running in parallel with the previous
version I am now encountering this difficulty:

Couldn't create database for {"adapter"=>"postgresql",
"collate"=>"en_US.UTF-8", "ctype"=>"en_US.UTF-8", "encoding"=>"UTF8",
"database"=>"hll_th_forex_development",
"host"=>"inet01.hamilton.harte-lyne.ca", "pool"=>"5",
"password"=>"hll_theheart_devl_password", "sslmode"=>"require",
"template"=>"template1", "username"=>"hll_theheart_db_devl"}
PG::Error: ERROR:  encoding "UTF8" does not match locale
"en_CA@yyyy-mmm-dd.UTF-8"
DETAIL:  The chosen LC_CTYPE setting requires encoding "LATIN1".
: CREATE DATABASE "hll_th_forex_test" ENCODING = 'UTF8' TEMPLATE =
"template1"

I am constructing this remotely using a Rake task borrowed from Ruby
on Rails 3.2.9 and employing ActiveRecord-3.2.9.  We use a custom
local on all our servers "en_CA@yyyy-mmm-dd.UTF-8" which differs from
en_CA only in the presentation of the date.  However, regardless of
the system setting, postgresql.conf was configured thus:

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'en_US.UTF-8'        # locale for system error message
                    # strings
lc_monetary = 'en_US.UTF-8'        # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'        # locale for number formatting
lc_time = 'en_US.UTF-8'            # locale for time formatting


So, what is going wrong here?

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3



Re: Problems with a custom LOCALE

From
Tom Lane
Date:
"James B. Byrne" <byrnejb@harte-lyne.ca> writes:
> PG::Error: ERROR:  encoding "UTF8" does not match locale
> "en_CA@yyyy-mmm-dd.UTF-8"
> DETAIL:  The chosen LC_CTYPE setting requires encoding "LATIN1".
> : CREATE DATABASE "hll_th_forex_test" ENCODING = 'UTF8' TEMPLATE =
> "template1"

> I am constructing this remotely using a Rake task borrowed from Ruby
> on Rails 3.2.9 and employing ActiveRecord-3.2.9.  We use a custom
> local on all our servers "en_CA@yyyy-mmm-dd.UTF-8" which differs from
> en_CA only in the presentation of the date.

Sorta looks like you based that locale on an ISO 8859-1 locale, not a
UTF8 locale.  AFAICT from looking at the code, PG is seeing that
nl_langinfo(CODESET) returns "ISO-8859-1" or some variant spelling,
so it complains.

            regards, tom lane


Re: Problems with a custom LOCALE

From
"James B. Byrne"
Date:
On Fri, December 14, 2012 17:40, Tom Lane wrote:
> "James B. Byrne" <byrnejb@harte-lyne.ca> writes:
>> PG::Error: ERROR:  encoding "UTF8" does not match locale
>> "en_CA@yyyy-mmm-dd.UTF-8"
>> DETAIL:  The chosen LC_CTYPE setting requires encoding "LATIN1".
>> : CREATE DATABASE "hll_th_forex_test" ENCODING = 'UTF8' TEMPLATE =
>> "template1"
>
>
> Sorta looks like you based that locale on an ISO 8859-1 locale, not a
> UTF8 locale.  AFAICT from looking at the code, PG is seeing that
> nl_langinfo(CODESET) returns "ISO-8859-1" or some variant spelling,
> so it complains.
>

This is what I see on the host running postgresql-9.2

# LC_ALL=en_CA@yyyy-mmm-dd.utf8 locale charmap
UTF-8
#

However, on that system, because of PostgreSQL, we have left the
locale unmodified from default provided in CentOS-5.8:
# locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
# locale charmap
UTF-8

Running locale against the base en_CA@yyyy-mmm-dd on the PG host shows
this.

LC_ALL=en_CA@yyyy-mmm-dd locale charmap
ISO-8859-1

On the calling system I see this:

$ locale
LANG=en_CA.UTF-8
LC_CTYPE="en_CA.UTF-8"
LC_NUMERIC="en_CA.UTF-8"
LC_TIME="en_CA.UTF-8"
LC_COLLATE="en_CA.UTF-8"
LC_MONETARY="en_CA.UTF-8"
LC_MESSAGES="en_CA.UTF-8"
LC_PAPER="en_CA.UTF-8"
LC_NAME="en_CA.UTF-8"
LC_ADDRESS="en_CA.UTF-8"
LC_TELEPHONE="en_CA.UTF-8"
LC_MEASUREMENT="en_CA.UTF-8"
LC_IDENTIFICATION="en_CA.UTF-8"
LC_ALL=
$ locale charmap
UTF-8

Previously we were running PG 8.4 on that host and did not encounter
this problem.

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3



Re: Problems with a custom LOCALE

From
Tom Lane
Date:
"James B. Byrne" <byrnejb@harte-lyne.ca> writes:
>>> PG::Error: ERROR:  encoding "UTF8" does not match locale
>>> "en_CA@yyyy-mmm-dd.UTF-8"
>>> DETAIL:  The chosen LC_CTYPE setting requires encoding "LATIN1".

> This is what I see on the host running postgresql-9.2
> # LC_ALL=en_CA@yyyy-mmm-dd.utf8 locale charmap
> UTF-8

> Running locale against the base en_CA@yyyy-mmm-dd on the PG host shows
> this.
> LC_ALL=en_CA@yyyy-mmm-dd locale charmap
> ISO-8859-1

You're showing us three different spellings of the locale name above.
Are you really sure they're all equivalent?

Beyond that, you probably need to find a locale guru.  I see no reason
to think there is anything wrong with the Postgres code for this, and
every reason to think there's something wrong with your locale
definition.  But I don't know enough about custom locales to help you
identify exactly what.

            regards, tom lane


Re: [SOLVED] Problems with a custom LOCALE

From
"James B. Byrne"
Date:
On Sun, December 16, 2012 15:13, Tom Lane wrote:
> "James B. Byrne" <byrnejb@harte-lyne.ca> writes:
>>>> PG::Error: ERROR:  encoding "UTF8" does not match locale
>>>> "en_CA@yyyy-mmm-dd.UTF-8"
>>>> DETAIL:  The chosen LC_CTYPE setting requires encoding "LATIN1".
>
>> This is what I see on the host running postgresql-9.2
>> # LC_ALL=en_CA@yyyy-mmm-dd.utf8 locale charmap
>> UTF-8
>
>> Running locale against the base en_CA@yyyy-mmm-dd on the PG host
>> shows
>> this.
>> LC_ALL=en_CA@yyyy-mmm-dd locale charmap
>> ISO-8859-1
>
> You're showing us three different spellings of the locale name above.
> Are you really sure they're all equivalent?
>
> Beyond that, you probably need to find a locale guru.  I see no reason
> to think there is anything wrong with the Postgres code for this, and
> every reason to think there's something wrong with your locale
> definition.  But I don't know enough about custom locales to help you
> identify exactly what.
>
>             regards, tom lane
>

Well, I did eventually track this down and fix it.  It was a number of
issues which all sort of came together to drive me insane.

First, the RDBMS was a remote host.  On CentOS/RHEL ssh is configured
to export the local session's LC environment to remote hosts.
Problem, en_CA@yyyy-mmm-dd did not exist on the remote host.  Running
initdb in a ssh shell with the locale set to a non-existent local
locale probably affected the database structure in some obscure way,
although the postgresql.conf file said 'en_US.UTF-8'.  However,
perhaps the config file is ignored by the init.d script on RHEL in
this respect as well.

Of course, postgresql.conf does not yet exist when one runs initdb.
Hmm.  I will leave that one for the existentialists among us.

In any case, installing the custom locale on the remote host did not
solve the problem because when localedef could not find the UTF-8
character map it used ISO-8859-1 instead and kept that piece of
information to itself because, I did not specify the -v option.  Of
course, if you do specify the -v option and there are any warnings
then the existing character map is not replaced even when it does find
the utf-8 character map. That is, unless you also specify the -c
option.

And finally, you need to archive the base character map (.utf8) and
then create aliases for the rest (.utf-8, UTF8 and UTF-8).  And the
way one does that is to use the self-explanatory --no-archive option
together with the -c option on the aliases but use --replace option on
the localedef call to create the archived map.

Really, localedef is perhaps the most Byzantine and under documented
single function utility that I have run across since I stopped writing
assembler.

Anyway, after rebuilding the remote 9.2 database whilst having the
remote session locale set to en_US.UTF-8, resetting the remote and
local host's locale databases and aliases for en_CA@yyyy-mmm-dd, and
then trying everything once more on the development host I ran into
this error:

PG::Error: ERROR:  new encoding (UTF8) is incompatible with the
encoding of the template database (LATIN1)
HINT:  Use the same encoding as in the template database, or use
template0 as template.
: CREATE DATABASE "hll_th_forex_development" ENCODING = 'UTF-8'

So, I now must specify a template in the database connection, where I
did not need to before. Furthermore, the template I must use is
template0 because regardless of what locale initdb is run in template1
evidently is always created with LATIN1 encoding.  At lease if there
is a way to force template1 to utf8 during initdb I now lack the
energy to discover it.

However, specifying template0 in the db connection does allow the
remote db creation to succeed.

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3



Re: [SOLVED] Problems with a custom LOCALE

From
Tom Lane
Date:
"James B. Byrne" <byrnejb@harte-lyne.ca> writes:
> PG::Error: ERROR:  new encoding (UTF8) is incompatible with the
> encoding of the template database (LATIN1)
> HINT:  Use the same encoding as in the template database, or use
> template0 as template.
> : CREATE DATABASE "hll_th_forex_development" ENCODING = 'UTF-8'

> So, I now must specify a template in the database connection, where I
> did not need to before. Furthermore, the template I must use is
> template0 because regardless of what locale initdb is run in template1
> evidently is always created with LATIN1 encoding.  At lease if there
> is a way to force template1 to utf8 during initdb I now lack the
> energy to discover it.

You could use initdb's -E option, but really it ought to default to the
charmap of the locale that prevails from its environment.  I suspect
there's still something wonky about your custom locale.  (But man,
your description of localedef is scary.)

            regards, tom lane