8.3.0, locales, and encodings - Mailing list pgsql-admin

From Dean Gibson (DB Administrator)
Subject 8.3.0, locales, and encodings
Date
Msg-id 47AA6CEE.4010406@ultimeth.com
Whole thread Raw
Responses Re: 8.3.0, locales, and encodings  (Tino Schwarze <postgresql@tisc.de>)
List pgsql-admin
I got a small surprise when upgrading to 8.3.0 today (CentOS 4.4).  When
I went to restore my database, PSQL complained that my "CREATE
DATABASE... ENCODING 'UTF8';" wasn't valid (the new phpBB 3.0 software
enforces using UTF8).  Sure enough, a bit of research showed that the
supplied /etc/init.d/postgresql file had no encoding specified, which
meant that it defaulted to my operating system's 'en_US.ISO-8859-1'
encoding.  A quick reading of the PostgreSQL documentation revealed that
PostgreSQL can't really use an encoding different from that specified
during INITDB (caveat below), and apparently that is now enforced in v8.3.

OK, so I manually run INITDB w/ "-E UTF8", and now I can create/load one
of my databases.  Unfortunately, when I go to create another database
(where the data is 'ISO-8859-1'), now IT won't create.  A more detailed
reading of the PostgreSQL documentation revealed that PostgreSQL can't
use an encoding different from that specified during INITDB, UNLESS
INITDB is run w/ "-locale POSIX".

Ok, I rerun INITDB w/ "-locale POSIX  -E UTF8", and now both databases
create/load, apparently properly.  I also did an INITDB on another
database server, running INITDB w/ just "-locale POSIX" (apparently
meaning that the "template1" encoding defaulted to "SQL_ASCII"), and
successfully create/reload another database there (explicitly specifying
an ENCODING there).

Questions/suggestions:

1. The provided /etc/init.d/postgresql file seems to be part of this
trap.  Should it be changed to default the encoding to POSIX?
2. I want my installation to have the maximum flexibility.  Should I be
running INITDB w/ "-locale POSIX  -E UTF8", or just "-locale POSIX"
(implying "-E SQL_ASCII")?  Or does it make a difference, since I always
specify the desired database encoding when I "CREATE DATABASE...
ENCODING '...';"?
3. One of my databases is ISO-8859-1, because that's the format of the
data I get from the US gov't.  However, should I instead use the 'UTF8'
encoding for that database, and then use a client (PSQL) encoding of
"LATIN1", and let PostgreSQL convert the data?  Pros & cons?

Performance is a concern, but disk space is not.

I presume this ("admin") is the proper PostgreSQL list to post this to.
If not, let me know.

-- Dean Gibson

pgsql-admin by date:

Previous
From: antonio freitas
Date:
Subject: password management
Next
From: "Mag Gam"
Date:
Subject: Re: rename database