Thread: upgrading to 8.3, utf-8 and latin2 locale problem

upgrading to 8.3, utf-8 and latin2 locale problem

From
Mage
Date:
Hello,

I am sure this won't be the first e-mail about this issue, but we
are upgrading a production-like environment. Please help.

To reproduce the problem I used two Debian testing servers with the same
locales (en_US.UTF-8, en_US ISO-8859-1, hu_HU.UTF-8, hu_HU ISO-8859-2).

------------------------------------------------
PostgreSQL 8.2 (8.2.6-2):

/usr/lib/postgresql/8.2/bin/initdb -D /home/readonly/pg_data/ \
    --locale='en_US.UTF-8' --lc-collate='hu_HU.UTF-8' \
    --lc-ctype='hu_HU.UTF-8' --lc-time='hu_HU.UTF-8'
The files belonging to this database system will be owned by user "mage".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  hu_HU.UTF-8
  CTYPE:    hu_HU.UTF-8
  MESSAGES: en_US.UTF-8
  MONETARY: en_US.UTF-8
  NUMERIC:  en_US.UTF-8
  TIME:     hu_HU.UTF-8
The default database encoding has accordingly been set to UTF8.


/usr/lib/postgresql/8.2/bin/pg_ctl -D /home/readonly/pg_data -l logfile \
    -o '-p 5555' start
/usr/lib/postgresql/8.2/bin/psql -p 5555 template1


template1=# create database test encoding = 'latin2';
CREATE DATABASE

------------------------------------------------
PostgreSQL 8.3 (8.3.0-1):

/usr/lib/postgresql/8.3/bin/initdb -D /home/readonly/pg_data/ \
    --locale='en_US.UTF-8' --lc-collate='hu_HU.UTF-8' \
    --lc-ctype='hu_HU.UTF-8' --lc-time='hu_HU.UTF-8'
The files belonging to this database system will be owned by user "mage".
This user must also own the server process.

The database cluster will be initialized with locales
  COLLATE:  hu_HU.UTF-8
  CTYPE:    hu_HU.UTF-8
  MESSAGES: en_US.UTF-8
  MONETARY: en_US.UTF-8
  NUMERIC:  en_US.UTF-8
  TIME:     hu_HU.UTF-8
The default database encoding has accordingly been set to UTF8.
The default text search configuration will be set to "hungarian".

/usr/lib/postgresql/8.3/bin/pg_ctl -D /home/readonly/pg_data -l logfile \
    -o '-p 5555' start
/usr/lib/postgresql/8.3/bin/psql -p 5555 template1

template1=# create database test encoding = 'latin2';
ERROR:  encoding LATIN2 does not match server's locale hu_HU.UTF-8
DETAIL:  The server's LC_CTYPE setting requires encoding UTF8.

Searching Google, we found similar error messages reported for
pg_upgradecluster.

----------------

Both servers:
show all;
 client_encoding                 | UTF8
 lc_collate                      | hu_HU.UTF-8
 lc_ctype                        | hu_HU.UTF-8
 lc_messages                     | en_US.UTF-8
 lc_monetary                     | en_US.UTF-8
 lc_numeric                      | en_US.UTF-8
 lc_time                         | hu_HU.UTF-8
 server_encoding                 | UTF8

We would like to upgrade from 8.1 to 8.3. We have UTF-8 and LATIN2
databases. Any idea?


       Mage



Re: upgrading to 8.3, utf-8 and latin2 locale problem

From
Tom Lane
Date:
Mage <mage@mage.hu> writes:
> We would like to upgrade from 8.1 to 8.3. We have UTF-8 and LATIN2
> databases. Any idea?

If you were running with a non-C database locale, that was always
broken in 8.1, and you are very fortunate not to have stumbled across
any of the failure cases.

You can either standardize on UTF8 for all your databases (note that
this does not stop your *clients* from using LATIN2 if they want),
or use C locale which will work equally poorly with all encodings ;-)

            regards, tom lane
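
The 8.3 error quoted earlier in the thread says that the LC_CTYPE setting
"requires encoding UTF8", so it can help to check which codeset a locale
name implies before running initdb. The following is only a minimal sketch:
locale_codeset is a hypothetical helper, not part of PostgreSQL, and it
simply parses the locale name rather than asking the C library (on systems
where the locale is installed, "LC_ALL=hu_HU.UTF-8 locale charmap" is
another way to ask).

```shell
#!/bin/sh
# Hypothetical helper: print the codeset part of a locale name,
# e.g. "hu_HU.UTF-8" -> "UTF-8". Prints an empty line when the
# name carries no explicit codeset (for example "C" or "hu_HU").
locale_codeset() {
    case "$1" in
        *.*)
            cs=${1#*.}                 # drop everything up to the first dot
            printf '%s\n' "${cs%%@*}"  # drop an optional @modifier
            ;;
        *)
            printf '\n'
            ;;
    esac
}

locale_codeset 'hu_HU.UTF-8'
locale_codeset 'hu_HU.ISO-8859-2'
```

With the cluster from this thread (LC_CTYPE hu_HU.UTF-8) the implied codeset
is UTF-8, which is why a LATIN2 database is rejected by 8.3.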

Re: upgrading to 8.3, utf-8 and latin2 locale problem

From
Mage
Date:
Tom Lane wrote:
> Mage <mage@mage.hu> writes:
>
>> We would like to upgrade from 8.1 to 8.3. We have UTF-8 and LATIN2
>> databases. Any idea?
>>
>
> If you were running with a non-C database locale, that was always
> broken in 8.1, and you are very fortunate not to have stumbled across
> any of the failure cases.
>
> You can either standardize on UTF8 for all your databases (note that
> this does not stop your *clients* from using LATIN2 if they want),
> or use C locale which will work equally poorly with all encodings ;-)
>

If it were up to me, I'd never use LATIN2. I switched to Unicode years ago.
However, some of our databases don't belong to me and I can't modify their
clients.

What is the proper use of "create database xxxx encoding = 'yyy'" in
PostgreSQL 8.3? If I understand you, I should avoid it totally, convert
every affected database dump to UTF-8, load the dumps, and use "alter
database xxx set client_encoding = 'latin2'". Is that right?

       Mage





Re: upgrading to 8.3, utf-8 and latin2 locale problem

From
Tom Lane
Date:
Mage <mage@mage.hu> writes:
> What is the proper use of "create database xxxx encoding = 'yyy'" in
> PostgreSQL 8.3?

If you're not using C locale, it has no use whatsoever.

> If I understand you, I should avoid it totally, convert
> every affected database dump to UTF-8, load the dumps, and use "alter
> database xxx set client_encoding = 'latin2'". Is that right?

Yes, that's what I'd suggest.

            regards, tom lane
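
The workflow agreed on above (convert the dump to UTF-8, load it, then set
client_encoding for the database) might look roughly like the sketch below.
It assumes a plain-text SQL dump, not pg_dump's custom format. convert_dump,
the database name olddb and the file names are hypothetical; port 5555 is
taken from the test setup earlier in the thread. The server commands are
left as comments because they need a running cluster.

```shell
#!/bin/sh
# Hypothetical helper: read a plain-text LATIN2 dump on stdin and write it
# as UTF-8 on stdout. It also rewrites the client_encoding line that a
# plain-text dump may declare, so the converted bytes are not decoded as
# LATIN2 a second time on load.
convert_dump() {
    iconv -f ISO-8859-2 -t UTF-8 |
        sed "s/^SET client_encoding = 'LATIN2';/SET client_encoding = 'UTF8';/"
}

# Example with a one-line input standing in for a real dump file:
printf "SET client_encoding = 'LATIN2';\n" | convert_dump

# Assumed follow-up steps against the 8.3 cluster:
#   convert_dump < olddb.latin2.sql > olddb.utf8.sql
#   createdb -p 5555 -E UTF8 olddb
#   psql -p 5555 -d olddb -f olddb.utf8.sql
#   psql -p 5555 -d template1 \
#        -c "ALTER DATABASE olddb SET client_encoding = 'latin2'"
```

Because the database default is set on the server side, clients that cannot
be modified keep sending and receiving LATIN2 while the data is stored as
UTF-8.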