More message encoding woes - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject More message encoding woes
Msg-id 49D0C095.8000304@enterprisedb.com
Responses Re: More message encoding woes  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: More message encoding woes  (Peter Eisentraut <peter_e@gmx.net>)
Re: More message encoding woes  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers

latin1db=# SELECT version();
                                      version
-----------------------------------------------------------------------------------
 PostgreSQL 8.3.7 on i686-pc-linux-gnu, compiled by GCC gcc (Debian 4.3.3-5) 4.3.3
(1 row)

latin1db=# SELECT name, setting FROM pg_settings where name like 'lc%' OR name like '%encoding';
      name       | setting
-----------------+---------
 client_encoding | utf8
 lc_collate      | C
 lc_ctype        | C
 lc_messages     | es_ES
 lc_monetary     | C
 lc_numeric      | C
 lc_time         | C
 server_encoding | LATIN1
(8 rows)

latin1db=# SELECT * FROM foo;
ERROR:  no existe la relación «foo»

The accented characters are garbled. When I try the same with a database 
that's in UTF8 in the same cluster, it works:

utf8db=# SELECT name, setting FROM pg_settings where name like 'lc%' OR name like '%encoding';
      name       | setting
-----------------+---------
 client_encoding | UTF8
 lc_collate      | C
 lc_ctype        | C
 lc_messages     | es_ES
 lc_monetary     | C
 lc_numeric      | C
 lc_time         | C
 server_encoding | UTF8
(8 rows)

utf8db=# SELECT * FROM foo;
ERROR:  no existe la relación «foo»

What is happening is that gettext() returns the message in the encoding 
determined by LC_CTYPE, while we expect it to return it in the database 
encoding. Starting with PG 8.3 we enforce that the encoding specified in 
LC_CTYPE matches the database encoding, but not for the C locale.
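A minimal C sketch of the mechanism, assuming a glibc-style gettext that, when no codeset has been bound with bind_textdomain_codeset(), converts translations into the codeset that nl_langinfo(CODESET) reports for the current LC_CTYPE. The helpers gettext_default_codeset() and ctype_matches_db() are hypothetical names for illustration, not PostgreSQL functions.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: report the codeset gettext() falls back to when
 * no codeset is bound for the text domain, i.e. the codeset of LC_CTYPE.
 * Side effect: switches LC_CTYPE for the whole process.
 * Returns NULL if the requested locale is not available.
 */
const char *gettext_default_codeset(const char *ctype_locale)
{
    assert(ctype_locale != NULL);
    if (setlocale(LC_CTYPE, ctype_locale) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}

/* Hypothetical check: 1 if LC_CTYPE's codeset equals the database's. */
int ctype_matches_db(const char *ctype_locale, const char *db_codeset)
{
    const char *cs = gettext_default_codeset(ctype_locale);
    return cs != NULL && strcmp(cs, db_codeset) == 0;
}
```

On glibc, gettext_default_codeset("C") typically returns "ANSI_X3.4-1968" (ASCII), so ctype_matches_db("C", "ISO-8859-1") is 0: with LC_CTYPE=C, a LATIN1 database gets its translated messages in a codeset other than LATIN1.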

In CVS HEAD, we call bind_textdomain_codeset() in SetDatabaseEncoding(), 
which fixes that, but we only do it on Windows. In earlier versions we 
called it on all platforms, but only for UTF-8. It seems that we should 
call bind_textdomain_codeset() on all platforms and for all encodings. 
However, there seems to be a reason why we do it only on Windows in CVS 
HEAD: we need a mapping from our encoding ID to the OS codeset name, and 
the OS codeset names vary.
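One possible way around the naming problem, sketched in C under stated assumptions: keep a list of candidate spellings per encoding, use the first one the platform's iconv_open() accepts, and pass it to bind_textdomain_codeset(). The db_encoding enum, the candidate tables and both functions are hypothetical stand-ins, not the server's actual encoding table, and the probe assumes gettext resolves codeset names through the same iconv, as GNU gettext does.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <iconv.h>
#include <libintl.h>
#include <stddef.h>

/* Hypothetical encoding IDs standing in for the server's own enum. */
typedef enum { ENC_UTF8, ENC_LATIN1, ENC_LATIN2, ENC_EUC_JP } db_encoding;

/* Candidate spellings per encoding; platforms disagree on the names. */
static const char *const utf8_names[]   = { "UTF-8", "UTF8", "utf8", NULL };
static const char *const latin1_names[] = { "ISO-8859-1", "ISO8859-1", "LATIN1", "latin1", NULL };
static const char *const latin2_names[] = { "ISO-8859-2", "ISO8859-2", "LATIN2", NULL };
static const char *const eucjp_names[]  = { "EUC-JP", "EUCJP", "eucJP", NULL };

static const char *const *candidates_for(db_encoding enc)
{
    switch (enc) {
    case ENC_UTF8:   return utf8_names;
    case ENC_LATIN1: return latin1_names;
    case ENC_LATIN2: return latin2_names;
    case ENC_EUC_JP: return eucjp_names;
    }
    return NULL;
}

/* First candidate name this platform's iconv accepts, or NULL if none. */
const char *os_codeset_name(db_encoding enc)
{
    const char *const *names = candidates_for(enc);
    for (; names != NULL && *names != NULL; names++) {
        iconv_t cd = iconv_open(*names, "UTF-8");
        if (cd != (iconv_t) -1) {
            iconv_close(cd);
            return *names;
        }
    }
    return NULL;
}

/* Bind the message domain to the database encoding; 0 on success. */
int bind_messages_to_db_encoding(const char *domain, db_encoding enc)
{
    assert(domain != NULL);
    const char *codeset = os_codeset_name(enc);
    if (codeset == NULL)
        return -1;              /* no usable name: keep gettext's default */
    return bind_textdomain_codeset(domain, codeset) != NULL ? 0 : -1;
}
```

For a LATIN1 database this would typically resolve to "ISO-8859-1" on glibc. Returning -1 when no candidate is accepted leaves gettext's LC_CTYPE-based behaviour in place instead of failing outright.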

How can we make this more robust?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

