Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use. - Mailing list pgsql-hackers

From Hiroshi Inoue
Subject Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.
Date
Msg-id 493733EE.7000503@tpf.co.jp
In response to Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
Magnus Hagander wrote:
> Hiroshi Inoue wrote:
>>> I think the thing is that as long as the encodings are compatible
>>> (latin1 with different names for example) it worked fine.
>>>
>>>>  In any case I think the problem is that gettext is
>>>> looking at a setting that is not what we are looking at.  Particularly
>>>> with the 8.4 changes to allow per-database locale settings, this has
>>>> got to be fixed in a bulletproof way.
>> Attached is a new patch to apply bind_textdomain_codeset() to most
>> server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
>> and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.
>>
>> Unfortunately it's hard for Saito-san and me to check encodings
>> other than EUC-JP.
>
> In principle this looks good, I think, but I'm a bit worried about the
> lack of testing.

Thanks, and I agree with you.
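
For readers without the patch at hand, the idea in the quoted description (pick a gettext codeset for most server encodings and skip the listed exceptions) might be sketched roughly as follows. This is only an illustration and not the patch itself: the table, the function name `codeset_for_encoding`, and the small set of encodings shown are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical mapping from a server encoding name to the codeset name
 * that would be handed to bind_textdomain_codeset().  The entries are
 * illustrative only; the real patch covers many more encodings.
 */
struct codeset_map
{
    const char *pg_name;   /* server encoding name */
    const char *codeset;   /* gettext/iconv codeset, NULL = do not bind */
};

static const struct codeset_map codeset_table[] = {
    {"UTF8", "UTF-8"},
    {"LATIN1", "ISO-8859-1"},
    {"EUC_JP", "EUC-JP"},
    {"EUC_TW", "EUC-TW"},
    /* Exceptions named in the patch description: no codeset is bound. */
    {"SQL_ASCII", NULL},
    {"MULE_INTERNAL", NULL},
    {"EUC_JIS_2004", NULL},
};

/*
 * Return the codeset to bind for the given encoding name, or NULL when
 * the encoding is an exception or is not in this illustrative table.
 */
const char *
codeset_for_encoding(const char *pg_name)
{
    size_t i;

    for (i = 0; i < sizeof(codeset_table) / sizeof(codeset_table[0]); i++)
    {
        if (strcmp(codeset_table[i].pg_name, pg_name) == 0)
            return codeset_table[i].codeset;
    }
    return NULL;
}
```

A caller would pass a non-NULL result to `bind_textdomain_codeset(domain, codeset)` from `<libintl.h>` and leave gettext untouched when the result is NULL.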

> I can do some testing under LATIN1 which is what we use
> in Sweden (just need to get gettext working *at all* in my dev
> environment again - I've somehow managed to break it), and perhaps we
> can find someone to do a test in an eastern-european locale to get some
> more datapoints?
>
> Can you outline the steps one needs to go through to show the problem,
> so we can confirm it's fixed in these locales?

Saito-san and I have also been working on a related problem, changing
the LC_MESSAGES locale properly under Windows, and should be able to
provide a patch in a few days. It would be better to apply that patch
as well, so that the message catalog can be switched easily.

Attached is an example in which LC_MESSAGES is cht_twn (zh_TW) and
the server encoding is EUC_TW. You can read it as UTF-8 text
because client_encoding is set to UTF-8 first.

BTW, you can see another problem at line 4 of the text.
At that point LC_MESSAGES is still Japanese, and postgres fails
to convert a Japanese error message to the EUC_TW encoding. That is
not surprising, but it is not desirable either.

regards,
Hiroshi Inoue
set client_encoding to utf_8;
SET
1;
psql:cmd/euctw.sql:2: ERROR:  character 0xb9e6 of encoding "EUC_TW" has no equivalent in "UTF8"
select current_database();
 current_database
------------------
 euctw
(1 行)

show server_encoding;
 server_encoding
-----------------
 EUC_TW
(1 行)

show lc_messages;
    lc_messages
--------------------
 Japanese_Japan.932
(1 行)

set lc_messages to cht;
SET
select a;
psql:cmd/euctw.sql:7: 錯誤:  欄位"a"不存在
LINE 1: select a;
               ^
1;
psql:cmd/euctw.sql:8: 錯誤:  在"語法錯誤"附近發生 1
LINE 1: 1;
        ^
select * from a;
psql:cmd/euctw.sql:9: 錯誤:  relation "a"不存在
LINE 1: select * from a;
                      ^
