Re: More message encoding woes - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: More message encoding woes
Date
Msg-id 49DB1FBE.3040001@enterprisedb.com
In response to Re: More message encoding woes  (Hiroshi Inoue <inoue@tpf.co.jp>)
Responses Re: More message encoding woes  (Hiroshi Inoue <inoue@tpf.co.jp>)
List pgsql-hackers
Hiroshi Inoue wrote:
> Heikki Linnakangas wrote:
>> I just tried that, and it seems that gettext() does transliteration, 
>> so any characters that have no counterpart in the database encoding 
>> will be replaced with something similar, or question marks. Assuming 
>> that's universal across platforms, and I think it is, using the empty 
>> string should work.
>>
>> It also means that you can use lc_messages='ja' with 
>> server_encoding='latin1', but it will be unreadable because all the 
>> non-ascii characters are replaced with question marks. For something 
>> like lc_messages='es_ES' and server_encoding='koi8-r', it will still 
>> look quite nice.
>>
>> Attached is a patch I've been testing. Seems to work quite well. It
>> would be nice if someone could test it on Windows, which seems to be a 
>> bit special in this regard.
> 
> Unfortunately it doesn't seem to work on Windows.
> 
> First any combination of valid lc_messages and non-existent encoding
> passes the test  strcmp(gettext(""), "") != 0 .

Now that's strange. Can you check what gettext("") returns in that case 
then?
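
For reference, the check under discussion boils down to something like the
sketch below. The helper name catalog_seems_usable is made up for
illustration and this is not the code from the patch. It assumes a
gettext implementation that hands back the msgid unchanged when it finds no
catalog or cannot convert to the requested codeset, which is exactly the
assumption that seems not to hold on Windows.

```c
#include <assert.h>
#include <libintl.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): returns true if a message
 * catalog for "domain" appears to be usable with "codeset".
 *
 * gettext("") normally returns the catalog's header entry, so an empty
 * result suggests that no catalog was found or that the lookup failed.
 */
static bool
catalog_seems_usable(const char *domain, const char *codeset)
{
    assert(domain != NULL && codeset != NULL);

    /* Ask gettext to deliver this domain's messages in the target encoding. */
    bind_textdomain_codeset(domain, codeset);

    /* An untranslated lookup hands back the msgid itself, i.e. "". */
    return strcmp(dgettext(domain, ""), "") != 0;
}
```

A caller would run this once after choosing lc_messages and fall back to
untranslated English messages when it returns false.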

> Second for example the combination of ja(lc_messages) and ISO-8859-1
> passes the test but the test fails after I changed the last_translator
> part of ja message catalog to contain Japanese kanji characters.

Yeah, the inconsistency is not nice. In practice, though, if you try to 
use an encoding that can't represent kanji characters with Japanese, 
you're better off falling back to English than displaying strings full 
of question marks. The same goes for all other languages as well, IMHO. 
If you're going to fall back to English for some translations (and in 
practice "some" is a pretty high percentage) because the encoding is 
missing a character and transliteration is not working, you might as 
well not bother translating at all.

If we add the dummy translations to all .po files, we could force 
fallback-to-English in situations like that by including some or all of 
the non-ASCII characters used in the language in the dummy translation.

I'm thinking of going ahead with this approach, without the dummy 
translation, after we have resolved the first issue on Windows. We can 
add the dummy translations later if needed, but I don't think anyone 
will care.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

