Thread: Problem with error messages
We already wrote in pgsql-bugs (#11550), but there it was recommended to transfer this topic to the translator team.
Now the problem:
If we set client_encoding to Latin9 (as we do here in Germany), nearly every error message we get from PostgreSQL is:
character with byte sequence 0xe2 0x80 0x9e in encoding UTF8 has no equivalent in LATIN9
Why:
In 9.x we have new characters for delimiting words.
An example:
"Drop table if exists mickeymouse;"
delivers in PG-8.4
HINWEIS: Tabelle „mickeymouse“ existiert nicht, wird übersprungen
but delivers in PG-9.3
HINWEIS: Tabelle »mickeymouse« existiert nicht, wird übersprungen
If we set client_encoding to Latin9 (as we do here in Germany), we get an error message from PostgreSQL:
character with byte sequence 0xe2 0x80 0x9e in encoding UTF8 has no equivalent in LATIN9
but we do not see the real error message: "Table ... does not exist"
So a proposal: Please change these delimiters to something that can be represented in any client encoding, such as " or similar.
Regards
Walter
--
Best regards
Walter Willmertinger
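
A minimal sketch, not taken from the thread, of how the byte sequence in the error can be decoded to see which character is affected. The names utf8_decode and latin9_has are made up for illustration, and the LATIN9 check is written from the ISO 8859-15 code chart, not from PostgreSQL's conversion tables. Decoded this way, 0xe2 0x80 0x9e is U+201E („), which has no LATIN9 equivalent, whereas » (U+00BB) and « (U+00AB) do.

```c
#include <assert.h>
#include <stddef.h>

/* Decode the UTF-8 sequence at s (len bytes available). Stores the number
 * of bytes consumed in *used and returns the code point, or -1 if the bytes
 * are not a well-formed sequence. Overlong forms and surrogates are not
 * rejected; this is a sketch, not a validator. */
long utf8_decode(const unsigned char *s, size_t len, size_t *used)
{
    long cp;
    size_t n, i;

    assert(s != NULL && used != NULL);
    if (len == 0)
        return -1;
    if (s[0] < 0x80)                { cp = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; n = 4; }
    else
        return -1;
    if (n > len)
        return -1;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;              /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *used = n;
    return cp;
}

/* Returns 1 if the code point exists in ISO 8859-15 (LATIN9), else 0.
 * Assumption: the C1 control range 0x80..0x9F is counted as present. */
int latin9_has(long cp)
{
    switch (cp) {
    /* Latin-1 code points that ISO 8859-15 replaced */
    case 0xA4: case 0xA6: case 0xA8: case 0xB4:
    case 0xB8: case 0xBC: case 0xBD: case 0xBE:
        return 0;
    /* their replacements: euro sign, S/s/Z/z with caron, OE, oe, Y diaeresis */
    case 0x20AC: case 0x0160: case 0x0161: case 0x017D:
    case 0x017E: case 0x0152: case 0x0153: case 0x0178:
        return 1;
    default:
        return cp >= 0 && cp <= 0xFF;
    }
}
```

Calling utf8_decode on the three bytes from the error message and passing the result to latin9_has shows why the conversion fails for that character.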
I finally had a moment to think this through.

The problem is that if an error message is not representable in the client encoding, all you get as a client are encoding errors. This can happen with any translation (or with untranslated messages, except that those generally contain only ASCII characters).

In this particular case, the German translated error messages contain Unicode characters that are not in LATIN1/LATIN9. This might have been a bad choice in retrospect, and can be fixed.

But the same thing can also happen if you connect to, say, a database with locale ru_RU.utf8 while your client encoding is LATIN9. You will not be able to get any error message other than an encoding error. (Ironically, the backend will first try to send the encoding error in translated form, which will again fail, and finally it will send it in English.)

I think the recovery path should be changed so that it sends the original error message in untranslated form, possibly preceded by a notice that encoding conversion failed.

Comments?

On 6/30/15 9:31 AM, Walter Willmertinger wrote:
> We already wrote in pgsql-bugs (#11550), but there it was recommended to
> transfer this topic to the translator team.
> [...]

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
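
The recovery path proposed above could look roughly like the following. This is only a sketch under stated assumptions: fits_client_encoding and message_to_send are hypothetical names, not PostgreSQL functions, and the check treats the client encoding as covering exactly U+0000 to U+00FF, which simplifies LATIN9.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every character of the NUL-terminated UTF-8 string s lies
 * in U+0000..U+00FF, else 0. Simplified stand-in for "can be converted to
 * the client encoding"; a real check would use the conversion tables. */
int fits_client_encoding(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;

    assert(s != NULL);
    while (*p) {
        if (p[0] < 0x80)
            p += 1;                 /* ASCII */
        else if ((p[0] == 0xC2 || p[0] == 0xC3) && (p[1] & 0xC0) == 0x80)
            p += 2;                 /* U+0080..U+00FF */
        else
            return 0;               /* e.g. U+201E or any Cyrillic letter */
    }
    return 1;
}

/* Pick the text to send: the translated message if it is representable,
 * otherwise the untranslated original. *fell_back is set to 1 in the
 * second case so the caller can send a notice about the failed conversion
 * first. */
const char *message_to_send(const char *translated, const char *original,
                            int *fell_back)
{
    assert(original != NULL && fell_back != NULL);
    *fell_back = !fits_client_encoding(translated);
    return *fell_back ? original : translated;
}
```

With a translated message containing „ (bytes 0xe2 0x80 0x9e), message_to_send returns the untranslated text and sets the flag; with » and « it returns the translation unchanged.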
Hello,
I think we have two issues here:
1. We have a UTF-8-encoded translation to some language, which can contain extended (non-letter) characters such as », —, …, and we have localized clients expecting to read the server messages in their language using a non-UTF encoding.
I think we shouldn't leave them with untranslated messages; instead we should just use the TRANSLIT option of iconv.
For example:
    $ echo 'HINWEIS: Tabelle »mickeymouse« existiert nicht, wird übersprungen' | iconv -f UTF-8 -t ASCII
    HINWEIS: Tabelle iconv: illegal input sequence at position 17

But

    $ echo 'HINWEIS: Tabelle »mickeymouse« existiert nicht, wird übersprungen' | iconv -f UTF-8 -t ASCII//TRANSLIT
    HINWEIS: Tabelle >>mickeymouse<< existiert nicht, wird ubersprungen

is much better, and still better than showing the untranslated message with a notice about the failed conversion.
2. The language of the localization may not be compatible with the client encoding at all. For example, we can't convert Russian to LATIN1 with TRANSLIT. In that case I would just print the original error message.
Best regards,
Alexander
20.07.2016 03:58, Peter Eisentraut wrote:
> [...]
> I think the recovery path should be changed so that it sends the original
> error message in untranslated form, possibly preceded by a notice that
> encoding conversion failed. Comments?
> [...]
On 7/20/16 2:34 AM, Alexander Law wrote:
> I think we shouldn't leave them with the untranslated messages, instead
> we should just use the TRANSLIT option of iconv.
> For example:
>
> $ echo 'HINWEIS: Tabelle »mickeymouse« existiert nicht, wird übersprungen' | iconv -f UTF-8 -t ASCII
> HINWEIS: Tabelle iconv: illegal input sequence at position 17
>
> But
>
> $ echo 'HINWEIS: Tabelle »mickeymouse« existiert nicht, wird übersprungen' | iconv -f UTF-8 -t ASCII//TRANSLIT
> HINWEIS: Tabelle >>mickeymouse<< existiert nicht, wird ubersprungen
>
> Is much better and still better than to show untranslated message with
> the notice about failed conversion.

Right, but we don't have that functionality in PostgreSQL. It would have to be implemented, and the transliteration tables provided.

--
Peter Eisentraut
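
To make "transliteration tables" concrete, here is a small hand-written sketch. It is not PostgreSQL code and not how iconv implements //TRANSLIT: the table entries, the struct, and the name translit_known are assumptions for illustration, with ASCII replacements chosen by hand. Characters without a table entry are copied through unchanged, so the ordinary encoding conversion would still have to handle them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct translit {
    const char *utf8;   /* UTF-8 bytes of the character to replace */
    const char *ascii;  /* ASCII replacement */
};

/* Illustrative entries only; a real table would be far larger. */
static const struct translit translit_table[] = {
    { "\xE2\x80\x9E", ",," },   /* U+201E double low-9 quotation mark */
    { "\xE2\x80\x9C", "\"" },   /* U+201C left double quotation mark */
    { "\xE2\x80\x94", "-" },    /* U+2014 em dash */
    { "\xE2\x80\xA6", "..." },  /* U+2026 horizontal ellipsis */
};

/* Copy src to dst, replacing each character found in translit_table by its
 * ASCII form. Returns 0 on success, -1 if dst (dstlen bytes) is too small. */
int translit_known(const char *src, char *dst, size_t dstlen)
{
    size_t o = 0;

    assert(src != NULL && dst != NULL);
    while (*src) {
        const char *rep = NULL;
        size_t replen = 1, skip = 1, i;

        for (i = 0; i < sizeof translit_table / sizeof translit_table[0]; i++) {
            size_t n = strlen(translit_table[i].utf8);

            if (strncmp(src, translit_table[i].utf8, n) == 0) {
                rep = translit_table[i].ascii;
                replen = strlen(rep);
                skip = n;
                break;
            }
        }
        if (o + replen >= dstlen)
            return -1;              /* keep room for the terminating NUL */
        memcpy(dst + o, rep ? rep : src, replen);
        o += replen;
        src += skip;
    }
    if (o >= dstlen)
        return -1;
    dst[o] = '\0';
    return 0;
}
```

A message would be passed through translit_known before the normal conversion to the client encoding, so that only characters with no sensible replacement can still trigger a conversion error.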
21.07.2016 02:45, Peter Eisentraut wrote:
> On 7/20/16 2:34 AM, Alexander Law wrote:
>> $ echo 'HINWEIS: Tabelle »mickeymouse« existiert nicht, wird übersprungen' | iconv -f UTF-8 -t ASCII//TRANSLIT
>> HINWEIS: Tabelle >>mickeymouse<< existiert nicht, wird ubersprungen
>>
>> Is much better and still better than to show untranslated message with
>> the notice about failed conversion.
> Right, but we don't have that functionality in PostgreSQL. It would
> have to be implemented, and the transliteration tables provided.

It seems that the gettext functionality could be used for that conversion. I wrote a simple test:

    /* use the PostgreSQL 9.5 message catalog */
    bindtextdomain("postgres-9.5", "/usr/share/locale");
    /* ask gettext to return messages in this encoding */
    bind_textdomain_codeset("postgres-9.5", "ASCII");
    textdomain("postgres-9.5");
    printf(gettext("table \"%s\" does not exist, skipping"), "missing");

It prints:

    $ LANG=de_DE.UTF-8 ./testgettext
    Tabelle ,,missing" existiert nicht, wird uebersprungen

Or with LATIN9:

    $ LANG=de_DE.UTF-8 ./testgettext | iconv -f LATIN9
    Tabelle »missing« existiert nicht, wird übersprungen

So if we know which encoding to use when translating the server messages, maybe we should just specify it in bind_textdomain_codeset. I see that the server itself can use a different encoding for the logs (which one?), but maybe that could be converted the same way too.

Best regards,
Alexander
On 7/21/16 1:59 AM, Alexander Law wrote:
>> Right, but we don't have that functionality in PostgreSQL. It would
>> have to be implemented, and the transliteration tables provided.
>
> It seems, that the gettext functionality could be used for that conversion.

Well, that would be quite a change to have gettext do the conversion when everything else in protocol messages goes through the built-in conversion tables. The possible implications of that are not clear.

--
Peter Eisentraut