Thread: BUG #11550: Error messages contain not encodable characters (Latin9)

BUG #11550: Error messages contain not encodable characters (Latin9)

From
willmis@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      11550
Logged by:          Walter W.
Email address:      willmis@gmail.com
PostgreSQL version: 9.3.5
Operating system:   Linux/Windows
Description:

In 9.3 we have new characters for delimiting words.

An example:
"Drop table if exists mickeymouse;"
delivers in PG-9.3

HINWEIS: Tabelle „mickeymouse“ existiert nicht, wird übersprungen

but delivers in PG-8.4

HINWEIS: Tabelle »mickeymouse« existiert nicht, wird übersprungen

If we set client_encoding to Latin9 (as we are here in Germany), we get an
error message from PostgreSQL:

character with byte sequence 0xe2 0x80 0x9e in encoding UTF8 has no
equivalent in LATIN9
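
A minimal sketch of why this conversion fails, assuming Python's iso8859_15 codec is an acceptable stand-in for PostgreSQL's LATIN9 conversion. The helper name fits_latin9 is made up for illustration. The reported byte sequence decodes to the low opening quote U+201E, which Latin9 lacks, while the guillemets used in 8.4 are representable:

```python
# The byte sequence from the error message, decoded as UTF-8.
low_quote = bytes([0xE2, 0x80, 0x9E]).decode("utf-8")


def fits_latin9(text: str) -> bool:
    """Return True if every character of text exists in ISO 8859-15 (Latin9)."""
    try:
        text.encode("iso8859_15")
        return True
    except UnicodeEncodeError:
        return False


# Code point of the offending character and whether Latin9 can hold it.
print(f"U+{ord(low_quote):04X}", fits_latin9(low_quote))
# The 8.4-style message versus the 9.3-style message.
print(fits_latin9("HINWEIS: Tabelle »mickeymouse« existiert nicht"))
print(fits_latin9("HINWEIS: Tabelle „mickeymouse“ existiert nicht"))
```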

This behaviour leads to problems in tools we use like Zeos etc.

Is there a way to change the delimiters in messages by some way?
This would help us a lot.

Re: BUG #11550: Error messages contain not encodable characters (Latin9)

From
Bruce Momjian
Date:
On Wed, Oct  1, 2014 at 08:09:23PM +0000, willmis@gmail.com wrote:
> If we set client_encoding to Latin9 (as we are here in Germany), we get an
> error message from PostgreSQL:
>
> character with byte sequence 0xe2 0x80 0x9e in encoding UTF8 has no
> equivalent in LATIN9
>
> This behaviour leads to problems in tools we use like Zeos etc.
>
> Is there a way to change the delimiters in messages by some way?

No.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Re: BUG #11550: Error messages contain not encodable characters (Latin9)

From
Heikki Linnakangas
Date:
On 10/03/2014 03:15 AM, Bruce Momjian wrote:
> On Wed, Oct  1, 2014 at 08:09:23PM +0000, willmis@gmail.com wrote:
>> If we set client_encoding to Latin9 (as we are here in Germany), we get an
>> error message from PostgreSQL:
>>
>> character with byte sequence 0xe2 0x80 0x9e in encoding UTF8 has no
>> equivalent in LATIN9
>>
>> This behaviour leads to problems in tools we use like Zeos etc.
>>
>> Is there a way to change the delimiters in messages by some way?
>
> No.

Well, you could manually search & replace the .po files and run msgfmt
on them. But no, there's no easy way out.
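
As a rough sketch of that search & replace step, assuming the catalog is UTF-8 text and that guillemets are an acceptable substitute (the mapping and the function name replace_quotes are illustrative, not anything shipped with PostgreSQL):

```python
# Assumed mapping: German low/high quotes -> guillemets, which Latin9 can encode.
QUOTE_MAP = str.maketrans({"\u201e": "\u00bb", "\u201c": "\u00ab"})


def replace_quotes(po_text: str) -> str:
    """Swap the „…“ quote pair for »…« throughout the text of a .po catalog."""
    return po_text.translate(QUOTE_MAP)


# A hand-written sample entry; real catalogs contain many such pairs.
sample = (
    'msgid "table \\"%s\\" does not exist, skipping"\n'
    'msgstr "Tabelle „%s“ existiert nicht, wird übersprungen"\n'
)
fixed = replace_quotes(sample)
print(fixed)
```

The edited catalog would still have to be compiled with msgfmt and installed in place of the shipped .mo file; the exact file names depend on the installation.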

Can we fix this? It's annoying that you can't use LATIN9 with a German
locale, as that is the most common encoding used with German, aside from
UTF-8.

In general, it's annoying that you get errors like that if you use an
encoding that can't represent all the characters in error messages. In
situations like this, it would be clearly better to transliterate the
quotation marks to " or «». Also with umlauts (äöü), it would be better
to transliterate them to ae, oe, ue, than to throw an error.

Gettext does perform transliteration, but the problem is that we first
convert the error message to the server encoding, using gettext, and
then convert from server encoding to the client encoding ourselves. It
would make more sense to let gettext convert directly to the client
encoding. We currently construct all the messages in server encoding and
convert to client encoding just before sending to the client, so
changing that would require some care to keep track which messages are
already in client encoding and which ones need conversion. But if
someone is willing to put some effort to it, it seems doable.
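
To make the difference concrete, here is a small sketch of the two behaviours: a strict conversion that fails on an unrepresentable character, and a transliterating one that substitutes a fallback. The fallback table and both function names are assumptions for illustration; this is neither gettext's transliteration table nor the backend's conversion code.

```python
# Illustrative fallbacks only; gettext/iconv use their own, larger tables.
FALLBACKS = {"\u201e": '"', "\u201c": '"', "\u201d": '"'}


def strict_convert(msg: str, client_encoding: str) -> bytes:
    """Convert strictly; raises UnicodeEncodeError on unrepresentable characters."""
    return msg.encode(client_encoding)


def transliterating_convert(msg: str, client_encoding: str) -> bytes:
    """Convert, replacing unrepresentable characters with a fallback or '?'."""
    out = []
    for ch in msg:
        try:
            ch.encode(client_encoding)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(FALLBACKS.get(ch, "?"))
    return "".join(out).encode(client_encoding)


message = "HINWEIS: Tabelle „mickeymouse“ existiert nicht, wird übersprungen"
try:
    strict_convert(message, "iso8859_15")
except UnicodeEncodeError as exc:
    print("strict conversion failed:", exc.reason)
print(transliterating_convert(message, "iso8859_15").decode("iso8859_15"))
```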

- Heikki

Re: BUG #11550: Error messages contain not encodable characters (Latin9)

From
Walter Willmertinger
Date:
I think the easiest way would be to change the error messages in a future
release so that they contain only standard ANSI characters, i.e. just " and '.
If someone uses an "umlaut" in a field or table name, he/she can do this,
because he/she created the table or field name with his/her own client encoding.

In PG 8 versions we never had this problem, as no "bad" delimiting
characters were used in error messages. They were introduced in some PG 9
version and since then we have had a lot of problems.

For example if you have a complicated sql script using Latin9
client_encoding and all output you get is
    "character with byte sequence 0xe2 0x80 0x9e in encoding UTF8 has no
equivalent in LATIN9",
you have to edit the script and do not see the real error. More problems
arise if you have a Delphi application and get this error message.

- Walter

2014-10-03 9:21 GMT+02:00 Heikki Linnakangas <hlinnakangas@vmware.com>:

> On 10/03/2014 03:15 AM, Bruce Momjian wrote:
>
>> On Wed, Oct  1, 2014 at 08:09:23PM +0000, willmis@gmail.com wrote:
>>
>>> If we set client_encoding to Latin9 (as we are here in Germany), we get
>>> an
>>> error message from PostgreSQL:
>>>
>>> character with byte sequence 0xe2 0x80 0x9e in encoding UTF8 has no
>>> equivalent in LATIN9
>>>
>>> This behaviour leads to problems in tools we use like Zeos etc.
>>>
>>> Is there a way to change the delimiters in messages by some way?
>>>
>>
>> No.
>>
>
> Well, you could manually search & replace the .po files and run msgfmt on
> them. But no, there's no easy way out.
>
> Can we fix this? It's annoying that you can't use LATIN9 with a German
> locale, as that is the most common encoding used with German, aside from
> UTF-8.
>
> In general, it's annoying that you get errors like that if you use an
> encoding that can't represent all the characters in error messages. In
> situations like this, it would be clearly better to transliterate the
> quotation marks to " or «». Also with umlauts (äöü), it would be better to
> transliterate them to ae, oe, ue, than to throw an error.
>
> Gettext does perform transliteration, but the problem is that we first
> convert the error message to the server encoding, using gettext, and then
> convert from server encoding to the client encoding ourselves. It would
> make more sense to let gettext convert directly to the client encoding. We
> currently construct all the messages in server encoding and convert to
> client encoding just before sending to the client, so changing that would
> require some care to keep track which messages are already in client
> encoding and which ones need conversion. But if someone is willing to put
> some effort to it, it seems doable.
>
> - Heikki
>

Re: BUG #11550: Error messages contain not encodable characters (Latin9)

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> Can we fix this? It's annoying that you can't use LATIN9 with a German
> locale, as that is the most common encoding used with German, aside from
> UTF-8.

I would think that would be a matter to be taken up with the translation
people.  If a particular set of translated messages is using quote
characters that don't exist in every encoding commonly used with that
language, then it's arguable that the translator made a poor choice
of quote characters.  (Likewise for any other special characters of
course.)

            regards, tom lane

Re: BUG #11550: Error messages contain not encodable characters (Latin9)

From
Walter Willmertinger
Date:
I think this would be great, as otherwise PG 9 is very hard to use here
in Western Europe.
Also, changing the .po files seems to be a lot of work.

We just use the Windows binaries, and there are no .po files to be found.

We hope for some change in the near future and could help (in reviews or
..) if necessary.

Regards

Walter

2014-10-03 15:42 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> > Can we fix this? It's annoying that you can't use LATIN9 with a German
> > locale, as that is the most common encoding used with German, aside from
> > UTF-8.
>
> I would think that would be a matter to be taken up with the translation
> people.  If a particular set of translated messages is using quote
> characters that don't exist in every encoding commonly used with that
> language, then it's arguable that the translator made a poor choice
> of quote characters.  (Likewise for any other special characters of
> course.)
>
>                         regards, tom lane
>

Re: BUG #11550: Error messages contain not encodable characters (Latin9)

From
Walter Willmertinger
Date:
It's a pity, but these problems were not corrected until now (we just tried
9.4.4). The error messages still contain the same unprintable (in Latin9)
characters.
Translation team, could you please look into this problem? It is a very
serious one here in Germany.

By the way, we found a workaround:
We just rename the language directory share/locale/de to
share/locale/de-nix, so PostgreSQL cannot find the German language files
and we get English error messages.


Regards

Walter Willmertinger

Re: BUG #11550: Error messages contain not encodable characters (Latin9)

From
Peter Eisentraut
Date:
On 6/24/15 10:09 AM, Walter Willmertinger wrote:
> It's a pity, but these problems were not corrected until now (we just
> tried 9.4.4). The error messages still contain the same unprintable (in
> Latin9) characters.

This will be fixed (for German) in the next minor releases.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services