Re: BUG #13785: Postgresql encoding screw-up - Mailing list pgsql-bugs

From Peter J. Holzer
Subject Re: BUG #13785: Postgresql encoding screw-up
Msg-id 20151129205529.GB29654@hjp.at
In response to BUG #13785: Postgresql encoding screw-up  (ntpt@seznam.cz)
List pgsql-bugs
On 2015-11-26 12:15:58 +0000, ntpt@seznam.cz wrote:
> Because there is no character 0x96 in latin2, transcoder to utf8 does not
> know the recipe how treat this character - and leave it "as is" producing
> \u0096 character in output.

Actually, it does know. While the standard ISO-8859-2[1] only defines
the printable characters, those are commonly combined with the control
characters from ISO 6429, which does define a control code 0x96 (SPA).
The Unicode standard also defines two blocks of control characters (C0
and C1), and their code points equal the byte values: 0x0D (CR) is
translated to U+000D, 0x1B (ESC) is translated to U+001B and 0x96 (SPA)
is translated to U+0096.
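
The mapping described above can be checked outside the database. The
following is only a small sketch: it uses Python's built-in iso8859_2
codec as a stand-in, on the assumption that it treats the control ranges
the same way as the server-side conversion described here. The helper
name latin2_to_codepoint is made up for this illustration.

```python
def latin2_to_codepoint(byte_value: int) -> str:
    """Decode one byte as ISO-8859-2 and return its code point as 'U+XXXX'."""
    # Python's iso8859_2 codec maps the control ranges 0x00-0x1F and
    # 0x80-0x9F straight onto the code points with the same numeric value.
    ch = bytes([byte_value]).decode("iso8859_2")
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    for b in (0x0D, 0x1B, 0x96):
        print(f"0x{b:02X} -> {latin2_to_codepoint(b)}")
```

The decode succeeds for 0x96 instead of raising an error, which is why
nothing gets rejected along the way.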

So the "problem" here isn't that PostgreSQL doesn't know how to
translate an ISO-8859-2 0x96 into Unicode (if that were the case, it
could reject it, forcing the user to fix the client configuration), but
that it does know how to convert it and therefore does it - even if it
is almost certainly wrong (when did you ever need an SPA character?).

    hp

PS: I would write a script which fixes the wrong characters in situ.
    That takes a bit of scripting, but:
    * You probably can't guarantee that all your clients are fixed,
      so the problem may crop up again
    * So you want to be able to find out when that happens and fix it
      again without taking the db down.
    * A script like that can be expanded to fix other encoding errors,
      too (e.g. UTF-8 double-encoding, ISO-8859-2 vs. ISO-8859-1, ...)
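
    The core of such a script might look like the sketch below. It
    assumes that the misdeclared client data was really Windows-1250
    (cp1250), which this message does not state, so the real_encoding
    default is a guess to adjust. The function names are hypothetical,
    and the database part (finding the affected rows and updating them)
    is left out.

```python
def needs_repair(text: str) -> bool:
    """Return True if the text contains any C1 control character."""
    return any(0x80 <= ord(ch) <= 0x9F for ch in text)


def repair_c1(text: str, real_encoding: str = "cp1250") -> str:
    """Reinterpret stray C1 controls as bytes of the assumed real encoding.

    real_encoding is an assumption about what the client actually sent.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                # The C1 code point equals the original byte value, so
                # decode that single byte with the assumed encoding.
                ch = bytes([cp]).decode(real_encoding)
            except UnicodeDecodeError:
                pass  # byte undefined in that encoding: leave it alone
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    broken = "1990\u00961995"
    if needs_repair(broken):
        print(repair_c1(broken))
```

    needs_repair is the check you would run periodically to notice
    when a misconfigured client has written bad data again.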

[1] I actually looked at ECMA-94, but they should be identical.

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
