Re: BUG #13785: Postgresql encoding screw-up - Mailing list pgsql-bugs

From Feike Steenbergen
Subject Re: BUG #13785: Postgresql encoding screw-up
Date
Msg-id CAK_s-G21eiMoKdKeBM42Zgr5-LC7mZ14FCPJ9gPxqe0d8kw+hw@mail.gmail.com
In response to BUG #13785: Postgresql encoding screw-up  (ntpt@seznam.cz)
List pgsql-bugs
Hi,

> there is a major design flaw or bug

I feel your pain, but how is this a bug? Once a character that cannot be
mapped to latin2 is stored, no information about its source encoding
(win1250) is available anymore. Any client connecting (whether your
application or pg_dump) will get that character "as is".
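
To illustrate the point outside PostgreSQL, here is a minimal Python sketch.
It assumes the unmappable byte is kept unchanged, as described above, and uses
Python's cp1250 and iso8859_2 codecs as stand-ins for WIN1250 and LATIN2. The
euro sign is only an example of a character that WIN1250 has and LATIN2 lacks.

```python
# Byte 0x80 is the euro sign in WIN1250 (cp1250); LATIN2 (ISO 8859-2)
# has no euro sign at all.
stored = "€".encode("cp1250")

# A reader that believes the data is LATIN2 gets a C1 control character.
as_latin2 = stored.decode("iso8859_2")

# Only a reader that already knows the source encoding gets the euro sign back.
as_win1250 = stored.decode("cp1250")

print(repr(stored), repr(as_latin2), repr(as_win1250))
```

The stored byte itself carries no hint about which of the two readings was
intended.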

I don't see a way to solve this in general, other than rejecting
characters that do not fit in the target character set.
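
As a rough sketch of what "rejecting" means, again in Python rather than in
PostgreSQL itself: a strict encoder raises an error instead of storing a
character the target character set cannot represent. The helper name
to_latin2_strict is made up for this illustration.

```python
def to_latin2_strict(text: str) -> bytes:
    """Encode text as LATIN2, raising instead of storing unmappable characters."""
    return text.encode("iso8859_2", errors="strict")


try:
    to_latin2_strict("price: 5 €")  # the euro sign is not part of LATIN2
    rejected = False
except UnicodeEncodeError:
    rejected = True

# Text that fits in LATIN2 still goes through unchanged.
accepted = to_latin2_strict("žluťoučký")
```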

> where client use multiple encodings that have more characters then database
> encoding, the database is screwed forever

The allowed conversions from LATIN2 to other encodings are quite
limited (MULE_INTERNAL, UTF8, WIN1250); see:
http://www.postgresql.org/docs/9.4/static/multibyte.html#AEN35768

If the clients using different encodings all touch the same data, the data
is already dirty. The migration only brings that to light.

If the clients all touch different parts of the data, the data can be
safely migrated by exporting each distinct part in its correct encoding
and then importing it with that encoding into the target database with
UTF8 encoding.
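
The per-part migration described above can be sketched in Python as follows.
The part names, the sample bytes and the mapping of parts to encodings are all
hypothetical; in practice you would have to know which client wrote which part.

```python
# Hypothetical inventory: each part of the data was written by exactly one
# client, so each part has exactly one correct source encoding.
parts = {
    "orders": ("cp1250", b"\x80 100"),             # written by a WIN1250 client
    "customers": ("iso8859_2", b"\xa3\xf3d\xbc"),  # written by a LATIN2 client
}

# Decode every part with its own encoding, then re-encode it as UTF-8.
migrated = {
    name: raw.decode(encoding).encode("utf-8")
    for name, (encoding, raw) in parts.items()
}
```

This only works because no part mixes bytes from clients with different
encodings.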

> I thik that safe practice would be: Pg_dum with -E as used by client
> applicaton and then restore to newly created utf8 database . It should be
> mentioned as safe way in the doc, at least

This looks safe to me: you export the data containing the unknown characters
in its original encoding, thereby making them known again. If you now import
this into UTF8, it will be encoded correctly, because both the source
(WIN1250) and the target (UTF8) can encode these characters.
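
A small Python sketch of why this round trip works, with cp1250 standing in
for WIN1250 and a made-up sample value: once the bytes are read with the
encoding the client actually used, the characters are known again and UTF-8
can represent them.

```python
stored = b"\x80 5"  # hypothetical bytes as written by a WIN1250 client

# Reading the bytes as WIN1250 makes the character known again.
text = stored.decode("cp1250")

# UTF-8 can encode every character WIN1250 has, so nothing is lost.
utf8 = text.encode("utf-8")

# Reading the same bytes as LATIN2 would carry a control character into UTF-8.
wrong = stored.decode("iso8859_2").encode("utf-8")
```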

regards,

Feike Steenbergen
