Re: BUG #13785: Postgresql encoding screw-up - Mailing list pgsql-bugs

From	Feike Steenbergen
Subject	Re: BUG #13785: Postgresql encoding screw-up
Date	November 27, 2015 16:21:44
Msg-id	CAK_s-G21eiMoKdKeBM42Zgr5-LC7mZ14FCPJ9gPxqe0d8kw+hw@mail.gmail.com
In response to	BUG #13785: Postgresql encoding screw-up (ntpt@seznam.cz)
List	pgsql-bugs

Hi,

> there is a major design flaw or bug

I feel your pain, but how is this a bug? Once the character that cannot be
mapped to latin2 is stored, there is no information about the source encoding
(win1250) of this character available anymore. Any client connecting
(whether your application or pg_dump) will get that character "as is".
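
To illustrate the point, here is a minimal Python sketch, not PostgreSQL's own conversion code. It assumes the byte reached the database unconverted, and it uses Python's codec names cp1250 and iso8859_2 as stand-ins for WIN1250 and LATIN2. The same stored byte reads as two different characters, and nothing in the byte itself says which reading was intended.

```python
# One stored byte, two readings: the byte carries no record of its encoding.
raw = b"\x85"  # what a WIN1250 client sends for the ellipsis character

as_win1250 = raw.decode("cp1250")    # Python's codec name for WIN1250
as_latin2 = raw.decode("iso8859_2")  # Python's codec name for LATIN2

print(repr(as_win1250))  # the horizontal ellipsis, U+2026
print(repr(as_latin2))   # a C1 control character, U+0085
```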

I don't see a general way to solve this, other than rejecting characters
that do not fit in the target character set.
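
As a rough sketch of what "rejecting" means here, assuming Python's iso8859_2 codec as a stand-in for LATIN2 (the helper name fits_target is hypothetical, and this is not how the server implements it): a strict encoder refuses a character that has no slot in the target set instead of storing something ambiguous.

```python
def fits_target(text: str, target: str = "iso8859_2") -> bool:
    """Return True if every character of text can be encoded in target."""
    try:
        text.encode(target)  # strict mode: raises on unmappable characters
        return True
    except UnicodeEncodeError:
        return False


print(fits_target("\u0161"))  # s with caron, present in LATIN2
print(fits_target("\u2026"))  # ellipsis, present in WIN1250 but not in LATIN2
```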

> where client use multiple encodings that have more characters then database
> encoding, the database is screwed forever

The allowed conversions from LATIN2 to other encodings are quite
limited (MULE_INTERNAL, UTF8, WIN1250), see:
http://www.postgresql.org/docs/9.4/static/multibyte.html#AEN35768

If the clients using different encodings all touch the same data, the data
is already dirty. The migration is only bringing it to light then.

If the clients all touch different parts of the data, the data can be
safely migrated by exporting each distinct part in its correct encoding
and then importing it with that encoding into the target database with
UTF8 encoding.
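
A small sketch of that idea in Python, under stated assumptions: the part names, the sample bytes, and the helper to_utf8 are all made up for illustration, and it presumes you know which client encoding wrote which part. Each part is decoded with its own source encoding and only then encoded as UTF8.

```python
# Hypothetical export: each part is tagged with the encoding of the
# client that wrote it (cp1250 = WIN1250, iso8859_2 = LATIN2 in Python).
parts = {
    "orders": ("cp1250", b"Cena: 10 \x80"),  # 0x80 is the euro sign in WIN1250
    "notes": ("iso8859_2", b"\xb9koda"),     # 0xB9 is s with caron in LATIN2
}


def to_utf8(parts):
    """Decode every part with its own source encoding, then encode as UTF-8."""
    return {
        name: data.decode(encoding).encode("utf-8")
        for name, (encoding, data) in parts.items()
    }


migrated = to_utf8(parts)
for name, data in migrated.items():
    print(name, data.decode("utf-8"))
```

If the wrong encoding is assumed for a part, the decode step either fails or silently yields the wrong characters, which is why this only works when the parts really are distinct.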

> I thik that safe practice would be: Pg_dum with -E as used by client
> applicaton  and then restore to newly created utf8 database . It should be
> mentioned as safe way in the doc, at least

This looks safe to me: you export the unknown characters in their original
encoding, thereby making them known again. If you now import this into UTF8,
it will be encoded correctly, because both the source (WIN1250) and the
target (UTF8) can encode these characters.

regards,

Feike Steenbergen
