Hi,
> there is a major design flaw or bug
I feel your pain, but how is this a bug? Once a character that cannot be
mapped to LATIN2 is stored, no information about the source encoding
(WIN1250) of that character is available anymore. Any client connecting
(whether your application or pg_dump) will get that character "as is".
I don't see a general way around this, other than rejecting
characters that do not fit into the target character set.
> where clients use multiple encodings that have more characters than the
> database encoding, the database is screwed forever
The allowed conversions from LATIN2 to other encodings are quite
limited (MULE_INTERNAL, UTF8, WIN1250), see:
http://www.postgresql.org/docs/9.4/static/multibyte.html#AEN35768
If the clients using different encodings all touch the same data, the data
is already dirty; the migration only brings that to light.
If the clients each touch different parts of the data, the data can be
migrated safely by exporting each distinct part in its correct encoding
and then importing it, with that encoding, into the target database with
UTF8 encoding.
> I think that safe practice would be: pg_dump with -E as used by the client
> application and then restore to a newly created UTF8 database. It should
> be mentioned as a safe way in the docs, at least
This looks safe to me: you export the unknown characters in their original
encoding, thereby making them known again. If you then import this into
UTF8, it will be encoded correctly, because both the source (WIN1250) and
the target (UTF8) can encode these characters.
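As a sketch of why that round trip works (Python again; the byte value is an assumed example, standing in for what the dump file would contain):

```python
# A byte that sits in the "latin2" database but is really WIN1250 data.
raw = b"\x80"

# Step 1: export with -E WIN1250 — the byte is interpreted as cp1250,
# which makes the character known again.
text = raw.decode("cp1250")          # '€'

# Step 2: import into a UTF8 database — UTF-8 can encode every such
# character, so nothing is lost on the way in.
utf8 = text.encode("utf-8")
print(utf8)                          # b'\xe2\x82\xac'
```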
regards,
Feike Steenbergen