Encoding conversion: Use replacement character instead of failingquery with "ERROR: character with byte sequence 0xd0 0xad in encoding "UTF8"has no equivalent in encoding LATIN1" ? - Mailing list pgsql-general

From Christian Ramseyer
Subject Encoding conversion: Use replacement character instead of failingquery with "ERROR: character with byte sequence 0xd0 0xad in encoding "UTF8"has no equivalent in encoding LATIN1" ?
Date
Msg-id 3c30a47a-09c8-8b36-3ebb-6336c0f089d6@networkz.ch
Whole thread Raw
Responses Re: Encoding conversion: Use replacement character instead of failing query with "ERROR: character with byte sequence 0xd0 0xad in encoding "UTF8" has no equivalent in encoding LATIN1" ?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hello list

I'm slowly converting all of our databases from Postgres 9 and LATIN1 to
Postgres 12 and UTF8, one by one.

I was wondering if there is a solution for this issue: if a database
that is still latin1 has a postgres_fdw foreign table onto a converted
utf8 database, and somehow a character that is not in latin1 has already
gotten in there, a query might fail with e.g.

PG9=# select * from fdw_table_test where hostname ~* 'moscow-ix-02';
ERROR:  character with byte sequence 0xd0 0xad in encoding "UTF8" has no
equivalent in encoding "LATIN1"
-- (0xd0 0xad being CYRILLIC CAPITAL LETTER E: Э)

Can I somehow influence the client:UTF8->server:LATIN1 character set
conversion so that instead of failing, it inserts an invalid codepoint
character, the utf8 hex bytes as string, drops the character or
something like that?

I do agree that the default behavior is correct and in general failing
is a lot better than mutilating or losing data in the conversion.
However in this specific case we don't care all that much about the bits
in possibly foreign scripts, and just having some ? in the string would
be easier to handle than one Cyrillic character in a single row failing
a large import job completely.

Is there any way I can do this on the conversion level* ?

Cheers
Christian


* I get that I could somehow write views too look at the content first
and filter out characters that won't work in LATIN1 before going through
the fdw, but I don't quite know all the tables and columns where this
can become an issue. But if you have a copy/paste ready solution for
that I'll take it as well of course :)



pgsql-general by date:

Previous
From: "Wolff, Ken L"
Date:
Subject: Re: Lock Postgres account after X number of failed logins?
Next
From: Tim Cross
Date:
Subject: Re: Lock Postgres account after X number of failed logins?