
Re: \COPY to accept non UTF-8 chars in CHAR columns - Mailing list pgsql-general

From	Andrew Gierth
Subject	Re: \COPY to accept non UTF-8 chars in CHAR columns
Date	March 28, 2020 17:08:47
Msg-id	87bloga8ce.fsf@news-spur.riddles.org.uk
In response to	Re: \COPY to accept non UTF-8 chars in CHAR columns (Matthias Apitz <guru@unixarea.de>)
List	pgsql-general


>>>>> "Matthias" == Matthias Apitz <guru@unixarea.de> writes:

 Matthias>   i.e. 0xc3 is translated to 0xc383 and the 2nd half, the
 Matthias>   0xbc to 0xc2bc, both translations have nothing to do with
 Matthias>   the original split 0xc3bc, and perhaps in this case it
 Matthias>   would be better to spill out a blank 0x40 for each of the
 Matthias>   bytes which formed the 0xc3bc.

If the only malformed sequences are there as a result of splitting up
valid sequences, then you could do something like convert all invalid
sequences to (sequences of) noncharacters, then once the data is
imported, fix it up by adjusting how the data is split and regenerating
the correct sequence (assuming your application allows this).

For example, you could encode an arbitrary byte 0xXY as a sequence of two
codepoints U+FDDX U+FDEY (every codepoint in the range U+FDD0-U+FDEF is
defined as a noncharacter).
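
A minimal sketch of that encoding in Python, offered as an illustration
rather than a tested import procedure. The handler name "nonchar_escape"
and the functions escape_invalid_utf8 and unescape_to_bytes are
hypothetical names chosen for this example. It assumes the input is
UTF-8 apart from the split sequences, and that the data contains no
genuine U+FDD0-U+FDEF codepoints of its own.

```python
import codecs


def _nonchar_handler(err):
    """Map each undecodable byte 0xXY to the pair U+FDDX U+FDEY."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    repl = "".join(
        chr(0xFDD0 + (b >> 4)) + chr(0xFDE0 + (b & 0x0F)) for b in bad
    )
    # Resume decoding right after the bytes that were escaped.
    return repl, err.end


# "nonchar_escape" is a made-up handler name for this sketch.
codecs.register_error("nonchar_escape", _nonchar_handler)


def escape_invalid_utf8(raw: bytes) -> str:
    """Decode raw as UTF-8, escaping invalid bytes as noncharacter pairs."""
    return raw.decode("utf-8", errors="nonchar_escape")


def unescape_to_bytes(text: str) -> bytes:
    """Turn escaped text back into the original byte string."""
    out = bytearray()
    i = 0
    while i < len(text):
        hi = ord(text[i])
        if (0xFDD0 <= hi <= 0xFDDF and i + 1 < len(text)
                and 0xFDE0 <= ord(text[i + 1]) <= 0xFDEF):
            lo = ord(text[i + 1])
            out.append(((hi - 0xFDD0) << 4) | (lo - 0xFDE0))
            i += 2
        else:
            out += text[i].encode("utf-8")
            i += 1
    return bytes(out)


# The split 0xc3bc from the quoted example, cut across two fields.
left = escape_invalid_utf8(b"M\xc3")
right = escape_invalid_utf8(b"\xbcller")

# After import: recover the bytes, rejoin the fields, decode again.
rejoined = (unescape_to_bytes(left) + unescape_to_bytes(right)).decode("utf-8")
```

Escaping would run before the data is loaded, and the unescape-and-rejoin
step afterwards, once it is known which fields belong together.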

-- 
Andrew (irc:RhodiumToad)
