Re: \COPY to accept non UTF-8 chars in CHAR columns - Mailing list pgsql-general

From	Thomas Munro
Subject	Re: \COPY to accept non UTF-8 chars in CHAR columns
Date	March 27, 2020 20:40:30
Msg-id	CA+hUKG+BAtBCXaB-0SYxNiVV3_CAbHvm7sm1PJWyhJFvTi_R3A@mail.gmail.com
In response to	Re: \COPY to accept non UTF-8 chars in CHAR columns (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: \COPY to accept non UTF-8 chars in CHAR columns Re: \COPY to accept non UTF-8 chars in CHAR columns
List	pgsql-general

On Sat, Mar 28, 2020 at 4:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Matthias Apitz <guru@unixarea.de> writes:
> > In short, is there a way to let \COPY accept such broken ISO bytes, just
> > complaining about them, but not stopping the insert of the row?
>
> No.  We don't particularly believe in the utility of invalid data.
>
> If you don't actually care about what encoding your data is in,
> you could use SQL_ASCII as the database "encoding" and thereby
> disable all UTF8-specific behavior.  Otherwise, maybe this conversion
> is a good time to clean up the mess?

Something like this approach might be useful for fixing the CSV file:

https://codereview.stackexchange.com/questions/185821/convert-a-mix-of-latin-1-and-utf-8-to-proper-utf-8

I haven't tested that program but it looks like the right sort of
approach; I remember writing similar logic to untangle the strange
mixtures of Latin-1, Windows-1252, and UTF-8 that late-90s browsers
used to send.  That sort of approach can't fix every theoretical
problem (some valid Latin-1 sequences are also valid UTF-8 sequences),
but it's doable with text in European languages.
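
For illustration, here is a minimal sketch of that kind of logic in
Python.  It is not the program from the link above, and the function
names are made up for this example.  It assumes that any byte sequence
that decodes as well-formed UTF-8 really is UTF-8, and that every other
high byte is Windows-1252 (falling back to Latin-1 for the few bytes
Windows-1252 leaves undefined).

```python
def decode_legacy_byte(byte: int) -> str:
    """Decode one stray high byte as Windows-1252, else as Latin-1."""
    try:
        return bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        # cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined.
        return bytes([byte]).decode("latin-1")


def utf8_sequence_length(lead: int) -> int:
    """Length implied by a UTF-8 lead byte, or 0 if it cannot be one."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def repair_mixed_encoding(raw: bytes) -> str:
    """Decode bytes holding a mix of UTF-8 and Latin-1/Windows-1252."""
    out = []
    i = 0
    while i < len(raw):
        byte = raw[i]
        if byte < 0x80:
            # Plain ASCII means the same thing in all three encodings.
            out.append(chr(byte))
            i += 1
            continue
        length = utf8_sequence_length(byte)
        if length:
            try:
                # Strict decoding rejects truncated and overlong sequences.
                out.append(raw[i:i + length].decode("utf-8"))
                i += length
                continue
            except UnicodeDecodeError:
                pass
        # Not well-formed UTF-8 here: treat this one byte as legacy text.
        out.append(decode_legacy_byte(byte))
        i += 1
    return "".join(out)


if __name__ == "__main__":
    mixed = "Müller".encode("latin-1") + b"," + "Straße".encode("utf-8")
    print(repair_mixed_encoding(mixed))
```

The repaired text can then be written back out with .encode("utf-8")
before running \COPY.  The ambiguity mentioned above is visible here
too: the Latin-1 bytes for "Ã©" are exactly the UTF-8 bytes for "é",
so this sketch will always read them as "é".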
