encoding confusion with \copy command - Mailing list pgsql-general

From Martin Waite
Subject encoding confusion with \copy command
Msg-id CAOWKicvhP+qw41OWTk=eZXEssOsMrsaRQRcXbCOPmEsbbCwZFw@mail.gmail.com
Responses Re: encoding confusion with \copy command  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
Hi,

I have a PostgreSQL 7.4 server and client on CentOS 6.4.  The database server is using UTF-8 encoding.

I have been exploring the use of the \copy command for importing CSV data generated by SQL Server 2008.  The SQL Server 2008 export tool does not escape quotes that appear inside field content, so it is useful to be able to specify an obscure character for the QUOTE option of \copy to work around this issue.
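As a rough illustration of how such an obscure quote character might be picked, the sketch below scans the exported data for byte values that never occur in it. The function name `unused_quote_candidates` and the sample data are hypothetical. The restriction to 0x01 - 0x7F mirrors the limit described in this message, and excluding newline, carriage return and the delimiter is an assumption about what \copy will accept.

```python
def unused_quote_candidates(data: bytes, delimiter: bytes = b",") -> list[int]:
    """Return byte values in 0x01-0x7F that never occur in the data.

    Hypothetical helper: newline, carriage return and the delimiter are
    excluded on the assumption that they cannot serve as QUOTE.
    """
    present = set(data)                      # every byte value seen in the file
    excluded = {0x0A, 0x0D, delimiter[0]}    # LF, CR, field delimiter
    return [b for b in range(0x01, 0x80)
            if b not in present and b not in excluded]


# Made-up sample resembling an export with an unescaped double quote.
sample = b'id,name\n1,He said "hi"\n'
candidates = unused_quote_candidates(sample)

# The first unused byte could then be written as QUOTE E'\x01' in \copy.
print(hex(candidates[0]))
```

For a real file, the bytes would come from something like `open('/tmp/yuml.csv', 'rb').read()`.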

When I run the following commands in psql, I am surprised that QUOTE is limited to characters in the range 0x01 - 0x7F, and that UTF8 is mentioned in the error message if a character outside that range is chosen:

\encoding WIN1252
\copy yuml from '/tmp/yuml.csv'  WITH CSV HEADER ENCODING 'WIN1252' QUOTE as E'\xff';
ERROR:  invalid byte sequence for encoding "UTF8": 0xff


I thought that if the client (psql) encoding is set to WIN1252, and the CSV file is specified as WIN1252, then I could specify any valid WIN1252 character as the quote character.  Instead, I am limited to the range of characters that can be encoded as a single byte in UTF-8.  Actually, 0x00 is not accepted either, so the usable range is 0x01 - 0x7F.
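The encoding arithmetic behind this observation can be checked outside the database. The following is only a sketch of the byte-level facts, using Python's `cp1252` codec as a stand-in for WIN1252. It says nothing about how psql or the server actually handle the QUOTE option.

```python
raw = b"\xff"

# In WIN1252 (Python codec name: cp1252), byte 0xff is a valid character.
ch = raw.decode("cp1252")
as_utf8 = ch.encode("utf-8")     # the same character needs two bytes in UTF-8

# The lone byte 0xff is not a valid UTF-8 sequence, which matches the
# wording of the error message quoted above.
try:
    raw.decode("utf-8")
    ff_is_valid_utf8 = True
except UnicodeDecodeError:
    ff_is_valid_utf8 = False


def is_single_byte_utf8(value: int) -> bool:
    """True if the byte on its own is a complete, valid UTF-8 sequence."""
    try:
        bytes([value]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


single_byte = [b for b in range(256) if is_single_byte_utf8(b)]

print(len(as_utf8), ff_is_valid_utf8)
print(hex(single_byte[0]), hex(single_byte[-1]))
```

Only 0x00 - 0x7F stand alone as UTF-8, and with 0x00 rejected separately that leaves 0x01 - 0x7F, the range reported here.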

Is this a bug or expected behaviour?

Is it the case that the server does the actual CSV parsing, and that because my server uses UTF8, I am limited to characters that are a single byte in UTF8?

regards,
Martin
