Re: COPY command character set - Mailing list pgsql-general

From Peter Headland
Subject Re: COPY command character set
Msg-id 71F491F5DA99604A80DE49424BF3D02B0CD9A21C@exchange8.actuate.com
In response to Re: COPY command character set  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
> The COPY command reference page saith
>
>    Input data is interpreted according to the current client encoding,
>    and output data is encoded in the current client encoding, even
>    if the data does not pass through the client but is read from or
>    written to a file.

Rats - I read the manual page twice and that didn't register on my
feeble consciousness. I suspect that I didn't look beyond the word
"client", since I knew I wasn't interested in client behavior and I was
speed-reading. On the assumption that I am not uniquely stupid, maybe we
could re-phrase this slightly, with a "for example", and add a heading
"Localization"?

As a general comment, I18N/L10N is a hairy enough topic that it merits
its own heading in any commands where it is an issue.

How about my suggestion to add a means (extend COPY syntax) to specify
encoding explicitly and handle UTF lead bytes - would that be of
interest?
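
To make the "lead bytes" part concrete, here is a minimal sketch of the kind of sniffing I have in mind, assuming that "UTF lead bytes" means a byte order mark at the start of the input file. It is written in Python purely for illustration, and the name sniff_bom is hypothetical; nothing like it exists in COPY today. As far as I know, UTF-8 is the only one of these that corresponds to an encoding PostgreSQL accepts, so the other results would be a cue to reject or convert the file.

```python
import codecs

# Check longer BOMs first: the UTF-32 LE BOM starts with the same two
# bytes as the UTF-16 LE BOM, so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(head: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found.

    `head` is the first few bytes of the file. Hypothetical helper,
    not part of PostgreSQL or any client library.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, len(bom)
    return None, 0
```

A loader could call sniff_bom on the first four bytes of the file, skip bom_length bytes, and fall back to the current client encoding when no mark is present.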

--
Peter Headland
Architect
Actuate Corporation


-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, September 10, 2009 10:38
To: Peter Headland
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] COPY command character set

"Peter Headland" <pheadland@actuate.com> writes:
>> set client_encoding = 'utf8';
>> copy from stdin/to stdout;

> What if I want to do this on the server side (because it's much, much
> faster)? Does COPY use the default encoding of the database? If not,
> what?

> If this is as restrictive as it appears, and there are no outstanding
> enhancements planned in this area, I might be interested in improving
> this command to allow specifying the encoding and to have it do obvious
> stuff like recognize UTF lead bytes automatically. At the very least,
> the documentation needs some work to explain these subtleties.

The COPY command reference page saith

    Input data is interpreted according to the current client encoding,
    and output data is encoded in the current client encoding, even
    if the data does not pass through the client but is read from or
    written to a file.

Seems clear enough to me.

            regards, tom lane
