Re: Add ENCODING option to COPY - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: Add ENCODING option to COPY
Msg-id AANLkTi=eAtrf06WLCRTyM=KZsL41R=UoVT4QDECc7G+V@mail.gmail.com
In response to Re: Add ENCODING option to COPY  (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
Responses Re: Add ENCODING option to COPY  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
2011/1/25 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
> On Sat, Jan 15, 2011 at 02:25, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
>> The patch overrides client_encoding by the added ENCODING option, and
>> restores it as soon as copy is done.
>
> We cannot do that because error messages should be encoded in the original
> encoding even during COPY commands with encoding option. Error messages
> could contain non-ASCII characters if lc_messages is set.

Agreed.

>> I see some complaints asking to use
>> pg_do_encoding_conversion() instead of
>> pg_client_to_server/server_to_client(), but the former will surely add
>> slight overhead per line read
>
> If we want to reduce the overhead, we should cache the conversion procedure
> in CopyState. How about adding something like "FmgrInfo file_to_server_conv"
> into it?

I looked into the code and found that we cannot pass FmgrInfo * to
any function defined in pg_wchar.h, since that header file is shared
with libpq, too.

For the record, I also tried pg_do_encoding_conversion() instead of
pg_client_to_server/server_to_client(), and a simple benchmark shows
it is too slow: on average about 4 seconds (roughly 24%) slower than
the client_encoding path.

COPY FROM with 3,000,000 lines of 3 columns each (~22 MB TSV):

*utf8 -> utf8 (no conversion)
13428.233ms
13322.832ms
15661.093ms

*euc_jp -> utf8 (client_encoding)
17527.470ms
16457.452ms
16522.337ms

*euc_jp -> utf8 (pg_do_encoding_conversion)
20550.983ms
21425.313ms
20774.323ms
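To put the three cases side by side, here is a small calculation over exactly the timings listed above: the mean of each case, its ratio to the no-conversion baseline, and the time attributable to conversion (mean minus baseline mean). The variable names are only for illustration.

```python
from statistics import mean

# Timings in ms, copied from the three runs per case reported above.
timings_ms = {
    "utf8 -> utf8 (no conversion)": [13428.233, 13322.832, 15661.093],
    "euc_jp -> utf8 (client_encoding)": [17527.470, 16457.452, 16522.337],
    "euc_jp -> utf8 (pg_do_encoding_conversion)": [20550.983, 21425.313, 20774.323],
}

means = {label: mean(runs) for label, runs in timings_ms.items()}
baseline = means["utf8 -> utf8 (no conversion)"]
via_client = means["euc_jp -> utf8 (client_encoding)"]
via_pg_do = means["euc_jp -> utf8 (pg_do_encoding_conversion)"]

# Extra time spent on encoding conversion, relative to the no-conversion run.
overhead_client = via_client - baseline
overhead_pg_do = via_pg_do - baseline

for label, avg in means.items():
    print(f"{label}: mean {avg:.1f} ms ({avg / baseline:.2f}x baseline)")
print(f"conversion overhead: client_encoding {overhead_client:.0f} ms, "
      f"pg_do_encoding_conversion {overhead_pg_do:.0f} ms "
      f"({overhead_pg_do / overhead_client:.1f}x)")
```

The means come out at roughly 14137 ms, 16836 ms and 20917 ms, so the conversion cost through pg_do_encoding_conversion() is about 2.5 times that of the client_encoding path. With only three runs per case (and one noticeably slower baseline run), these are rough figures.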

I'll look through the code further to see whether there are better alternatives.

Regards,


-- 
Hitoshi Harada

