Re: COPY ENCODING revisited - Mailing list pgsql-hackers

From Robert Haas
Subject Re: COPY ENCODING revisited
Date
Msg-id AANLkTik7AYu7Zz8yQ4vk5LArqdi1gR4rd=QLOU3Tt5q0@mail.gmail.com
In response to COPY ENCODING revisited  (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
Responses Re: COPY ENCODING revisited  (Itagaki Takahiro <itagaki.takahiro@gmail.com>)
List pgsql-hackers
On Wed, Feb 16, 2011 at 10:45 PM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> The COPY ENCODING patch was returned with feedback,
>  https://commitfest.postgresql.org/action/patch_view?id=501
> but we still need it for file_fdw.  Using client_encoding at runtime
> is reasonable for a one-time COPY command, but makes no logical sense
> for persistent file_fdw tables.
>
> Based on the latest patch,
>  http://archives.postgresql.org/pgsql-hackers/2011-01/msg02903.php
> I added pg_any_to_server() and pg_server_to_any() functions instead of
> exposing FmgrInfo in pg_wchar.h.  They are the same as pg_client_to_server()
> and pg_server_to_client(), but accept any encoding. They use cached
> conversion procs only if the specified encoding matches the client encoding.
>
> According to Harada's research,
>  http://archives.postgresql.org/pgsql-hackers/2011-01/msg02397.php
> non-cached conversions are slower than cached ones. This version provides
> the same performance as before when the file and client encodings are the
> same, but would be a bit slower in other cases. We could improve the
> performance in future versions, for example by caching each used conversion
> proc in pg_do_encoding_conversion().
>
> file_fdw will support an ENCODING option. If it is not specified, file_fdw
> might have to store the client_encoding in effect at CREATE FOREIGN TABLE.
> Even if a different client_encoding is used at SELECT, the encoding at
> definition time is used.
>
> The ENCODING 'quoted name' issue is also fixed; it always requires quoted names.
> I think we only accept non-quoted text as identifier names. Unquoted text
> should be treated as "double quoted", but encoding names are not identifiers.

I am not qualified to fully review this patch because I'm not all that
familiar with the encoding stuff, but it looks reasonably sensible on
a quick read-through.  I am supportive of making a change in this area
even at this late date, because it seems to me that if we're not going
to change this then we're pretty much giving up on having a usable
file_fdw in 9.1.  And since postgresql_fdw isn't in very good shape
either, that would mean we may as well give up on SQL/MED.  We might
have to do that anyway, but I don't think we should do it just because
of this issue, if there's a reasonable fix.

I don't think the fact that the performance bites is a reason not to
do this.  As you say, that can always be improved in the future.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

