Re: pg_dump/restore encoding woes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pg_dump/restore encoding woes
Date
Msg-id 66480.1377532742@sss.pgh.pa.us
In response to pg_dump/restore encoding woes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: pg_dump/restore encoding woes  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> When client encoding is not specified explicitly with the -E option, or 
> PGCLIENTENCODING env variable, the dump is created in the server encoding.

Yeah, that's intentional as I recall.
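
For illustration, the fallback order described in the quoted text could be modelled roughly as below. This is only a sketch of the stated behaviour, not pg_dump's actual code: `resolve_dump_encoding` and its parameters are hypothetical names, and the assumption that -E wins over PGCLIENTENCODING when both are given is mine.

```python
import os
from typing import Mapping, Optional


def resolve_dump_encoding(
    dash_e: Optional[str],
    server_encoding: str,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the encoding the dump would be written in (hypothetical helper)."""
    # An explicit -E value is assumed to take precedence.
    if dash_e:
        return dash_e
    # Otherwise fall back to the PGCLIENTENCODING environment variable.
    env_value = env.get("PGCLIENTENCODING")
    if env_value:
        return env_value
    # With neither set, the dump is created in the server encoding.
    return server_encoding
```

With neither -E nor PGCLIENTENCODING set, the function simply returns the server encoding, which is the default being discussed here.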

> However, pg_dump is special, because client encoding affects not only 
> the encoding used to speak to the server, but it also determines how the 
> resulting dump is encoded. If you have a UTF-8 server, and a LATIN1 
> console, there is no way to get a UTF-8 encoded dump of a single table 
> which has non-ASCII characters in its name. There is a good reason to 
> want to dump in the server encoding regardless of the encoding of the 
> client: that avoids the costly encoding conversion during the dump, and 
> very likely another conversion back on restore. (as a convenience, it 
> would be nice if you could specify "-E server" to mean "same as server 
> encoding")

There's a considerably more compelling reason than speed to default to
avoiding a conversion: doing a conversion carries significant risk of
outright failure, due to not being able to convert some data character
to the client character set.
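
A minimal sketch of that failure mode, using Python's codecs as a stand-in for the server's conversion routines (an assumption made purely for illustration; `can_convert` is a hypothetical helper, and "latin-1" is Python's name for the character set PostgreSQL calls LATIN1):

```python
def can_convert(text: str, client_encoding: str) -> bool:
    """Report whether every character of text exists in client_encoding."""
    try:
        text.encode(client_encoding)
    except UnicodeEncodeError:
        # At least one character has no equivalent in the target set.
        return False
    return True


print(can_convert("café", "latin-1"))        # True: é exists in LATIN1
print(can_convert("price: 5 €", "latin-1"))  # False: the euro sign does not
```

A dump that stays in the server encoding never performs this step, so it cannot fail this way.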

> The pg_dump -E option just sets client_encoding, but I think it would be 
> better for -E to only set the encoding used in the dump, and 
> PGCLIENTENCODING env variable (if set) was used to determine the 
> encoding of the command-line arguments. Opinions?

I think this is going to be a lot easier said than done, but feel free
to see if you can make it work.  (As you point out, we don't have
any client-side encoding conversion infrastructure, but I don't see
how you're going to make this work without it.)

A second issue is whether we should divorce -E and PGCLIENTENCODING like
that, when they have always meant the same thing.  You mentioned the
alternative of looking at pg_dump's locale environment to determine the
command line encoding --- would that be better?
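
As a rough sketch of that alternative: the command-line encoding would come from the locale, and the argument would be transcoded into the dump encoding on the client side. The helper names below are hypothetical, and Python's codecs stand in for the client-side conversion infrastructure that pg_dump does not currently have.

```python
import locale


def locale_encoding() -> str:
    """Encoding implied by the process's locale environment."""
    return locale.getpreferredencoding(False)


def argument_for_dump(raw_arg: bytes, cmdline_encoding: str, dump_encoding: str) -> bytes:
    """Transcode a command-line argument (e.g. a table name) into the dump encoding."""
    # Decode using the console's encoding, then re-encode for the dump.
    return raw_arg.decode(cmdline_encoding).encode(dump_encoding)


# A LATIN1 console passing the table name "täst" to a UTF-8 dump:
print(argument_for_dump(b"t\xe4st", "latin-1", "utf-8"))  # b't\xc3\xa4st'
```

In practice `locale_encoding()` would supply the `cmdline_encoding` argument; it is passed explicitly here so the example does not depend on the machine it runs on.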

        regards, tom lane


