Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> When client encoding is not specified explicitly with the -E option or
> the PGCLIENTENCODING env variable, the dump is created in the server
> encoding.
Yeah, that's intentional as I recall.
> However, pg_dump is special, because client encoding affects not only
> the encoding used to speak to the server, but it also determines how the
> resulting dump is encoded. If you have a UTF-8 server, and a LATIN1
> console, there is no way to get a UTF-8 encoded dump of a single table
> which has non-ASCII characters in its name. There is a good reason to
> want to dump in the server encoding regardless of the encoding of the
> client: that avoids the costly encoding conversion during the dump, and
> very likely another conversion back on restore. (as a convenience, it
> would be nice if you could specify "-E server" to mean "same as server
> encoding")
There's a considerably more compelling reason than speed to default to
avoiding a conversion: doing a conversion carries significant risk of
outright failure, due to not being able to convert some data character
to the client character set.
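
To make that failure mode concrete, here is a minimal Python sketch (an
illustration only, not pg_dump code; the function name and sample strings
are made up for the example). Text that is valid in a UTF-8 server
encoding may have no representation at all in a LATIN1 client encoding,
so the conversion fails outright instead of merely running slowly:

```python
def can_convert(text: str, target_encoding: str) -> bool:
    """Return True if every character of text exists in target_encoding."""
    try:
        text.encode(target_encoding)
        return True
    except UnicodeEncodeError:
        # At least one character has no equivalent in the target set.
        return False


# Accented Latin letters exist in LATIN1, so this conversion succeeds.
latin1_ok = can_convert("caf\u00e9", "latin-1")

# Japanese text has no LATIN1 representation, so this conversion fails.
latin1_fail = can_convert("\u65e5\u672c\u8a9e", "latin-1")

# The same text stays representable when no conversion away from UTF-8 is needed.
utf8_ok = can_convert("\u65e5\u672c\u8a9e", "utf-8")
```
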
> The pg_dump -E option just sets client_encoding, but I think it would be
> better for -E to only set the encoding used in the dump, and for the
> PGCLIENTENCODING env variable (if set) to determine the encoding of
> the command-line arguments. Opinions?
I think this is going to be a lot easier said than done, but feel free
to see if you can make it work. (As you point out, we don't have
any client-side encoding conversion infrastructure, but I don't see
how you're going to make this work without it.)
A second issue is whether we should divorce -E and PGCLIENTENCODING like
that, when they have always meant the same thing. You mentioned the
alternative of looking at pg_dump's locale environment to determine the
command line encoding --- would that be better?
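
For illustration only, the locale-based alternative might look roughly like
the Python sketch below. This is not how pg_dump works (pg_dump is C, and
as noted it has no client-side conversion infrastructure); the function
names are hypothetical, and the sketch assumes the locale's codeset is one
Python's codecs module recognizes:

```python
import codecs
import locale


def command_line_encoding() -> str:
    """Best guess at the encoding of command-line arguments, from the locale."""
    try:
        # Adopt the user's environment (LANG, LC_ALL, LC_CTYPE).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # Unsupported locale settings: keep the default locale.
    name = locale.getpreferredencoding(False)
    try:
        return codecs.lookup(name).name  # Normalize aliases where possible.
    except LookupError:
        return name


def recode_argument(raw: bytes, argv_encoding: str, dump_encoding: str) -> bytes:
    """Convert one raw argument (e.g. a table name) to the dump encoding."""
    return raw.decode(argv_encoding).encode(dump_encoding)


# A table name typed on a LATIN1 console, recoded for a UTF-8 dump.
recoded = recode_argument("t\u00e2ble".encode("latin-1"), "latin-1", "utf-8")
```

The recoding step is exactly the client-side conversion that would have to
exist for this to work, and it can fail in the same way as any other
conversion if the argument contains bytes that are invalid in the assumed
command-line encoding.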
regards, tom lane