Thread: BUG #6308: Problem w. encoding in client

BUG #6308: Problem w. encoding in client

From
"Thomas Goerner"
Date:
The following bug has been logged online:

Bug reference:      6308
Logged by:          Thomas Goerner
Email address:      tg@clickware.de
PostgreSQL version: 9.1.1
Operating system:   Windows 7 64-bit
Description:        Problem w. encoding in client
Details:

Hi, we have a problem regarding encoding with postgres  9.1.1 and Win7
64-bit

Database encoding: UTF-8
active codepage in Windows console: 1252
PGCLIENTENCODING: Win1252
Console font: Lucida console

In the above configuration, the following problems occur:

1)
Text output from the client applications, e.g. the welcome prompt of psql or
the help page from pg_dump --help, is not displayed correctly (especially
German umlauts and characters like "«").

2)
When we restore a dump in custom format and then try to re-dump the
database, we get error messages like Zeichen 0xe28093 in Kodierung »UTF8«
hat keine Entsprechung in »Win1252« (character 0xe28093 in UTF-8 cannot be
translated to Win1252)

The above configuration is our standard configuration and works just fine in
Windows XP and even in Windows 7 32-bit.

Is there any solution to this problem?

Thanks in advance
Thomas

Re: BUG #6308: Problem w. encoding in client

From
Craig Ringer
Date:
On 11/25/2011 08:21 PM, Thomas Goerner wrote:
>
> The following bug has been logged online:
>
> Bug reference:      6308
> Logged by:          Thomas Goerner
> Email address:      tg@clickware.de
> PostgreSQL version: 9.1.1
> Operating system:   Windows 7 64-bit
> Description:        Problem w. encoding in client
> Details:
>
> Hi, we have a problem regarding encoding with postgres  9.1.1 and Win7
> 64-bit
>
> Database encoding: UTF-8
> active codepage in Windows console: 1252
> PGCLIENTENCODING: Win1252
> Console font: Lucida console
>
> In the above configuration, the following problems occur:
>
> 1)
> Text output from the client applications, e.g. the welcome prompt of psql or
> the help page from pg_dump --help, is not displayed correctly (especially
> German umlauts and characters like "«").

That shouldn't be happening. As a workaround, try using a unicode
console (see the "chcp" command) and a unicode client encoding.

The issue with mismatched chars sounds like a real bug that wants
looking into.

> When we restore a dump in custom format and then try to re-dump the
> database, we get error messages like Zeichen 0xe28093 in Kodierung »UTF8«
> hat keine Entsprechung in »Win1252« (character 0xe28093 in UTF-8 cannot be
> translated to Win1252)

Restore using PgAdmin III or using a unicode console. This is a
limitation of using a Win1252 client encoding when restoring data that
isn't restricted to Win1252 and cannot be fixed directly.
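
To see which character the error message refers to, the reported bytes
can be decoded by hand. This is only a minimal sketch using Python's
standard library; it identifies the character and says nothing about
how PostgreSQL's own conversion tables treat it.

```python
import unicodedata

# Bytes taken from the error message: 0xe28093
raw = bytes.fromhex("e28093")

# Interpret them as UTF-8, the database encoding named in the report
char = raw.decode("utf-8")

# Print the code point and its official Unicode name
print(f"U+{ord(char):04X}", unicodedata.name(char))
```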

If you don't mind possibly corrupted error and NOTICE messages you can
just set a unicode client_encoding for your restore.
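
One way to set the client encoding for a single restore, without changing
the system-wide PGCLIENTENCODING, might look like the sketch below. The
dump file name "backup.dump", the database name "mydb" and the helper
restore_command are hypothetical; only the PGCLIENTENCODING variable and
pg_restore's --dbname option are taken as given.

```python
import os


def restore_command(dump_path, dbname, client_encoding="UTF8"):
    """Build the environment and argument list for one pg_restore run."""
    # Copy the current environment so the override stays local to this run
    env = dict(os.environ)
    env["PGCLIENTENCODING"] = client_encoding
    argv = ["pg_restore", "--dbname", dbname, dump_path]
    return env, argv


# Hypothetical file and database names, for illustration only
env, argv = restore_command("backup.dump", "mydb")
print(env["PGCLIENTENCODING"], argv)

# To actually run the restore (requires pg_restore on PATH):
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```

Passing the modified environment only to the child process leaves the
console's own settings untouched.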

--
Craig Ringer

Re: BUG #6308: Problem w. encoding in client

From
"Thomas Goerner"
Date:
Hello Craig,

thanks for your answer.

> Restore using PgAdmin III or using a unicode console.
> This is a limitation of using a Win1252 client encoding when restoring
> data that isn't restricted to Win1252 and cannot be fixed directly.

That's new to me. AFAIK pg_restore looks into the dump file and sets the
client encoding accordingly (In fact the dump contains the statement SET
client_encoding = 'UTF8';). Is this overridden by PGCLIENTENCODING? And if
so, should it be?

And as we only encounter both problems in Windows7-64, it seems to me they
are closely related.

Regards
Thomas

===================================
click:ware Informationstechnik GmbH
Thomas Goerner
Geschäftsführer (Managing Director)
fon: 0221 - 13 99 88-0
fax: 0221 - 13 99 88-79
Kamekestraße 19
50672 Köln
tg@clickware.de
www.clickware.de
===================================
Have you heard of our GasDataWarehouse, the cost-effective solution for
exchanging gas metering data?
www.gasdatawarehouse.de


Re: BUG #6308: Problem w. encoding in client

From
Craig Ringer
Date:
On 11/28/2011 08:26 PM, Thomas Goerner wrote:
>
> Hello Craig,
>
> thanks for your answer.
>
> > Restore using PgAdmin III or using a unicode console.
> > This is a limitation of using a Win1252 client encoding when restoring
> > data that isn't restricted to Win1252 and cannot be fixed directly.
>
> That's new to me. AFAIK pg_restore looks into the dump file and sets
> the client encoding accordingly (In fact the dump contains the
> statement SET client_encoding = 'UTF8';). Is this overridden by
> PGCLIENTENCODING? And if so, should it be?
>

Nope, pg_restore should be using UTF8 as the client encoding in that
case. If there are any errors or notices it won't be able to emit them
correctly on the terminal though, as win1252 can't represent everything
in UTF8 (and IIRC pg_restore doesn't recode from client_encoding to
terminal encoding anyway).
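
The mismatch can be illustrated outside PostgreSQL. The sketch below uses
Python's "cp1252" codec as a stand-in for the Win1252 encoding; that
substitution, the helper to_win1252 and the sample strings are
assumptions made for illustration, not anything pg_restore does itself.

```python
def to_win1252(text):
    """Return the Windows-1252 bytes for text, or None if it does not fit."""
    try:
        return text.encode("cp1252")
    except UnicodeEncodeError:
        return None


# "Köln" fits into Windows-1252; the Greek capital omega does not
for sample in ["Köln", "\u03a9"]:
    print(repr(sample), "->", to_win1252(sample))

# Forcing the conversion replaces the character instead of failing,
# which is how a message can end up corrupted on the terminal
lossy = "\u03a9".encode("cp1252", errors="replace")
print(lossy)
```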

If the restore itself is failing then I agree that something's not
working properly, because you should be able to use a client_encoding
different to your terminal encoding. I wonder if recent changes intended
to get psql to pick up the terminal encoding automatically have had the
unintended side-effect of overriding pg_restore's attempt to set the
client_encoding?

I'm rather surprised you only see this on x64. You're using the same
Windows and Pg version for both x86 and x64 but only the x64 test fails?

--
Craig Ringer

Re: BUG #6308: Problem w. encoding in client

From
"Thomas Goerner"
Date:
Hello Craig,

it seems as if there were illegal chars in the originally dumped database,
so the dump/restore problem might be due to this. At the moment we are doing
further investigation on this issue.

But the problem regarding message output from the client applications still
persists. We are now setting up another set of 32/64-bit (virtual) Windows 7
machines to verify that the problem occurs only on 64 bit windows.

I will keep you informed.

Regards
Thomas
