Thread: invalid byte sequence ?

invalid byte sequence ?

From: Andreas
Hi,

I've got pg 8.1.4 from the binary Windows installer.
Windows 2000 / German
Now I entered "\d" into psql in the text console and got this:

db_test=# \d
ERROR:  invalid byte sequence for encoding "UTF8": 0xfc6d6572220a

What's up?
db_test was created with UTF8 encoding.




Re: invalid byte sequence ?

From: Bruce Momjian
Andreas wrote:
> Hi,
>
> I've got pg 8.1.4 from the binary Windows installer.
> Windows 2000 / German
> Now I entered "\d" into psql on the text-console and got this:
>
> db_test=# \d
> ERROR:  invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
>
> What's up ?
> db_test was created UTF8 encoded

What does your client_encoding show?  It should be UTF8 too.
--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: invalid byte sequence ?

From: Andreas

Bruce Momjian wrote:
> Andreas wrote:
>
>> I've got pg 8.1.4 from the binary Windows installer.
>> Windows 2000 / German
>> Now I entered "\d" into psql on the text-console and got this:
>>
>> db_test=# \d
>> ERROR:  invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
>>
>> What's up ?
>> db_test was created UTF8 encoded
>>
>
> What does your client_encoding show?  It should be UTF8 too.
>

It is.

db_test=# \d
ERROR:  invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
db_test=# show client_encoding;
 client_encoding
-----------------
 UTF8
(1 Zeile)

psql now complains about the code page, too (850 vs. 1252).
I'm sure I checked it the other day in a cmd window that used code page
1252 and still got the error for the \d command.



Re: invalid byte sequence ?

From: Tom Lane
Andreas <maps.on@gmx.net> writes:
> I've got pg 8.1.4 from the binary Windows installer.
> Windows 2000 / German
> Now I entered "\d" into psql on the text-console and got this:
>
> db_test=# \d
> ERROR:  invalid byte sequence for encoding "UTF8": 0xfc6d6572220a

I can replicate this by using a UTF8 database and running the client
in a non-UTF8 locale.  For example

$ LANG=de_DE.iso88591 psql postgres
Dies ist psql 8.2devel, das interaktive PostgreSQL-Terminal.

Geben Sie ein:  \copyright für Urheberrechtsinformationen
                \h für Hilfe über SQL-Anweisungen
                \? für Hilfe über interne Anweisungen
                \g oder Semikolon, um eine Anfrage auszuführen
                \q um zu beenden

postgres=# \l
ERROR:  invalid byte sequence for encoding "UTF8": 0xfc6d6572222c
TIP:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
postgres=# \d
ERROR:  invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
TIP:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
postgres=# \encoding
UTF8
postgres=#

The problem here is that psql is using gettext() to convert column
headings for its display to German, and gettext() sees its locale
as specifying ISO8859-1, so that's the encoding it produces.  When
that data is sent over to the server --- which thinks that the
client is using UTF8 encoding, because it hasn't been told any
different --- the server quite naturally barfs.
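The failing bytes can be checked with Python's standard codecs (Python is used here only as a convenient illustration). Decoded as ISO 8859-1 they read `ümer"` followed by a newline, which appears to be the tail of the quoted German column heading "Eigentümer" that psql puts into its query (the `\l` error ends in 0x2c, a comma, instead of the newline). Decoded as UTF-8, they are rejected at the very first byte:

```python
# The six bytes quoted in the error message for \d.
raw = bytes.fromhex("fc6d6572220a")

# Read as ISO 8859-1 (what gettext produced under a Latin-1 locale),
# every byte is a valid character: 0xFC is "ü", 0x22 is '"', 0x0A is a newline.
print(repr(raw.decode("latin-1")))  # 'ümer"\n'

# Read as UTF-8 (what the server was told to expect), 0xFC cannot start
# any valid UTF-8 sequence, so decoding fails at offset 0.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.start, exc.reason)  # 0 invalid start byte
```

The same bytes are perfectly valid text in one encoding and garbage in the other, which is why the server only accepts them once it is told the real client encoding.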

We've known about this and related issues with gettext for some time,
but a bulletproof solution isn't clear.  For the moment all you can
do is be real careful about making your locale settings match up.

            regards, tom lane

Re: invalid byte sequence ?

From: Bruce Momjian
Is this a TODO?

---------------------------------------------------------------------------

Tom Lane wrote:
> Andreas <maps.on@gmx.net> writes:
> > I've got pg 8.1.4 from the binary Windows installer.
> > Windows 2000 / German
> > Now I entered "\d" into psql on the text-console and got this:
> >
> > db_test=# \d
> > ERROR:  invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
>
> I can replicate this by using a UTF8 database and running the client
> in a non-UTF8 locale.  For example
>
> $ LANG=de_DE.iso88591 psql postgres
> Dies ist psql 8.2devel, das interaktive PostgreSQL-Terminal.
>
> Geben Sie ein:  \copyright für Urheberrechtsinformationen
>                 \h für Hilfe über SQL-Anweisungen
>                 \? für Hilfe über interne Anweisungen
>                 \g oder Semikolon, um eine Anfrage auszuführen
>                 \q um zu beenden
>
> postgres=# \l
> ERROR:  invalid byte sequence for encoding "UTF8": 0xfc6d6572222c
> TIP:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
> postgres=# \d
> ERROR:  invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
> TIP:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
> postgres=# \encoding
> UTF8
> postgres=#
>
> The problem here is that psql is using gettext() to convert column
> headings for its display to German, and gettext() sees its locale
> as specifying ISO8859-1, so that's the encoding it produces.  When
> that data is sent over to the server --- which thinks that the
> client is using UTF8 encoding, because it hasn't been told any
> different --- the server quite naturally barfs.
>
> We've known about this and related issues with gettext for some time,
> but a bulletproof solution isn't clear.  For the moment all you can
> do is be real careful about making your locale settings match up.
>
>             regards, tom lane

Re: invalid byte sequence ?

From: Tom Lane
I wrote:
> We've known about this and related issues with gettext for some time,
> but a bulletproof solution isn't clear.  For the moment all you can
> do is be real careful about making your locale settings match up.

I forgot to mention that it works fine if the server is told the client
encoding actually being used:

postgres=# \encoding iso8859-1
postgres=# \l
        Liste der Datenbanken
    Name    | Eigentümer | Kodierung
------------+------------+-----------
 postgres   | tgl        | UTF8
 regression | tgl        | SQL_ASCII
 template0  | tgl        | UTF8
 template1  | tgl        | UTF8
(4 Zeilen)

postgres=# \d
Keine Relationen gefunden
postgres=#

A possible solution therefore is to have psql or libpq drive the
client_encoding off the client's locale environment instead of letting
it default to equal the server_encoding.  But I'm not sure what
downsides that would have, and in any case it's not entirely clear that
we can always derive the correct Postgres encoding name from the
system's locale info.
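A minimal sketch of the lookup being proposed, in Python for illustration only (libpq itself is C). It assumes the POSIX `nl_langinfo(CODESET)` call as the source of the locale's character set; the mapping table and all function names are hypothetical and deliberately incomplete. Anything it cannot map, including glibc's codeset name for the C locale, comes back as `None`, which is exactly the "can't derive a name" case raised above:

```python
import locale
from typing import Optional

# Hypothetical, deliberately small table: normalized codeset names (as a
# C library might report them) -> PostgreSQL encoding names.
_CODESET_TO_PG = {
    "UTF8": "UTF8",
    "ISO88591": "LATIN1",
    "ISO885915": "LATIN9",
    "CP1252": "WIN1252",
    "EUCJP": "EUC_JP",
}


def normalize_codeset(name: str) -> str:
    """Make 'UTF-8', 'utf8' and 'UTF_8' compare equal."""
    return name.upper().replace("-", "").replace("_", "")


def pg_encoding_for_codeset(codeset: str) -> Optional[str]:
    """Return a PostgreSQL encoding name, or None if the codeset is unknown.

    glibc reports the C locale's codeset as 'ANSI_X3.4-1968'; it is not in
    the table, so it returns None and the caller must pick a fallback.
    """
    return _CODESET_TO_PG.get(normalize_codeset(codeset))


def pg_encoding_from_environment() -> Optional[str]:
    """Derive an encoding from LC_ALL / LC_CTYPE / LANG (POSIX systems only).

    nl_langinfo() does not exist on Windows (AttributeError), and an invalid
    locale setting raises locale.Error; both are treated as "unknown".
    Note that setlocale() changes process-wide state.
    """
    try:
        locale.setlocale(locale.LC_CTYPE, "")
        return pg_encoding_for_codeset(locale.nl_langinfo(locale.CODESET))
    except (locale.Error, AttributeError):
        return None
```

Normalizing before the lookup lets both glibc-style ("ISO-8859-1") and BSD-style ("ISO8859-1") codeset spellings hit the same table entry.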

            regards, tom lane

Re: invalid byte sequence ?

From: Martijn van Oosterhout
On Wed, Aug 23, 2006 at 06:52:00PM -0400, Tom Lane wrote:
> A possible solution therefore is to have psql or libpq drive the
> client_encoding off the client's locale environment instead of letting
> it default to equal the server_encoding.  But I'm not sure what
> downsides that would have, and in any case it's not entirely clear that
> we can always derive the correct Postgres encoding name from the
> system's locale info.

For glibc systems we can get 100% reliable results. Even for other
systems there's standard code out there for determining the charset.
But this has been discussed before:

http://archives.postgresql.org/pgsql-hackers/2003-05/msg00744.php
http://archives.postgresql.org/pgsql-general/2004-04/msg00470.php
http://archives.postgresql.org/pgsql-hackers/2006-06/msg01027.php

It seems to me that setting the client encoding based on the
client locale is the *only* sensible way of doing it. The locale is
going to affect the results of programs like sort and any scripts used
to process the data anyway.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


Re: invalid byte sequence ?

From: Alvaro Herrera
Martijn van Oosterhout wrote:

> For glibc systems we can get 100% reliable results. Even for other
> systems there's standard code out there for determining the charset.
> But this has been discussed before:
>
> http://archives.postgresql.org/pgsql-hackers/2003-05/msg00744.php
> http://archives.postgresql.org/pgsql-general/2004-04/msg00470.php
> http://archives.postgresql.org/pgsql-hackers/2006-06/msg01027.php
>
> It seems to me that setting the client encoding based on the
> client locale is the *only* sensible way of doing it. The locale is
> going to affect the results of programs like sort and any scripts used
> to process the data anyway.

Yes please.  This would cut a small but measurable amount of traffic
from the pgsql-es-ayuda list (which I won't miss).  Mismatched
\encoding settings are just too frequent.

FWIW, I'm not sure whether it really belongs in libpq, or rather in
psql (and thus in every client).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: invalid byte sequence ?

From: Tom Lane
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Martijn van Oosterhout wrote:
>> It seems to me that setting the client encoding based on the
>> client-locale is the *only* sensible way of doing it.

> Yes please.

> FWIW, I'm not sure whether it really belongs in libpq, or rather in
> psql (and thus in every client).

libpq is what implements PGCLIENTENCODING, so I'd say that's where any
change in the default has to be handled.  Presumably we'd still allow
PGCLIENTENCODING to override the locale?

            regards, tom lane

Re: invalid byte sequence ?

From: Peter Eisentraut
Tom Lane wrote:
> A possible solution therefore is to have psql or libpq drive the
> client_encoding off the client's locale environment instead of
> letting it default to equal the server_encoding.

I have been proposing that for years, but just about now the Japanese
would speak up and protest ...  I say, rush this in before anyone
notices.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: invalid byte sequence ?

From: Tom Lane
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> A possible solution therefore is to have psql or libpq drive the
>> client_encoding off the client's locale environment instead of
>> letting it default to equal the server_encoding.

> I have been proposing that for years, but just about now the Japanese
> would speak up and protest ...  I say, rush this in before anyone
> notices.

I guess the key point might be "what do we do if the client locale
is C?"  Perhaps if it's C, we continue to use the server encoding
as we have in the past.  This would be a reasonable fallback in
other cases where we fail to deduce an encoding from the locale, too.
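The precedence discussed in this thread (PGCLIENTENCODING first, then the client locale, with the server encoding as the fallback for a C locale or an unrecognized one) can be written down as a small decision function. This is an illustrative Python sketch, not libpq code: the function and parameter names are hypothetical, and it assumes the locale-derived PostgreSQL encoding has already been computed, or is `None` when it could not be:

```python
import os
from typing import Mapping, Optional


def choose_client_encoding(
    server_encoding: str,
    client_locale: str,
    locale_encoding: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a default client_encoding (hypothetical policy sketch)."""
    if env is None:
        env = os.environ
    # An explicit PGCLIENTENCODING still overrides everything else.
    override = env.get("PGCLIENTENCODING")
    if override:
        return override
    # C/POSIX locale: keep the old behaviour and follow the server.
    if client_locale in ("C", "POSIX"):
        return server_encoding
    # Otherwise trust the locale, if an encoding could be deduced from it.
    if locale_encoding is not None:
        return locale_encoding
    # Nothing could be deduced: fall back to the server encoding as well.
    return server_encoding
```

With Tom's earlier example (a UTF8 server and `LANG=de_DE.iso88591`, with the locale mapped to LATIN1), this policy would pick LATIN1, the same encoding as the manual `\encoding iso8859-1` workaround that made `\l` and `\d` work.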

            regards, tom lane

Re: invalid byte sequence ?

From: Karsten Hilbert
On Thu, Aug 24, 2006 at 01:17:49PM -0400, Tom Lane wrote:

> I guess the key point might be "what do we do if the client locale
> is C?"  Perhaps if it's C, we continue to use the server encoding
> as we have in the past.  This would be a reasonable fallback in
> other cases where we fail to deduce an encoding from the locale, too.

In that case I would suggest also emitting a suitable warning
(with a postgresql.conf option, defaulting to ON, that can
switch it off).

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: invalid byte sequence ?

From: Peter Eisentraut
On Thursday, 24 August 2006 23:22, Karsten Hilbert wrote:
> In that case I would suggest also emitting a suitable warning
> (with a postgresql.conf option, defaulting to ON, that can
> switch it off).

libpq can't read postgresql.conf, nor is it free to write
messages anywhere.


Re: invalid byte sequence ?

From: Karsten Hilbert
On Fri, Aug 25, 2006 at 01:53:30PM +0200, Peter Eisentraut wrote:

> > In that case I would suggest also emitting a suitable warning
> > (with a postgresql.conf option, defaulting to ON, that can
> > switch it off).
>
> libpq can't read postgresql.conf, nor is it free to write
> messages anywhere.

LOL, duh, of course. I don't know how I got that idea.

Karsten

Re: invalid byte sequence ?

From: Bruce Momjian
Is this being done?

---------------------------------------------------------------------------

Karsten Hilbert wrote:
> On Thu, Aug 24, 2006 at 01:17:49PM -0400, Tom Lane wrote:
>
> > I guess the key point might be "what do we do if the client locale
> > is C?"  Perhaps if it's C, we continue to use the server encoding
> > as we have in the past.  This would be a reasonable fallback in
> > other cases where we fail to deduce an encoding from the locale, too.
>
> In that case I would suggest also emitting a suitable warning
> (with a postgresql.conf option, defaulting to ON, that can
> switch it off).
>