Thread: invalid byte sequence ?
Hi,

I've got pg 8.1.4 from the binary Windows installer.
Windows 2000 / German

Now I entered "\d" into psql on the text console and got this:

db_test=# \d
ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a

What's up?
db_test was created UTF8 encoded.
Andreas wrote:
> Hi,
>
> I've got pg 8.1.4 from the binary Windows installer.
> Windows 2000 / German
> Now I entered "\d" into psql on the text-console and got this:
>
> db_test=# \d
> ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
>
> What's up ?
> db_test was created UTF8 encoded

What does your client_encoding show? It should be UTF8 too.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> What does your client_encoding show? It should be UTF8 too.

It is.

db_test=# \d
ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
db_test=# show client_encoding;
 client_encoding
-----------------
 UTF8
(1 Zeile)

psql now complains about the code page, too (850 vs. 1252).
I'm sure I checked it the other day with a cmd window that used 1252 and
still got the error for the \d command.
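The code page remark can be checked against the error message with a few lines of Python. This is only a sketch: it assumes the rejected sequence starts with a German "ü", which the leading byte 0xfc suggests but the message above does not prove.

```python
# How "ü" is encoded under the two Windows code pages psql warned about,
# compared with the first byte (0xfc) of the rejected sequence.
u_umlaut = "ü"

cp850_bytes = u_umlaut.encode("cp850")    # console (OEM) code page 850
cp1252_bytes = u_umlaut.encode("cp1252")  # Windows (ANSI) code page 1252
utf8_bytes = u_umlaut.encode("utf-8")     # what client_encoding UTF8 promises

print(cp850_bytes.hex(), cp1252_bytes.hex(), utf8_bytes.hex())
```

Only the 1252 form is the single byte 0xfc, so the offending text looks like it was produced in code page 1252 rather than typed on the 850 console. That would fit the observation that changing the cmd window's code page made no difference.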
Andreas <maps.on@gmx.net> writes:
> I've got pg 8.1.4 from the binary Windows installer.
> Windows 2000 / German
> Now I entered "\d" into psql on the text-console and got this:
>
> db_test=# \d
> ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a

I can replicate this by using a UTF8 database and running the client
in a non-UTF8 locale. For example:

$ LANG=de_DE.iso88591 psql postgres
Dies ist psql 8.2devel, das interaktive PostgreSQL-Terminal.

Geben Sie ein:  \copyright für Urheberrechtsinformationen
                \h für Hilfe über SQL-Anweisungen
                \? für Hilfe über interne Anweisungen
                \g oder Semikolon, um eine Anfrage auszuführen
                \q um zu beenden

postgres=# \l
ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572222c
TIP: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
postgres=# \d
ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
TIP: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
postgres=# \encoding
UTF8
postgres=#

The problem here is that psql is using gettext() to convert column
headings for its display to German, and gettext() sees its locale
as specifying ISO8859-1, so that's the encoding it produces. When
that data is sent over to the server --- which thinks that the
client is using UTF8 encoding, because it hasn't been told any
different --- the server quite naturally barfs.

We've known about this and related issues with gettext for some time,
but a bulletproof solution isn't clear. For the moment all you can
do is be real careful about making your locale settings match up.

regards, tom lane
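The byte sequence in the error message can be decoded by hand to see which string the server rejected. The following is a minimal sketch; that the bytes are the tail of a translated column heading is an inference from the explanation above, not something the error message states.

```python
# Bytes reported by the server in the error message (0xfc6d6572220a).
raw = bytes.fromhex("fc6d6572220a")

# Read as ISO 8859-1, every byte maps to a character: ü m e r " newline.
as_latin1 = raw.decode("iso-8859-1")
print(repr(as_latin1))

# Read as UTF-8, 0xfc is not a valid lead byte, so decoding fails,
# which is the same complaint the server makes.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)

# The first four bytes match the end of "Eigentümer" in ISO 8859-1.
heading_tail = "Eigentümer".encode("iso-8859-1")[-4:]
print(heading_tail == raw[:4])
```

The trailing 0x22 0x0a (a double quote and a newline) is consistent with a quoted column alias such as "Eigentümer" at the end of a line of the query psql generates.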
Is this a TODO?

Tom Lane wrote:
> The problem here is that psql is using gettext() to convert column
> headings for its display to German, and gettext() sees its locale
> as specifying ISO8859-1, so that's the encoding it produces. When
> that data is sent over to the server --- which thinks that the
> client is using UTF8 encoding, because it hasn't been told any
> different --- the server quite naturally barfs.
>
> We've known about this and related issues with gettext for some time,
> but a bulletproof solution isn't clear. For the moment all you can
> do is be real careful about making your locale settings match up.

--
Bruce Momjian
I wrote:
> We've known about this and related issues with gettext for some time,
> but a bulletproof solution isn't clear. For the moment all you can
> do is be real careful about making your locale settings match up.

I forgot to mention that it works fine if the server is told the client
encoding actually being used:

postgres=# \encoding iso8859-1
postgres=# \l
        Liste der Datenbanken
    Name    | Eigentümer | Kodierung
------------+------------+-----------
 postgres   | tgl        | UTF8
 regression | tgl        | SQL_ASCII
 template0  | tgl        | UTF8
 template1  | tgl        | UTF8
(4 Zeilen)

postgres=# \d
Keine Relationen gefunden
postgres=#

A possible solution therefore is to have psql or libpq drive the
client_encoding off the client's locale environment instead of letting
it default to equal the server_encoding. But I'm not sure what
downsides that would have, and in any case it's not entirely clear that
we can always derive the correct Postgres encoding name from the
system's locale info.

regards, tom lane
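To make the "derive the Postgres encoding name from the locale" step concrete, here is a rough Python sketch. Everything in it is hypothetical: the function names and the lookup table are invented for illustration, the table is deliberately incomplete, and none of it is taken from psql or libpq.

```python
from typing import Optional

# Illustrative, incomplete table from normalized locale codeset spellings
# to PostgreSQL encoding names. An assumption for this sketch only.
CODESET_TO_PG = {
    "UTF8": "UTF8",
    "ISO88591": "LATIN1",
    "ISO885915": "LATIN9",
    "CP1252": "WIN1252",
    "EUCJP": "EUC_JP",
}


def codeset_of_locale_name(locale_name: str) -> Optional[str]:
    """Extract the codeset from a POSIX locale name like de_DE.iso88591."""
    if "." not in locale_name:
        return None  # e.g. "C" or "de_DE" carries no codeset
    codeset = locale_name.split(".", 1)[1]
    return codeset.split("@", 1)[0]  # drop a modifier such as "@euro"


def pg_encoding_for_codeset(codeset: Optional[str]) -> Optional[str]:
    """Return a PostgreSQL encoding name for a codeset, or None if unknown."""
    if not codeset:
        return None
    # Normalize spellings such as "ISO-8859-1", "iso88591", "iso_8859_1".
    key = "".join(ch for ch in codeset.upper() if ch.isalnum())
    return CODESET_TO_PG.get(key)


print(pg_encoding_for_codeset(codeset_of_locale_name("de_DE.iso88591")))
```

Parsing the locale name is the crude part. On POSIX systems a real client would more likely ask the C library for the codeset via nl_langinfo(CODESET). The None result is the case the worry above is about: a codeset the table does not know.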
On Wed, Aug 23, 2006 at 06:52:00PM -0400, Tom Lane wrote:
> A possible solution therefore is to have psql or libpq drive the
> client_encoding off the client's locale environment instead of letting
> it default to equal the server_encoding. But I'm not sure what
> downsides that would have, and in any case it's not entirely clear that
> we can always derive the correct Postgres encoding name from the
> system's locale info.

For glibc systems we can get 100% reliable results. Even for other
systems there's standard code out there for determining the charset.
But this has been discussed before:

http://archives.postgresql.org/pgsql-hackers/2003-05/msg00744.php
http://archives.postgresql.org/pgsql-general/2004-04/msg00470.php
http://archives.postgresql.org/pgsql-hackers/2006-06/msg01027.php

It seems to me that setting the client encoding based on the
client locale is the *only* sensible way of doing it. The locale is
going to affect the results of programs like sort and any scripts used
to process the data anyway.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
> For glibc systems we can get 100% reliable results. Even for other
> systems there's standard code out there for determining the charset.
> But this has been discussed before:
>
> http://archives.postgresql.org/pgsql-hackers/2003-05/msg00744.php
> http://archives.postgresql.org/pgsql-general/2004-04/msg00470.php
> http://archives.postgresql.org/pgsql-hackers/2006-06/msg01027.php
>
> It seems to me that setting the client encoding based on the
> client-locale is the *only* sensible way of doing it. The locale is
> going to affect the results of programs like sort and any scripts used
> to process the data anyway.

Yes please. This would make the pgsql-es-ayuda list lose a small but
measurable amount of its traffic (which I won't miss). Non-matching
\encoding settings are just too frequent.

FWIW I'm not sure whether it really belongs in libpq, or whether it
should rather be in psql (and thus in every client).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Martijn van Oosterhout wrote:
>> It seems to me that setting the client encoding based on the
>> client-locale is the *only* sensible way of doing it.

> Yes please.

> FWIW I'm not sure if it really belongs in libpq, or it must be rather in
> psql (and thus in every client).

libpq is what implements PGCLIENTENCODING, so I'd say that's where any
change in the default has to be handled. Presumably we'd still allow
PGCLIENTENCODING to override the locale?

regards, tom lane
Tom Lane wrote:
> A possible solution therefore is to have psql or libpq drive the
> client_encoding off the client's locale environment instead of
> letting it default to equal the server_encoding.

I have been proposing that for years, but just about now the Japanese
would speak up and protest ... I say, rush this in before anyone
notices.

--
Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> A possible solution therefore is to have psql or libpq drive the
>> client_encoding off the client's locale environment instead of
>> letting it default to equal the server_encoding.

> I have been proposing that for years, but just about now the Japanese
> would speak up and protest ... I say, rush this in before anyone
> notices.

I guess the key point might be "what do we do if the client locale
is C?" Perhaps if it's C, we continue to use the server encoding
as we have in the past. This would be a reasonable fallback in
other cases where we fail to deduce an encoding from the locale, too.

regards, tom lane
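The precedence being discussed can be written down in a few lines. This is a hypothetical sketch, not libpq code: the function name and parameters are invented, and letting PGCLIENTENCODING win is only the presumption raised earlier in the thread, not a settled decision.

```python
from typing import Optional


def choose_client_encoding(
    pgclientencoding: Optional[str],
    locale_encoding: Optional[str],
    server_encoding: str,
) -> str:
    """Pick a client_encoding using the precedence proposed in the thread.

    locale_encoding is None when the client locale is C or when no
    encoding could be deduced from the locale.
    """
    # 1. An explicit PGCLIENTENCODING setting overrides everything (assumed).
    if pgclientencoding:
        return pgclientencoding
    # 2. Otherwise use the encoding deduced from the client locale.
    if locale_encoding:
        return locale_encoding
    # 3. C locale or failed deduction: fall back to the server encoding,
    #    which is the existing behaviour.
    return server_encoding


# German ISO 8859-1 locale against a UTF8 server, no override set.
print(choose_client_encoding(None, "LATIN1", "UTF8"))
```

With the C locale (locale_encoding of None) the function returns the server encoding, so existing setups that never configured a locale would behave as before.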
On Thu, Aug 24, 2006 at 01:17:49PM -0400, Tom Lane wrote:
> I guess the key point might be "what do we do if the client locale
> is C?" Perhaps if it's C, we continue to use the server encoding
> as we have in the past. This would be a reasonable fallback in
> other cases where we fail to deduce an encoding from the locale, too.

In that case I would suggest also emitting a suitable warning
(with a postgresql.conf option to switch it off, defaulting to ON).

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
On Thursday, 24 August 2006 23:22, Karsten Hilbert wrote:
> In that case I would suggest to also emit a suitable warning
> (with a postgresql.conf option to switch that off which
> defaults to ON).

libpq cannot read postgresql.conf, nor does it have the liberty to
write messages anywhere.

--
Peter Eisentraut http://developer.postgresql.org/~petere/
On Fri, Aug 25, 2006 at 01:53:30PM +0200, Peter Eisentraut wrote:
> > In that case I would suggest to also emit a suitable warning
> > (with a postgresql.conf option to switch that off which
> > defaults to ON).
>
> libpq can neither read postgresql.conf nor does it have the liberty to write
> messages anywhere.

LOL, duh, of course. Don't know how I got that idea.

Karsten
Is this being done?

Karsten Hilbert wrote:
> On Thu, Aug 24, 2006 at 01:17:49PM -0400, Tom Lane wrote:
>
> > I guess the key point might be "what do we do if the client locale
> > is C?" Perhaps if it's C, we continue to use the server encoding
> > as we have in the past. This would be a reasonable fallback in
> > other cases where we fail to deduce an encoding from the locale, too.
>
> In that case I would suggest to also emit a suitable warning
> (with a postgresql.conf option to switch that off which
> defaults to ON).

--
Bruce Momjian