Thread: BUG #5801: characters not encoded properly for column names
The following bug has been logged online:

Bug reference: 5801
Logged by: Marc Cousin
Email address: cousinmarc@gmail.com
PostgreSQL version: 9.0.2
Operating system: Windows XP
Description: characters not encoded properly for column names
Details:

I get different behaviour between a Linux and a Windows server when a user creates a column name containing an accented character.

All tests below were done with a Linux psql client, with the console set to the win1252 charset (so the input character really is 'é' in win1252).

With the Linux server:

marc=# SET client_encoding TO 'win1252';
SET
marc=# CREATE TABLE test (nom varchar, prénom varchar);
CREATE TABLE
marc=# \d test
           Table "public.test"
 Column |       Type        | Modifiers
--------+-------------------+-----------
 nom    | character varying |
 prénom | character varying |

'prénom' is also displayed correctly if client_encoding and the console are both UTF8, so the conversion is good.

With the Windows server:

test=# SET client_encoding TO 'win1252';
SET
test=# CREATE TABLE test (nom varchar, prénom varchar);
CREATE TABLE
test=# \d test
ERROR:  invalid byte sequence for encoding "UTF8": 0xe3a96e
test=# SELECT attname from pg_attribute where attrelid = (select oid from pg_class where relname = 'test');
ERROR:  invalid byte sequence for encoding "UTF8": 0xe3a96e
test=# select version();
                           version
-------------------------------------------------------------
 PostgreSQL 9.0.2, compiled by Visual C++ build 1500, 32-bit
(1 row)

The main reason this is a problem is that the table can no longer be dumped with pg_dump.
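For readers who want to check the bytes in the report, here is a minimal Python sketch. It is not part of the original report and uses only the standard library. It shows how 'é' is encoded in win1252 (Python calls this codec `cp1252`) and in UTF-8, and confirms that the three bytes quoted in the error message, 0xe3a96e, do not form valid UTF-8. The helper name `is_valid_utf8` is made up for this illustration.

```python
# The accented character from the report (U+00E9, "e" with acute accent).
E_ACUTE = "\u00e9"

# What a win1252 console sends for that character: a single byte.
win1252_bytes = E_ACUTE.encode("cp1252")

# What a UTF8 database would be expected to store for it: two bytes.
utf8_bytes = E_ACUTE.encode("utf-8")

# The byte sequence quoted in the Windows server's error message.
reported = bytes.fromhex("e3a96e")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(win1252_bytes.hex())       # e9
    print(utf8_bytes.hex())          # c3a9
    print(is_valid_utf8(reported))   # False
```

The sequence is rejected because 0xE3 announces a three-byte character, 0xA9 is a valid continuation byte, but 0x6E (the letter 'n') is not.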
Hello Marc,

Was it entered from a Windows cmd console? It doesn't use Windows encodings by default. For example, it must be started with the parameter /c chcp 1250 to get the win1250 encoding.

Regards

Pavel Stehule
No, from a Linux psql client (inside a KDE Konsole). You can set up any charset in it. I have exactly the same behaviour with psql under Windows anyway, with a chcp 1252 in the cmd console. It's not a console charset problem, I've triple-checked that :)

And anyway, this character shouldn't get into the database as UTF8, as it is 1252 (hence the error message).

On Thursday 23 December 2010 11:24:21, Pavel Stehule wrote:
> Hello Marc,
>
> It was entered from a windows cmd console? It doesn't use win
> encodings as default. For example, it must be executed with parameter
> /c chcp 1250 for win1250 encoding.
>
> Regards
>
> Pavel Stehule
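As a rough illustration of the conversion this message is about, the Python sketch below re-encodes the column name from win1252 to UTF-8 and compares the result with the bytes in the error message. The function `client_to_server` is a hypothetical stand-in written for this example, not PostgreSQL's actual conversion code, and the comparison is only an observation about the bytes; the thread itself does not establish a cause.

```python
def client_to_server(raw: bytes,
                     client_encoding: str = "cp1252",
                     server_encoding: str = "utf-8") -> bytes:
    """Re-encode raw bytes from the client encoding to the server encoding.

    Hypothetical stand-in for what a server with a UTF8 database is
    expected to do when client_encoding is win1252.
    """
    return raw.decode(client_encoding).encode(server_encoding)


# The column name as typed in a win1252 console: the accented e is byte 0xE9.
typed = b"pr\xe9nom"

# The name a UTF8 database would be expected to store.
expected = client_to_server(typed)

# The three bytes quoted in the Windows server's error message.
reported = bytes.fromhex("e3a96e")

# The same three-byte window of the expected name, starting at the accent.
window = expected[2:5]

# Positions at which the two three-byte sequences differ.
differing = [i for i in range(3) if window[i] != reported[i]]

if __name__ == "__main__":
    print(expected.hex())                   # 7072c3a96e6f6d
    print(window.hex())                     # c3a96e
    print(differing)                        # [0]
    print(hex(window[0] ^ reported[0]))     # 0x20
```

Under these assumptions the expected and reported sequences differ only in their first byte (0xC3 versus 0xE3), which suggests the name was converted to UTF-8 and then altered afterwards rather than stored as raw win1252.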
On 12/23/10 2:34 AM, Marc Cousin wrote:
> No, from a Linux psql client (inside a KDE Konsole). You can set up any charset
> in it. I have exactly the same behaviour with psql under Windows anyway, with
> a chcp 1252 in the cmd console. It's not a console charset problem, I've
> triple-checked that :)
>
> And anyway, this character shouldn't get into the database as UTF8, as it is
> 1252 (hence the error message).

Does client_encoding affect names? (I'm asking because I have no idea.)

What encodings are the database clusters on the two platforms?
On Thursday 23 December 2010 18:21:55, John R Pierce wrote:
> On 12/23/10 2:34 AM, Marc Cousin wrote:
> > No, from a Linux psql client (inside a KDE Konsole). You can set up any
> > charset in it. I have exactly the same behaviour with psql under Windows
> > anyway, with a chcp 1252 in the cmd console. It's not a console charset
> > problem, I've triple-checked that :)
> >
> > And anyway, this character shouldn't get into the database as UTF8, as it
> > is 1252 (hence the error message).
>
> does client_encoding affect names ? (I'm asking because I have no idea).

Yes (for the Linux server; for Windows it fails).

I have exactly the same problem if I test with LATIN9 (except that the UTF8 error message has a different value for the bad character).

> what encodings are the database clusters on the two platforms?

Oh. Both are UTF-8.
On Thu, Dec 23, 2010 at 5:18 AM, Marc Cousin <cousinmarc@gmail.com> wrote:
> With the Windows server :
> test=# SET client_encoding TO 'win1252';
> SET

I have a vague recollection that the argument to SET client_encoding isn't validated on Windows, and if you enter a value that it doesn't like it simply silently doesn't work. Am I wrong? What happens if you do:

SET client_encoding TO 'some_really_long_string_that_is_almost_certainly_not_a_valid_encoding';

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2010/12/27 Robert Haas <robertmhaas@gmail.com>:
> On Thu, Dec 23, 2010 at 5:18 AM, Marc Cousin <cousinmarc@gmail.com> wrote:
>> With the Windows server :
>> test=# SET client_encoding TO 'win1252';
>> SET
>
> I have a vague recollection that the argument to SET client_encoding
> isn't validated on Windows, and if you enter a value that it doesn't
> like it simply silently doesn't work.  Am I wrong?  What happens if
> you do:
>
> SET client_encoding TO
> 'some_really_long_string_that_is_almost_certainly_not_a_valid_encoding';

Here it is…

postgres=# SET client_encoding TO 'foo';
ERROR:  invalid value for parameter "client_encoding": "foo"

(It does the same with your really long string, by the way :) )

Seems validated to me?
On Tue, Dec 28, 2010 at 4:01 AM, Marc Cousin <cousinmarc@gmail.com> wrote:
> Here it is…
>
> postgres=# SET client_encoding TO 'foo';
> ERROR:  invalid value for parameter "client_encoding": "foo"
>
> (It does the same with your really long string, by the way :) )
>
> Seems validated to me?

Hrm, OK.  Well, you just used up my one guess.  :-(

--
Robert Haas
On Tuesday 28 December 2010 12:49:20, Robert Haas wrote:
> Hrm, OK.  Well, you just used up my one guess.  :-(

Sorry about that. Anyone else want to take a guess? :)