Thread: BUG #5801: characters not encoded properly for column names

BUG #5801: characters not encoded properly for column names

From
"Marc Cousin"
Date:
The following bug has been logged online:

Bug reference:      5801
Logged by:          Marc Cousin
Email address:      cousinmarc@gmail.com
PostgreSQL version: 9.0.2
Operating system:   Windows XP
Description:        characters not encoded properly for column names
Details:

I get different behaviour between a Linux and a Windows server when a
user creates an accented column name.

All tests below were done with a Linux psql client, with the console set to the
win1252 charset (so the input character is truly 'é' in win1252).

With the Linux server :
marc=# SET client_encoding TO 'win1252';
SET
marc=# CREATE TABLE test (nom varchar, prénom varchar);
CREATE TABLE
marc=# \d test
          Table "public.test"
 Column |       Type        | Modifiers
--------+-------------------+-----------
 nom    | character varying |
 prénom | character varying |

'prénom' is also displayed correctly if client_encoding and console are
UTF8, so the conversion is good.

With the Windows server :
test=# SET client_encoding TO 'win1252';
SET
test=# CREATE TABLE test (nom varchar, prénom varchar);
CREATE TABLE
test=# \d test
ERROR:  invalid byte sequence for encoding "UTF8": 0xe3a96e
test=# SELECT attname from pg_attribute where attrelid = (select oid from
pg_class where relname = 'test');
ERROR:  invalid byte sequence for encoding "UTF8": 0xe3a96e
test=# select version();
                           version
-------------------------------------------------------------
 PostgreSQL 9.0.2, compiled by Visual C++ build 1500, 32-bit
(1 row)


The main reason this is a problem is that the table can no longer be dumped
with pg_dump.
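
To make the error message concrete, here is a small Python sketch (not part of the original report) that compares the bytes a correct win1252 to UTF8 conversion of 'én' would produce with the sequence quoted by the Windows server. The names `expected`, `reported` and `is_valid_utf8` are made up for illustration, and Python's `cp1252` codec is assumed to match PostgreSQL's WIN1252 for these characters.

```python
# Illustrative sketch only; all names here are hypothetical.

# 'é' is the single byte 0xE9 in win1252 (Python codec name: cp1252).
assert "é".encode("cp1252") == b"\xe9"

# A correct win1252 -> UTF8 conversion of "én" yields C3 A9 6E.
expected = b"\xe9n".decode("cp1252").encode("utf-8")
print(expected.hex())  # c3a96e

# Bytes quoted in the Windows server's error message.
reported = bytes.fromhex("e3a96e")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(expected))  # True
print(is_valid_utf8(reported))  # False

# Collect the positions where the two sequences disagree.
diff = [(e, r) for e, r in zip(expected, reported) if e != r]
print([(hex(e), hex(r), hex(e ^ r)) for e, r in diff])  # [('0xc3', '0xe3', '0x20')]
```

Only the lead byte differs (0xC3 versus 0xE3), and only in the 0x20 bit. In single-byte Latin encodings such as win1252 that bit separates 'Ã' (0xC3) from 'ã' (0xE3), but nothing in this thread establishes whether that is related to the cause.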

Re: BUG #5801: characters not encoded properly for column names

From
Pavel Stehule
Date:
Hello Marc,

Was it entered from a Windows cmd console? It doesn't use the win
encodings by default. For example, it must be started with the parameter
/c chcp 1250 for the win1250 encoding.

Regards

Pavel Stehule


Re: BUG #5801: characters not encoded properly for column names

From
Marc Cousin
Date:
No, from a Linux psql client (inside a KDE Konsole). You can set up any charset
in it. I have exactly the same behaviour with psql under Windows anyway, with
a chcp 1252 in the cmd console. It's not a console charset problem, I've
triple-checked that :)

And anyway, this character shouldn't get into the database as UTF8, as it is
1252 (hence the error message).
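
The console-charset question can be illustrated with a short Python sketch. It is only a model: the helper `server_side_bytes` is hypothetical and stands in for an ideal "decode with client_encoding, store as UTF-8" conversion using Python's codecs, not PostgreSQL's actual conversion code.

```python
# Illustrative sketch; `server_side_bytes` is a hypothetical helper that models
# an ideal client_encoding -> UTF8 conversion with Python codecs.

def server_side_bytes(sent: bytes, client_encoding: str) -> bytes:
    """Convert bytes received from the client into UTF-8."""
    return sent.decode(client_encoding).encode("utf-8")


name = "prénom"

# Case 1: the console really sends win1252 and client_encoding is win1252.
matched = server_side_bytes(name.encode("cp1252"), "cp1252")
print(matched.hex())  # 7072c3a96e6f6d

# Case 2: the console actually sends UTF-8 while client_encoding says win1252.
mismatched = server_side_bytes(name.encode("utf-8"), "cp1252")
print(mismatched.hex())  # 7072c383c2a96e6f6d
print(mismatched.decode("utf-8"))  # prÃ©nom
```

Under this model a console/client_encoding mismatch produces mojibake that is still valid UTF-8, and neither case contains the bytes e3 a9 6e from the error message. That is consistent with the claim that the console charset is not the problem, though it does not prove it.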



On Thursday 23 December 2010 11:24:21, Pavel Stehule wrote:
> Hello Marc,
>
> It was entered from a windows cmd console? It doesn't use win
> encodings as default. For example, it must be executed with parameter
> /c chcp 1250 for win1250 encoding.
>
> Regards
>
> Pavel Stehule

Re: BUG #5801: characters not encoded properly for column names

From
John R Pierce
Date:
On 12/23/10 2:34 AM, Marc Cousin wrote:
> No, from a Linux psql client (inside a kde konsole). You can setup any charset
> in it. I have exactly the same behaviour with psql under windows anyway, with
> a chcp 1252 in the cmd console. It's not a console charset problem, I've
> tripled checked that :)
>
> And anyway, this character shouldn't get into the database as UTF8, as it is
> 1252 (hence the error message).

does client_encoding affect names ?  (I'm asking because I have no idea).

what encodings are the database clusters on the two platforms?

Re: BUG #5801: characters not encoded properly for column names

From
Marc Cousin
Date:
On Thursday 23 December 2010 18:21:55, John R Pierce wrote:
> On 12/23/10 2:34 AM, Marc Cousin wrote:
> > No, from a Linux psql client (inside a kde konsole). You can setup any
> > charset in it. I have exactly the same behaviour with psql under windows
> > anyway, with a chcp 1252 in the cmd console. It's not a console charset
> > problem, I've tripled checked that :)
> >
> > And anyway, this character shouldn't get into the database as UTF8, as it
> > is 1252 (hence the error message).
>
> does client_encoding affect names ?  (I'm asking because I have no idea).
Yes (on the Linux server; on Windows it fails).
I have exactly the same problem if I test with LATIN9 (except that the UTF8
error message has a different value for the bad character).

>
> what encodings are the database clusters on the two platforms?

Oh. Both are UTF-8.

Re: BUG #5801: characters not encoded properly for column names

From
Robert Haas
Date:
On Thu, Dec 23, 2010 at 5:18 AM, Marc Cousin <cousinmarc@gmail.com> wrote:
> With the Windows server :
> test=# SET client_encoding TO 'win1252';
> SET

I have a vague recollection that the argument to SET client_encoding
isn't validated on Windows, and if you enter a value that it doesn't
like it simply silently doesn't work.  Am I wrong?  What happens if
you do:

SET client_encoding TO
'some_really_long_string_that_is_almost_certainly_not_a_valid_encoding';

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #5801: characters not encoded properly for column names

From
Marc Cousin
Date:
2010/12/27 Robert Haas <robertmhaas@gmail.com>:
> On Thu, Dec 23, 2010 at 5:18 AM, Marc Cousin <cousinmarc@gmail.com> wrote:
>> With the Windows server :
>> test=# SET client_encoding TO 'win1252';
>> SET
>
> I have a vague recollection that the argument to SET client_encoding
> isn't validated on Windows, and if you enter a value that it doesn't
> like it simply silently doesn't work.  Am I wrong?  What happens if
> you do:
>
> SET client_encoding TO
> 'some_really_long_string_that_is_almost_certainly_not_a_valid_encoding';

Here it is…

postgres=# SET client_encoding TO 'foo';
ERROR:  invalid value for parameter "client_encoding": "foo"

(It does the same with your really long string by the way :) )

Seems validated to me ?

Re: BUG #5801: characters not encoded properly for column names

From
Robert Haas
Date:
On Tue, Dec 28, 2010 at 4:01 AM, Marc Cousin <cousinmarc@gmail.com> wrote:
> 2010/12/27 Robert Haas <robertmhaas@gmail.com>:
>> On Thu, Dec 23, 2010 at 5:18 AM, Marc Cousin <cousinmarc@gmail.com> wrote:
>>> With the Windows server :
>>> test=# SET client_encoding TO 'win1252';
>>> SET
>>
>> I have a vague recollection that the argument to SET client_encoding
>> isn't validated on Windows, and if you enter a value that it doesn't
>> like it simply silently doesn't work.  Am I wrong?  What happens if
>> you do:
>>
>> SET client_encoding TO
>> 'some_really_long_string_that_is_almost_certainly_not_a_valid_encoding';
>
> Here it is…
>
> postgres=# SET client_encoding TO 'foo';
> ERROR:  invalid value for parameter "client_encoding": "foo"
>
> (It does the same with your really long string by the way :) )
>
> Seems validated to me ?

Hrm, OK.  Well, you just used up my one guess.  :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: BUG #5801: characters not encoded properly for column names

From
Marc Cousin
Date:
On Tuesday 28 December 2010 12:49:20, Robert Haas wrote:
> On Tue, Dec 28, 2010 at 4:01 AM, Marc Cousin <cousinmarc@gmail.com> wrote:
> > 2010/12/27 Robert Haas <robertmhaas@gmail.com>:
> >> On Thu, Dec 23, 2010 at 5:18 AM, Marc Cousin <cousinmarc@gmail.com> wrote:
> >>> With the Windows server :
> >>> test=# SET client_encoding TO 'win1252';
> >>> SET
> >>
> >> I have a vague recollection that the argument to SET client_encoding
> >> isn't validated on Windows, and if you enter a value that it doesn't
> >> like it simply silently doesn't work.  Am I wrong?  What happens if
> >> you do:
> >>
> >> SET client_encoding TO
> >> 'some_really_long_string_that_is_almost_certainly_not_a_valid_encoding';
> >
> > Here it is…
> >
> > postgres=# SET client_encoding TO 'foo';
> > ERROR:  invalid value for parameter "client_encoding": "foo"
> >
> > (It does the same with your really long string by the way :) )
> >
> > Seems validated to me ?
>
> Hrm, OK.  Well, you just used up my one guess.  :-(

Sorry about that. Anyone else wanting to take a guess ? :)