Thread: Strange UTF-8 behaviour

Strange UTF-8 behaviour

From
"Marco Ferretti"
Date:
Hi there all.
I am quite new to Postgres, so forgive me if this question seems obvious.

I have created a database with the UTF-8 encoding (createdb cassa --encoding=UTF-8).
Then I have made the following tests:

cassa=> create table test(id varchar(5));
cassa=> insert into test values('12345');
INSERT 178725 1
cassa=> insert into test values ('123è');
INSERT 178726 1
cassa=> insert into test values ('1234è');
ERROR:  value too long for type character varying(5)

but if I try

cassa=> select '#' || id || '#' from test;
 ?column?
----------
 #12345#
 #123è#
(2 rows)

so, apparently the chars are stored the right way (#123è#) but when trying the
query the è char is parsed as 2 chars ....

The database server version is 7.3.4 on a RedHat 9 machine ...

Any clue ?

Tia
    Marco

-- 
Ever noticed how fast windows run ? neither did I

Re: Strange UTF-8 behaviour

From
Dennis Gearon
Date:
My guess is that something in the chain of getting the data into the
database is measuring:

    BYTES
not
    CHARACTERS.
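
A minimal Python sketch of that guess, using the three values from the original post. The helper name `char_and_byte_lengths` is made up for illustration, and the sketch only shows how the two counts differ; it does not show what the 7.3.4 server actually does internally.

```python
# The e-grave character, written as an escape so the example does not
# depend on the encoding of this file or of the terminal.
E_GRAVE = "\u00e8"

def char_and_byte_lengths(text, encoding="utf-8"):
    """Return (number of characters, number of bytes once encoded)."""
    return len(text), len(text.encode(encoding))

# The three values inserted in the original test.
values = ["12345", "123" + E_GRAVE, "1234" + E_GRAVE]

for value in values:
    chars, nbytes = char_and_byte_lengths(value)
    print(f"{value!r}: {chars} characters, {nbytes} bytes in UTF-8")
```

If the limit of 5 were applied to bytes, the second value (5 bytes) would fit and the third (6 bytes) would not, which matches the behaviour reported.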

"Marco Ferretti" <marco.ferretti@jrc.it> wrote:
</quote--------------------------------------->
<snip>
I have created a database with the UTF-8 encoding  (createdb cassa
--encoding=UTF-8) .
Then I have made the following tests :

cassa=> create table test(id varchar(5));
cassa=> insert into test values ('12345');
INSERT 178725 1
cassa=> insert into test values ('123è');
INSERT 178726 1
cassa=> insert into test values ('1234è');
ERROR:  value too long for type character varying(5)
<snip>
so, apparently the chars are stored the right way ( #123è#) but when
trying the query the è char is parsed as  2 chars ....

The database server version is 7.3.4 on a RedHat 9 machine ...

Any clue ?
</quote--------------------------------------->

Re: Strange UTF-8 behaviour

From
Alvaro Herrera
Date:
On Thu, Sep 16, 2004 at 06:10:13PM +0200, Marco Ferretti wrote:

> I am quite new to Postgres, so forgive me if this question seems
> obvious.
>
> I have created a database with the UTF-8 encoding  (createdb cassa
> --encoding=UTF-8) .
> Then I have made the following tests :

FWIW, I can't reproduce this using 7.3.6.  Is there anything special
about your 'e' character, or is it a plain 'e'?

$ createdb test --encoding=UTF-8
CREATE DATABASE
COMMENT

$ psql test
Welcome to psql 7.3.6, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

test=#  create table test (id char(5));
CREATE TABLE
test=# insert into test values ('1234e');
INSERT 16993 1
test=# create table test2 (id varchar(5));
CREATE TABLE
test=# insert into test2 values ('1234e');
INSERT 16996 1
test=# insert into test2 values ('123e');
INSERT 16997 1
test=# select '#' || id || '#', length(id) from test2;
 ?column? | length
----------+--------
 #1234e#  |      5
 #123e#   |      4
(2 rows)

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)


Re: Strange UTF-8 behaviour

From
Matteo Beccati
Date:
Hi Alvaro,

> FWIW, I can't reproduce this using 7.3.6.  Is there anything special
> about your 'e' character, or it's a plain 'e'?

Maybe you didn't get the email correctly. It was an e with a grave
accent, just like this:

è (UTF-8 encoded)

I just checked on PG 7.4.3 / NetBSD, with these results:

egrave=# CREATE TABLE test (data varchar(5));
CREATE
egrave=# show server_encoding ;
  client_encoding
-----------------
  UNICODE
(1 row)

egrave=# show client_encoding ; -- don't know why it is set to unicode
  client_encoding
-----------------
  UNICODE
(1 row)

egrave=# INSERT INTO test VALUES ('1234è');
egrave'# '\r
Query buffer reset (cleared).
egrave=# set client_encoding = 'ISO8859-1';
SET
egrave=# show client_encoding ;
  client_encoding
-----------------
  ISO8859-1
(1 row)

egrave=# INSERT INTO test VALUES ('1234è');
INSERT 25340 1
egrave=# SELECT * FROM test;
  data
------
  1234è
(1 row)


It seems everything works when the client encoding is set up correctly. Try
checking your client and server encodings.
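
One way to see why the declared client encoding matters, as a small Python sketch. It assumes the terminal sends è as the single ISO-8859-1 byte while the session claims to be sending UTF-8; `is_valid_utf8` is a hypothetical helper, not a PostgreSQL function.

```python
# The e-grave character, written as an escape to stay independent of
# the encoding of this file.
E_GRAVE = "\u00e8"

latin1_bytes = E_GRAVE.encode("iso-8859-1")  # one byte: 0xE8
utf8_bytes = E_GRAVE.encode("utf-8")         # two bytes: 0xC3 0xA8

def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# What a Latin-1 terminal would send for the value '1234è'.
sent_as_latin1 = b"1234" + latin1_bytes
# What a UTF-8 terminal would send for the same value.
sent_as_utf8 = b"1234" + utf8_bytes

print(latin1_bytes.hex(), utf8_bytes.hex())
print(is_valid_utf8(sent_as_latin1), is_valid_utf8(sent_as_utf8))
```

The Latin-1 byte on its own is not a complete UTF-8 sequence, so a receiver expecting UTF-8 cannot interpret it as a character. Declaring the client encoding as ISO8859-1 lets the server convert it instead.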

I've also double checked with:

egrave=# SET client_encoding = 'ISO8859-2';
SET
egrave=# SELECT * FROM test;
WARNING:  ignoring unconvertible UTF-8 character 0xc3a8
  data
------
  1234
(1 row)
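
The warning above can be reproduced outside the database with a short Python sketch. The bytes 0xc3a8 are the UTF-8 form of è, and ISO-8859-2 has no slot for that character; `can_encode` is a hypothetical helper used only for this illustration.

```python
# 0xC3 0xA8, the byte pair named in the server warning.
stored_bytes = b"\xc3\xa8"
character = stored_bytes.decode("utf-8")  # the e-grave character, U+00E8

def can_encode(text, encoding):
    """Return True if every character of text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(can_encode(character, "iso-8859-1"))  # Latin-1 contains e-grave
print(can_encode(character, "iso-8859-2"))  # Latin-2 does not

# Dropping unconvertible characters leaves only the ASCII digits,
# comparable to the '1234' row returned above.
value = "1234" + character
print(value.encode("iso-8859-2", errors="ignore"))
```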


Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

Re: Strange UTF-8 behaviour

From
Matteo Beccati
Date:
Hi,

> è (UTF-8 encoded)

Sorry, I actually forgot to switch encoding :)

I just hope the last part of the email was readable.


Ciao ciao
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

Re: Strange UTF-8 behaviour

From
"Marco Ferretti"
Date:
Thanks to all you guys ! You really helped

marco