Thread: Strange UTF-8 behaviour
Hi there all.
I am quite new to Postgres, so forgive me if this question seems obvious.

I have created a database with the UTF-8 encoding (createdb cassa --encoding=UTF-8).
Then I made the following tests:

cassa=> create table test(id varchar(5));
cassa=> insert into test values('12345');
INSERT 178725 1
cassa=> insert into test values ('123è');
INSERT 178726 1
cassa=> insert into test values ('1234è');
ERROR:  value too long for type character varying(5)

but if I try

cassa=> select '#' || id || '#' from test;
 ?column?
----------
 #12345#
 #123è#
(2 rows)

so, apparently the chars are stored the right way (#123è#), but when trying the query the è char is parsed as 2 chars ....

The database server version is 7.3.4 on a RedHat 9 machine ...

Any clue?

Tia
Marco

--
Ever noticed how fast windows run ? neither did I
My guess is that something in the chain of getting the data into the database is measuring BYTES, not CHARACTERS.

"Marco Ferretti" <marco.ferretti@jrc.it> wrote:
</quote--------------------------------------->
<snip>
I have created a database with the UTF-8 encoding (createdb cassa --encoding=UTF-8).
Then I have made the following tests:

cassa=> create table test(id varchar(5));
cassa=> insert into test values ('12345');
INSERT 178725 1
cassa=> insert into test values ('123è');
INSERT 178726 1
cassa=> insert into test values ('1234è');
ERROR:  value too long for type character varying(5)
<snip>
so, apparently the chars are stored the right way (#123è#) but when trying the query the è char is parsed as 2 chars ....

The database server version is 7.3.4 on a RedHat 9 machine ...

Any clue?
</quote--------------------------------------->
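To illustrate the bytes-versus-characters guess above, here is a minimal Python sketch (not PostgreSQL code). It assumes the length check counts UTF-8 bytes rather than characters; the helper names `char_len`, `byte_len`, and `fits_varchar` are hypothetical and only model the symptom reported in the thread.

```python
# Minimal sketch: how a 5-unit limit behaves when counting characters
# versus counting UTF-8 bytes. All helper names are hypothetical.

E_GRAVE = "\u00e8"  # the precomposed character è (U+00E8)


def char_len(text: str) -> int:
    """Length in characters (Unicode code points)."""
    return len(text)


def byte_len(text: str, encoding: str = "utf-8") -> int:
    """Length in bytes after encoding the text."""
    return len(text.encode(encoding))


def fits_varchar(text: str, limit: int, count_bytes: bool) -> bool:
    """Model a varchar(limit) check, counting either bytes or characters."""
    length = byte_len(text) if count_bytes else char_len(text)
    return length <= limit


if __name__ == "__main__":
    for value in ("12345", "123" + E_GRAVE, "1234" + E_GRAVE):
        print(
            repr(value),
            "chars:", char_len(value),
            "bytes:", byte_len(value),
            "fits if counting chars:", fits_varchar(value, 5, count_bytes=False),
            "fits if counting bytes:", fits_varchar(value, 5, count_bytes=True),
        )
```

Under the byte-counting assumption, '123è' (5 bytes) fits a limit of 5 while '1234è' (6 bytes) does not, which matches the behaviour Marco reported.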
On Thu, Sep 16, 2004 at 06:10:13PM +0200, Marco Ferretti wrote:

> I am quite new to Postgres, so forgive me if this question seems
> obvious.
>
> I have created a database with the UTF-8 encoding (createdb cassa
> --encoding=UTF-8) .
> Then I have made the following tests :

FWIW, I can't reproduce this using 7.3.6. Is there anything special about your 'e' character, or is it a plain 'e'?

$ createdb test --encoding=UTF-8
CREATE DATABASE COMMENT
$ psql test
Welcome to psql 7.3.6, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help on internal slash commands
       \g or terminate with semicolon to execute query
       \q to quit

test=# create table test (id char(5));
CREATE TABLE
test=# insert into test values ('1234e');
INSERT 16993 1
test=# create table test2 (id varchar(5));
CREATE TABLE
test=# insert into test2 values ('1234e');
INSERT 16996 1
test=# insert into test2 values ('123e');
INSERT 16997 1
test=# select '#' || id || '#', length(id) from test2;
 ?column? | length
----------+--------
 #1234e#  |      5
 #123e#   |      4
(2 rows)

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)
Hi Alvaro,

> FWIW, I can't reproduce this using 7.3.6. Is there anything special
> about your 'e' character, or it's a plain 'e'?

Maybe you didn't get the email correctly. It was an e with a grave accent, just like this:

è (UTF-8 encoded)

I just checked on PG 7.4.3 / NetBSD, with these results:

egrave=# CREATE TABLE test (data varchar(5));
CREATE
egrave=# show server_encoding ;
 client_encoding
-----------------
 UNICODE
(1 row)

egrave=# show client_encoding ; -- don't know why it is set to unicode
 client_encoding
-----------------
 UNICODE
(1 row)

egrave=# INSERT INTO test VALUES ('1234è');
egrave'# '\r
Query buffer reset (cleared).
egrave=# set client_encoding = 'ISO8859-1';
SET
egrave=# show client_encoding ;
 client_encoding
-----------------
 ISO8859-1
(1 row)

egrave=# INSERT INTO test VALUES ('1234è');
INSERT 25340 1
egrave=# SELECT * FROM test;
 data
-------
 1234è
(1 row)

It seems everything works when the client encoding is set up correctly. Try checking your client and server encodings.

I've also double-checked with:

egrave=# SET client_encoding = 'ISO8859-2';
SET
egrave=# SELECT * FROM test;
WARNING:  ignoring unconvertible UTF-8 character 0xc3a8
 data
------
 1234
(1 row)

Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
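The encoding effects in the session above can be reproduced outside the database. The following is a rough Python sketch, not a model of how the PostgreSQL server converts data: it only shows how the bytes for è differ between UTF-8 and ISO8859-1, why a lone ISO8859-1 byte is not valid UTF-8, and why è cannot be converted to ISO8859-2. The helper names `is_valid_utf8` and `convertible` are hypothetical. One plausible reading of the `egrave'#` prompt is that the terminal sent the single byte 0xE8, which UTF-8 treats as the start of a three-byte sequence, so the closing quote was swallowed; that is an assumption, not something the thread confirms.

```python
# Rough sketch of the client/server encoding mismatch, using only
# Python's built-in codecs. Helper names are hypothetical.

E_GRAVE = "\u00e8"  # è, U+00E8

utf8_bytes = E_GRAVE.encode("utf-8")        # two bytes: 0xC3 0xA8
latin1_bytes = E_GRAVE.encode("iso8859_1")  # one byte: 0xE8


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convertible(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print("UTF-8 bytes:     ", utf8_bytes.hex())
    print("ISO8859-1 bytes: ", latin1_bytes.hex())
    # A Latin-1 terminal sending 0xE8 to a client that expects UTF-8:
    print("Latin-1 input valid as UTF-8?", is_valid_utf8(b"1234" + latin1_bytes))
    # UTF-8 bytes misread as Latin-1 turn into two characters:
    print("UTF-8 bytes read as Latin-1: ", repr(utf8_bytes.decode("iso8859_1")))
    # è exists in ISO8859-1 but not in ISO8859-2:
    print("convertible to ISO8859-1?", convertible(E_GRAVE, "iso8859_1"))
    print("convertible to ISO8859-2?", convertible(E_GRAVE, "iso8859_2"))
```

The last check corresponds to the `0xc3a8` warning in the session: those are the UTF-8 bytes of è, and ISO8859-2 has no è to convert them to.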
Hi,

> è (UTF-8 encoded)

Sorry, I actually forgot to switch encoding :)
I just hope the last part of the email was readable.

Ciao ciao
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
Thanks to all of you guys! You really helped.

Marco