Thread: Problem with utf8 encoding
Hello everyone!
I'm working in a project using postgres, propel, and php.
My development environment is:
SO: Windows vista Business SP2
Postgres: Postgres v8.4
Propel: Propel generator/runtime v1.4
PHP: PHP v5.3
Currently I'm struggling with a problem caused by the encoding. Everytime I try to insert a row into the table CURRENCY, which has ID, DESC, and SYMBOL as its columns, I get the following error:
Unable to execute INSERT statement. [wrapped: SQLSTATE[22021]: Character not in repertoire: 7 ERROR: invalid byte sequence for encoding "UTF8": 0x80 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".]
I've created the database using this sentence:
CREATE DATABASE sbs
WITH OWNER = sbsadmin
ENCODING = 'UTF8'
LC_COLLATE = 'Spanish_Peru.1252'
LC_CTYPE = 'Spanish_Peru.1252'
CONNECTION LIMIT = -1;
When I run the following commands in psql I get this:
sbs=#SHOW SERVER_ENCODING;
server_encoding
---------------
UTF8
(1 fila)
sbs=#SHOW CLIENT_ENCODING;
client_encoding
---------------
UTF8
(1 fila)
The generated file by propel that manages the database connection sets the charset to utf8 too. The insertion fails only everytime I try to insert the currency symbols such as €, ¥, £, ₱, etc. I have no problem with symbols such S/., $, p., Q, L or any other non-special symbol. The curious thing is that when I run the sentence INSERT INTO CURRENCY VALUES(3,'EURO','€'); using the query tool of pgAdminIII, I got no error and the query was sucessfully executed. However running the same sentence in psql, I got the row inserted but with ? instead of € as the currency symbol.
I think the utf8 is the apropiate charset to represent these symbols but I have no idea why I'm getting this problem. I've spent the whole day struggling with this problem and I can't figure it out yet. I have tried many things, none of them worked. So this is why I'm here asking for your help, as my last resource. Any guidance that could put me in the right direction to address this problem will be really appreciated!
Thanks in advance and I'm sorry If I went on a bit long...
--
Jorge Miranda Castañeda.
Simbiosys Software S.A.C.
On Thu, 2009-12-03 at 02:00 -0500, Jorge Miranda Castañeda wrote: > Hello everyone! > > > I'm working in a project using postgres, propel, and php. > > > My development environment is: > SO: Windows vista Business SP2 > Postgres: Postgres v8.4 > Propel: Propel generator/runtime v1.4 > PHP: PHP v5.3 > > > Currently I'm struggling with a problem caused by the encoding. > Everytime I try to insert a row into the table CURRENCY, which has ID, > DESC, and SYMBOL as its columns, I get the following error: > Unable to execute INSERT statement. [wrapped: SQLSTATE[22021]: > Character not in repertoire: 7 ERROR: invalid byte sequence for > encoding "UTF8": 0x80 HINT: This error can also happen if the byte > sequence does not match the encoding expected by the server, which is > controlled by "client_encoding".] > > > I've created the database using this sentence: > CREATE DATABASE sbs > WITH OWNER = sbsadmin > ENCODING = 'UTF8' > LC_COLLATE = 'Spanish_Peru.1252' > LC_CTYPE = 'Spanish_Peru.1252' > CONNECTION LIMIT = -1; Hola Jorge, I suspect it's the LC_COLLATE and LC_CTYPE that you have there. I don't *know* this, but they *look* like they are some weird sort of collation/ctype based on the misguided Windows-1252 encoding. Sadly, Windows provides data in this encoding into web forms where the accept charset is supposedly only ISO-8859. In Windows-1252 the Euro currency symbol is somewhere in the 0x80 - 0x9f range - possibly it is 0x80, in fact. I think you would be better to use a consistent locale like es_PE.UTF-8 though if your data is 1252 encoded then you might need to iconv it first. If you have data which is a mix of ISO-8859-1, Windows-1252 and UTF-8 then I can point you at a wee bit of PHP code I wrote which will look at each character in a string and only iconv from 8859/1252 -> UTF-8 if it is a high-bit byte which is not part of a valid UTF-8 character already. The code is here: http://repo.or.cz/w/awl.git/blob/HEAD:/inc/AWLUtilities.php You need both of the last two functions - call the first one during initialisation, and use the second one to clean the strings. Cheers, Andrew McMillan. ------------------------------------------------------------------------ andrew (AT) morphoss (DOT) com +64(272)DEBIAN You will feel hungry again in another hour. ------------------------------------------------------------------------
Attachment
Hi Jorge, I already encountered this problem while programming PHP web application in French. The problem is with CLIENT_ENCODING variable of your database. It must fit with the encoding of your web page. Forget for the moment to ask query inside pgAdminIII or a Windows shell using psql (it's different ways to access your Postgresql database). You talked about propel. Then I suppose you use Symphony framework. Look at the encoding of the web page where you enter your data in your browser. Usually, browser is set to automaticaly detect web page encoding. I assume you have Internet Explorer. Check in the menu View/Encoding... You should see an encoding name with a large dot in front. It's the one used to show the web page. It's the same encoding that Symphony (or PHP) uses to send your data to the database. An other way to better know the encoding of the web page is to show source code (View/Source on IE). If you can see near the first lines something who looks like <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> In this example, iso-8859-1 is your web page encoding. This means you need to set Postgresql client encoding variable to "latin1". You can find here a charset table converter for Postgresql: http://www.postgresql.org/docs/8.1/static/multibyte.html Once you found your client encoding, you need to set this parameter in your database. Here, it's important to understand difference between server and client encoding. In your example, your database is set to UTF-8 (what you gave as encoding parameter while you created your database). It's your SERVER encoding. It's mean that all your data are store in your database in UTF-8 encoding, with collate and ctype in "Spanish_Peru.1252". > /CREATE DATABASE sbs/ > / WITH OWNER = sbsadmin/ > / ENCODING = 'UTF8'/ > / LC_COLLATE = 'Spanish_Peru.1252'/ > / LC_CTYPE = 'Spanish_Peru.1252'/ Postgresql comes with the iconv librairy and can convert your data in almost any other charset depending of your CLIENT encoding. Now, I talk about your PHP application (web page). You can set your client encoding for a while with "SET NAMES 'latin1'" at the begin of each time you send a set of queries to your database, i.e. each time you refresh or change your web page. But this way is incomfortable if you not control your SQL query input. You can set it once in your postgresql.conf file. Set your database 'client_encoding' variable to the encoding of your favorite PHP application. Don't forget to restart your Postgresql Windows service once you changed it. As mentionned in the documentation, you can also change your client encoding default setting by changing PGCLIENTENCODING environment variable. This can be set by right clicking on "Your computer" icon on your windows desktop, click on Properties, click on Advanced System Parameters to your left menu, then click on Advanced System Parameters tab and click on Environment Variables button. You have choice to set your variable just inside your windows account (user variables) or for all the machine (System variables). Then click on New button and add this variable with the wished value. It's important to note that this methode must be done on the computer where is put your Postgresql database server. If your database is not on your computer, use another method to change your client encoding setting. This should resolve your problem. About use of psql through a shell, you must know that Windows shell encoding is 'cp850'. It's a old charset used with the DOS system who keep Windows some decade behind... Up to this moment, I didn't find a Postgresql encoding where you can enter and send your particular symbol... Another Microsoft problem... Best regards Sylvain Racine Jorge Miranda Castañeda wrote: > Hello everyone! > > I'm working in a project using postgres, propel, and php. > > My development environment is: > SO: Windows vista Business SP2 > Postgres: Postgres v8.4 > Propel: Propel generator/runtime v1.4 > PHP: PHP v5.3 > > Currently I'm struggling with a problem caused by the encoding. Everytime I try to insert a row into the table CURRENCY, which has ID, DESC, and SYMBOL as its columns, I get the following error: > Unable to execute INSERT statement. [wrapped: SQLSTATE[22021]: Character not in repertoire: 7 ERROR: invalid byte sequence for encoding "UTF8": 0x80 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".] > > I've created the database using this sentence: > /CREATE DATABASE sbs/ > / WITH OWNER = sbsadmin/ > / ENCODING = 'UTF8'/ > / LC_COLLATE = 'Spanish_Peru.1252'/ > / LC_CTYPE = 'Spanish_Peru.1252'/ > / CONNECTION LIMIT = -1;/ > > When I run the following commands in psql I get this: > /sbs=#SHOW SERVER_ENCODING;/ > /server_encoding/ > /---------------/ > /UTF8/ > /(1 fila)/ > > /sbs=#SHOW CLIENT_ENCODING;/ > /client_encoding/ > /---------------/ > /UTF8/ > /(1 fila)/ > > The generated file by propel that manages the database connection sets the charset to utf8 too. The insertion fails only everytime I try to insert the currency symbols such as €, ¥, £, ₱, etc. I have no problem with symbols such S/., $, p., Q, L or any other non-special symbol. The curious thing is that when I run the sentence INSERT INTO CURRENCY VALUES(3,'EURO','€'); using the query tool of pgAdminIII, I got no error and the query was sucessfully executed. However running the same sentence in psql, I got the row inserted but with ? instead of € as the currency symbol. > I think the utf8 is the apropiate charset to represent these symbols but I have no idea why I'm getting this problem. I've spent the whole day struggling with this problem and I can't figure it out yet. I have tried many things, none of them worked. So this is why I'm here asking for your help, as my last resource. Any guidance that could put me in the right direction to address this problem will be really appreciated! > > Thanks in advance and I'm sorry If I went on a bit long... > > -- > Jorge Miranda Castañeda. > Simbiosys Software S.A.C. > www.simbiosysonline.com <http://www.simbiosysonline.com> >