Thread: Encoding Issue with UNICODE
Hello,

I'm using PostgreSQL 7.2.1. According to the following output, the data in my database is encoded as Unicode. Server and client communication seems to use Unicode as well:

woody=# select version();
                            version
---------------------------------------------------------------
 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.4
(1 row)

woody=# select getdatabaseencoding();
 getdatabaseencoding
---------------------
 UNICODE
(1 row)

woody=# show client_encoding;
NOTICE:  Current client encoding is 'UNICODE'
SHOW VARIABLE

I have a Java program which writes words containing German umlauts like äöü into the database. As you probably know, those characters belong to the ISO-8859-1 character set.

In my Java web application those umlauts (äöü) are displayed correctly, so they are actually stored correctly in the database.

However, when I use PostgreSQL's psql client, those characters are displayed incorrectly.

For example, the city name "münchen" is displayed as "mÃ¼nchen". Not so in my web application: there the city name appears correctly in the HTML code as "münchen".

So why is psql not displaying the Unicode characters correctly? Or could it be that my xterm cannot handle Unicode characters? But since ü is also in LATIN1 (ISO 8859-1), I would expect that this should not be a problem.

Can somebody help me out here? Should I create the databases as LATIN1 instead of UNICODE? And how can I transform my current databases into LATIN1 ones? They should be compatible, because the only special characters I use are äöü, which are downward compatible.

Fritz
Fritz Bayer wrote:

> I have a Java program which writes words containing German umlauts
> like äöü into the database. As you probably know, those characters
> belong to the ISO-8859-1 character set.
>
> In my Java web application those umlauts (äöü) are displayed correctly,
> so they are actually stored correctly in the database.
>
> However, when I use PostgreSQL's psql client, those characters are
> displayed incorrectly.
>
> For example, the city name "münchen" is displayed as "mÃ¼nchen". Not
> so in my web application: there the city name appears correctly in the
> HTML code as "münchen".
>
> So why is psql not displaying the Unicode characters correctly? Or
> could it be that my xterm cannot handle Unicode characters?

From your description it really looks like the latter. You can issue

    \encoding latin1

inside psql, or you can set the PGCLIENTENCODING environment variable to latin1 before launching psql on terminals that are not Unicode-aware.

> Can somebody help me out here? Should I create the databases as LATIN1
> instead of UNICODE? And how can I transform my current databases into
> LATIN1 ones? They should be compatible, because the only special
> characters I use are äöü, which are downward compatible.

But then you'll have trouble with your Java app if you do that. Java works with Unicode strings, so it makes sense to have the db contents in Unicode as well.

--
Daniel
PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
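The symptom described above is what one would expect when UTF-8 bytes reach a terminal that interprets every byte as one Latin-1 character. The following is a minimal sketch in Python, used here purely for illustration and not part of the original thread; the variable names are made up for the example.

```python
# UTF-8 bytes for the city name, as a UNICODE (UTF-8) client encoding
# would deliver them to the terminal.
utf8_bytes = "münchen".encode("utf-8")

# A Latin-1 terminal maps each byte to exactly one character, so the
# two-byte sequence for "ü" shows up as two separate characters.
as_seen_on_latin1_terminal = utf8_bytes.decode("latin-1")

# ascii() escapes non-ASCII characters so the output is safe to print
# on any terminal, whatever its encoding.
print(utf8_bytes)
print(ascii(as_seen_on_latin1_terminal))
print(len("münchen"), len(as_seen_on_latin1_terminal))
```

The ASCII letters survive unchanged because UTF-8 and Latin-1 agree on them; only the umlaut is affected.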
Fritz Bayer wrote:

> I have a Java program which writes words containing German umlauts
> like äöü into the database. As you probably know, those characters
> belong to the ISO-8859-1 character set.
>
> In my Java web application those umlauts (äöü) are displayed correctly,
> so they are actually stored correctly in the database.

I know I once had to set the charSet option in the connection URL to get things working:

"jdbc:postgresql://server/database?charSet=LATIN1"

Maybe that would work for UNICODE?

Regards,
Magnus
mag@fbab.net ("Magnus Naeslund(t)") wrote in message news:<425AAE6D.6080008@fbab.net>...

> I know I once had to set the charSet option in the connection URL to
> get things working:
>
> "jdbc:postgresql://server/database?charSet=LATIN1"
>
> Maybe that would work for UNICODE?

As far as I have heard, the charSet property is ignored by the JDBC drivers. However, somebody patched them and introduced this property.

> Regards,
> Magnus
daniel@manitou-mail.org ("Daniel Verite") wrote in message news:<20050411035003.3592776@localhost>...

> Fritz Bayer wrote:
>
> > For example, the city name "münchen" is displayed as "mÃ¼nchen". Not
> > so in my web application: there the city name appears correctly in the
> > HTML code as "münchen".
> >
> > So why is psql not displaying the Unicode characters correctly? Or
> > could it be that my xterm cannot handle Unicode characters?
>
> From your description it really looks like the latter. You can issue
> \encoding latin1
> inside psql

Thanks for your help. Now I understand. It's true, somehow my terminal does not handle Unicode characters. After I entered "\encoding latin1" as you suggested, everything works fine.

So the answer is that without it, the raw Unicode-encoded characters get displayed. But in which encoding? I guess UTF-8 or UTF-16...

But why does that fail only for äüö? Shouldn't any other letter encoded in UTF-16 also fail?

I mean, Unicode itself is 16 bits long. So "münchen" should expand to 14 characters. But only ü expands to two characters.

> or you can also set the PGCLIENTENCODING environment variable to latin1
> before launching psql on non-unicode aware terminals.
>
> > Can somebody help me out here? Should I create the databases as LATIN1
> > instead of UNICODE? And how can I transform my current databases into
> > LATIN1 ones? They should be compatible, because all characters I use
> > are only äöü, which are downward compatible.
>
> But then you'll have trouble with your java app if you do that. Java works with
> unicode strings, so it makes sense to have the db contents in unicode as well.

No, that's OK. Java communicates with psql using Unicode only. That's why it also worked...
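Setting the client encoding to latin1 asks the server to convert the stored text for the client. Conceptually this amounts to re-encoding the UTF-8 data as Latin-1, which is why the terminal then shows the umlaut correctly. The sketch below, in Python, only illustrates the idea; `to_client_encoding` is a hypothetical helper written for this example and says nothing about how PostgreSQL implements the conversion.

```python
# Bytes as they would be stored in a UNICODE (UTF-8) database.
stored = "münchen".encode("utf-8")


def to_client_encoding(data: bytes, client_encoding: str) -> bytes:
    """Hypothetical helper: re-encode UTF-8 data for a client encoding."""
    return data.decode("utf-8").encode(client_encoding)


# In Latin-1 every character of "münchen" is a single byte.
latin1_bytes = to_client_encoding(stored, "latin-1")
print(len(stored), len(latin1_bytes))

# Characters that do not exist in Latin-1 cannot be converted at all.
try:
    to_client_encoding("€".encode("utf-8"), "latin-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False
print(euro_fits)
```

The second half hints at why a LATIN1 database is the more restrictive choice: it works for äöü, but not for characters outside that character set.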
On Apr 12, 2005, at 6:39 AM, Fritz Bayer wrote:

> But in which encoding? I guess UTF-8 or UTF-16...
>
> But why does that fail only for äüö? Shouldn't any other letter
> encoded in UTF-16 also fail?
>
> I mean, Unicode itself is 16 bits long. So "münchen" should expand to 14
> characters. But only ü expands to two characters.

As a Unicode encoding, PostgreSQL only supports UTF-8. There has been discussion of using a label other than "unicode" to make this more apparent.

John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
On Tue, Apr 12, 2005 at 03:39:45AM -0700, Fritz Bayer <fritz-bayer@web.de> wrote a message of 53 lines which said:

> I mean unicode itself is 16 bit long.

This is completely false. Unicode itself is just a table of characters and, since it contains more than 100,000 characters, you cannot index them with 16 bits. Unicode has various encodings, some fixed-size, like UTF-32, some not.

> So "münchen" should expand to 14 characters. But only ü expands to
> two characters.

Perfectly normal with UTF-8, where the size of a Unicode character is not fixed: ASCII letters such as m and n take one byte each, while ü takes two.
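The size difference between the encodings is easy to check. The following Python sketch is an illustration added for this thread, not something the posters ran. It counts the bytes each character of "münchen" needs in UTF-8, compares the total size in three encodings, and shows a character beyond the 16-bit range; the choice of U+1D11E (musical G clef) is just an example.

```python
word = "münchen"

# Bytes needed per character in UTF-8: 1 for ASCII letters, 2 for "ü".
utf8_bytes_per_char = [len(ch.encode("utf-8")) for ch in word]
print(utf8_bytes_per_char)  # [1, 2, 1, 1, 1, 1, 1]

# Total size of the same word in three Unicode encodings.
# The "-le" variants are used so that no byte-order mark is added.
sizes = {enc: len(word.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
print(sizes)  # {'utf-8': 8, 'utf-16-le': 14, 'utf-32-le': 28}

# A code point above U+FFFF does not fit in one 16-bit unit:
# UTF-16 has to use two units (a surrogate pair) for it.
clef = "\U0001D11E"
clef_utf16_units = len(clef.encode("utf-16-le")) // 2
print(clef_utf16_units)  # 2
```

The 14 that Fritz expected is what UTF-16 would produce; the 8 bytes he actually observed match UTF-8.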