Thread: Bug #837: Unable to use LATIN9 (=ISO-8859-15) encoding
Steve Haslam (araqnid@debian.org) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
Unable to use LATIN9 (=ISO-8859-15) encoding

Long Description
I am trying to use LATIN9 (ISO-8859-15) as my client encoding rather than LATIN1-- the database I am using is encoded as UNICODE. However, if I attempt to use the LATIN9 encoding, I get erroneous results.

I am using PostgreSQL 7.3 from Debian unstable (7.3rel-3), which gives a version() string of "PostgreSQL 7.3 on i386-pc-linux-gnu, compiled by GCC 2.95.4".

From PSQL, if I perform:
  \encoding LATIN9
  insert into i18ntest(id, data) values('Euro symbol', '¤');

then I would expect this to insert a euro symbol into the data column (code point 164 is Euro in ISO-8859-15). However, when I change back to UTF-8, the UTF-8 data is "¤", which is the currency symbol.

Now, if I try to insert the Euro symbol using a UNICODE client encoding, then I get an error when I switch back to LATIN9 and SELECT it out again:

psql:/home/steve/public_html/i18ntest.sql:35: WARNING:  UtfToLocal: could not convert UTF-8 (0xe282ac). Ignored

However, if I switch to LATIN1 and try to SELECT it, I get a conversion error, which is correct since the euro symbol does not have a code point in LATIN1:

psql:/home/steve/public_html/i18ntest.sql:33: ERROR:  Could not convert UTF-8 to ISO8859-1

Sample Code
-- this code is available at http://araqnid.ddts.net/~steve/i18ntest.sql in case it gets munged by the form/browser/server

-- This is done in a database with "UNICODE" encoding
-- e.g.:
--  create database i18ntest encoding = 'UNICODE';
--  \connect i18ntest

select version();
drop table i18ntest;
create table i18ntest(id text primary key, data text not null);

begin;
\encoding LATIN1
insert into i18ntest(id, data) values('Pound sign', '£');
\encoding LATIN9
insert into i18ntest(id, data) values('Euro symbol', '¤');
commit;

\encoding UNICODE
select id, data from i18ntest;
\encoding LATIN1
select id, data from i18ntest;
\encoding LATIN9
select id, data from i18ntest;

begin;
\encoding UNICODE
update i18ntest set data = '£' where id = 'Pound sign';
update i18ntest set data = 'â\202¬' where id = 'Euro symbol';
commit;

\encoding UNICODE
select id, data from i18ntest;
\encoding LATIN1
select id, data from i18ntest;
\encoding LATIN9
select id, data from i18ntest;

-- drop table i18ntest;
-- drop database i18ntest;

No file was uploaded with this report
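
The byte-level expectations stated in the report can be checked independently of PostgreSQL. The following is a minimal sketch, assuming Python's standard codecs ("iso8859_15", "iso8859_1", "utf-8") as a stand-in for the server's conversion tables; the helper names decode_byte and can_encode are illustrative and are not part of the report or of PostgreSQL.

```python
# Sketch: what the single byte 0xA4 (decimal 164, octal 244) means in each
# client encoding, and how the resulting characters look in UTF-8.
# Python's codecs are used here as an assumption, not PostgreSQL's own tables.

def decode_byte(value: int, encoding: str) -> str:
    """Decode one byte with the named Python codec."""
    return bytes([value]).decode(encoding)


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the target encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


latin9_char = decode_byte(0xA4, "iso8859_15")  # EURO SIGN, U+20AC
latin1_char = decode_byte(0xA4, "iso8859_1")   # CURRENCY SIGN, U+00A4

euro_utf8 = latin9_char.encode("utf-8")        # the bytes named in the WARNING
currency_utf8 = latin1_char.encode("utf-8")

print(hex(ord(latin9_char)), euro_utf8.hex())      # 0x20ac e282ac
print(hex(ord(latin1_char)), currency_utf8.hex())  # 0xa4 c2a4
print(can_encode(latin9_char, "iso8859_1"))        # False: no euro in LATIN1
print(can_encode(latin9_char, "iso8859_15"))       # True: euro exists in LATIN9
```

Under these assumptions, the LATIN1 conversion error in the report is expected behaviour, while the LATIN9 warning for 0xe282ac is not.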
> From PSQL, if I perform:
>   \encoding LATIN9
>   insert into i18ntest(id, data) values('Euro symbol', '¤');
>
> then I would expect this to insert a euro symbol into the data column

'¤' means '¤', not anything else.  Maybe you want to try '\244' (octal).

--
Peter Eisentraut   peter_e@gmx.net
> From PSQL, if I perform:
>   \encoding LATIN9
>   insert into i18ntest(id, data) values('Euro symbol', '¤');
>
> then I would expect this to insert a euro symbol into the data column
> (code point 164 is Euro in ISO-8859-15). However, when I change back to
> UTF-8, the UTF-8 data is "¤", which is the currency symbol.

I have confirmed this. It appears to have been a copy and paste mistake. I will put the following patch into the next subrelease (7.3.1):

*** ../pg73branch/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c	Tue Oct 29 18:19:19 2002
--- src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c	Mon Dec  9 20:14:43 2002
***************
*** 98,104 ****
  	{PG_LATIN8, LUmapISO8859_14, ULmapISO8859_14,
  		sizeof(LUmapISO8859_14) / sizeof(pg_local_to_utf),
  	sizeof(ULmapISO8859_14) / sizeof(pg_utf_to_local)}, /* ISO-8859-14 Latin 8 */
! 	{PG_LATIN9, LUmapISO8859_2, ULmapISO8859_2,
  		sizeof(LUmapISO8859_15) / sizeof(pg_local_to_utf),
  	sizeof(ULmapISO8859_15) / sizeof(pg_utf_to_local)}, /* ISO-8859-15 Latin 9 */
  	{PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,
--- 98,104 ----
  	{PG_LATIN8, LUmapISO8859_14, ULmapISO8859_14,
  		sizeof(LUmapISO8859_14) / sizeof(pg_local_to_utf),
  	sizeof(ULmapISO8859_14) / sizeof(pg_utf_to_local)}, /* ISO-8859-14 Latin 8 */
! 	{PG_LATIN9, LUmapISO8859_15, ULmapISO8859_15,
  		sizeof(LUmapISO8859_15) / sizeof(pg_local_to_utf),
  	sizeof(ULmapISO8859_15) / sizeof(pg_utf_to_local)}, /* ISO-8859-15 Latin 9 */
  	{PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,

--
Peter Eisentraut   peter_e@gmx.net
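
The patch changes which maps the PG_LATIN9 entry points to: ISO-8859-2 before, ISO-8859-15 after. The sketch below tries to show why that mix-up would produce both symptoms from the report. It is a simplified model only: the dictionaries BUGGY_TABLE and FIXED_TABLE and the functions local_to_utf and utf_to_local are hypothetical stand-ins, Python's codecs replace the real LUmap/ULmap arrays, and the mismatch between the map pointers and the sizeof() counts is not modelled.

```python
# Hypothetical model of the conversion table: client encoding name -> codec
# acting as its mapping. This is not the real PostgreSQL data structure.
BUGGY_TABLE = {
    "LATIN8": "iso8859_14",
    "LATIN9": "iso8859_2",   # copy-and-paste mistake: wrong map for LATIN9
    "LATIN10": "iso8859_16",
}
FIXED_TABLE = {**BUGGY_TABLE, "LATIN9": "iso8859_15"}


def local_to_utf(table, client_encoding, raw):
    """Convert client bytes to UTF-8 using the map selected by the table."""
    return raw.decode(table[client_encoding]).encode("utf-8")


def utf_to_local(table, client_encoding, utf8):
    """Convert UTF-8 to client bytes; None if a character has no mapping."""
    try:
        return utf8.decode("utf-8").encode(table[client_encoding])
    except UnicodeEncodeError:
        return None


EURO_UTF8 = b"\xe2\x82\xac"

# Inserting byte 0xA4 as LATIN9: the wrong map stores the currency sign.
print(local_to_utf(BUGGY_TABLE, "LATIN9", b"\xa4").hex())  # c2a4
print(local_to_utf(FIXED_TABLE, "LATIN9", b"\xa4").hex())  # e282ac

# Selecting a stored euro as LATIN9: the wrong map has no entry for it.
print(utf_to_local(BUGGY_TABLE, "LATIN9", EURO_UTF8))      # None
print(utf_to_local(FIXED_TABLE, "LATIN9", EURO_UTF8))      # b'\xa4'
```

In this model, byte 0xA4 is the currency sign in ISO-8859-2 and the euro sign has no ISO-8859-2 code point, which is consistent with the stored "currency symbol" and with the "could not convert UTF-8 (0xe282ac)" warning.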
On Fri, Dec 06, 2002 at 12:20:54AM +0100, Peter Eisentraut wrote:
> > From PSQL, if I perform:
> >   \encoding LATIN9
> >   insert into i18ntest(id, data) values('Euro symbol', '¤');
> >
> > then I would expect this to insert a euro symbol into the data column
>
> '¤' means '¤', not anything else.  Maybe you want to try '\244'
> (octal).

That was a literal character 164 that the browser seems to have munged when uploading the form (the script is also available in raw form using the URI at the top, http://araqnid.ddts.net/~steve/i18ntest.sql)

SRH

--
Steve Haslam      Reading, UK                       araqnid@innocent.com
Debian GNU/Linux Maintainer                           araqnid@debian.org
Currently for sale: http://www.arise.demon.co.uk/my_cv/
almost called it today, turned to face the void, numb with the suffering
and the question- "Why am I?"   [queensrÿche]