Thread: Bug #837: Unable to use LATIN9 (=ISO-8859-15) encoding

Bug #837: Unable to use LATIN9 (=ISO-8859-15) encoding

From
pgsql-bugs@postgresql.org
Date:
Steve Haslam (araqnid@debian.org) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
Unable to use LATIN9 (=ISO-8859-15) encoding

Long Description
I am trying to use LATIN9 (ISO-8859-15) as my client encoding rather than LATIN1 -- the database I am using is encoded
as UNICODE. However, if I attempt to use the LATIN9 encoding, I get erroneous results. I am using PostgreSQL 7.3 from
Debian unstable (7.3rel-3), which gives a version() string of "PostgreSQL 7.3 on i386-pc-linux-gnu, compiled by GCC
2.95.4".

From PSQL, if I perform:
\encoding LATIN9
insert into i18ntest(id, data) values('Euro symbol', '¤');

then I would expect this to insert a euro symbol into the data column (code point 164 is Euro in ISO-8859-15). However,
when I change back to UTF-8, the UTF-8 data is "¤", which is the currency symbol.

Now, if I try to insert the Euro symbol using a UNICODE client encoding, then I get an error when I switch back to
LATIN9 and SELECT it out again:

psql:/home/steve/public_html/i18ntest.sql:35: WARNING:  UtfToLocal: could not convert UTF-8 (0xe282ac). Ignored

However, if I switch to LATIN1 and try to SELECT it, I get a conversion error, which is correct since the euro symbol
does not have a code point in LATIN1:
psql:/home/steve/public_html/i18ntest.sql:33: ERROR:  Could not convert UTF-8 to ISO8859-1
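
The expectations in this report can be checked outside PostgreSQL. The following is a minimal sketch that uses Python's standard codecs as a stand-in for the server's conversion tables (an assumption for illustration only; it does not exercise PostgreSQL itself). It shows that byte 164 is the currency sign in ISO-8859-1 but the euro sign in ISO-8859-15, that the euro sign is 0xe282ac in UTF-8 (the bytes named in the warning above), and that ISO-8859-1 has no code point for it. The helper name can_encode is hypothetical.

```python
# Byte 164 (0xA4) means different things in the two client encodings.
raw = bytes([164])
as_latin1 = raw.decode("iso8859_1")    # U+00A4 CURRENCY SIGN
as_latin9 = raw.decode("iso8859_15")   # U+20AC EURO SIGN

# UTF-8 form of the euro sign, the byte sequence quoted in the UtfToLocal warning.
euro_utf8 = "\u20ac".encode("utf-8")


def can_encode(text, encoding):
    """Return True if every character of text has a code point in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(hex(ord(as_latin1)), hex(ord(as_latin9)))   # 0xa4 0x20ac
print(euro_utf8.hex())                            # e282ac
print(can_encode("\u20ac", "iso8859_1"),          # False
      can_encode("\u20ac", "iso8859_15"))         # True
```

So a correct LATIN9 conversion should store byte 164 as 0xe282ac and return it as byte 164 again, while a LATIN1 client is right to get a conversion error.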


Sample Code
-- this code is available at http://araqnid.ddts.net/~steve/i18ntest.sql in case it gets munged by the
-- form/browser/server
-- This is done in a database with "UNICODE" encoding
-- e.g.:
--  create database i18ntest encoding = 'UNICODE';
--  \connect i18ntest

select version();

drop table i18ntest;
create table i18ntest(id text primary key, data text not null);
begin;
\encoding LATIN1
insert into i18ntest(id, data) values('Pound sign', '£');
\encoding LATIN9
insert into i18ntest(id, data) values('Euro symbol', '¤');
commit;

\encoding UNICODE
select id, data from i18ntest;
\encoding LATIN1
select id, data from i18ntest;
\encoding LATIN9
select id, data from i18ntest;

begin;
\encoding UNICODE
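-- the two literals below are the raw UTF-8 bytes of the pound sign (0xc2a3) and
-- the euro sign (0xe282ac), displayed here as if they were Latin-1
update i18ntest set data = '£' where id = 'Pound sign';
update i18ntest set data = 'â\202¬' where id = 'Euro symbol';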
commit;

\encoding UNICODE
select id, data from i18ntest;
\encoding LATIN1
select id, data from i18ntest;
\encoding LATIN9
select id, data from i18ntest;

-- drop table i18ntest;
-- drop database i18ntest;


No file was uploaded with this report

Re: Bug #837: Unable to use LATIN9 (=ISO-8859-15) encoding

From
Peter Eisentraut
Date:
> From PSQL, if I perform:
> \encoding LATIN9
> insert into i18ntest(id, data) values('Euro symbol', '¤');
>
> then I would expect this to insert a euro symbol into the data column

'¤' means '¤', not anything else.  Maybe you want to try '\244'
(octal).

--
Peter Eisentraut   peter_e@gmx.net

Re: Bug #837: Unable to use LATIN9 (=ISO-8859-15) encoding

From
Peter Eisentraut
Date:
> From PSQL, if I perform:
> \encoding LATIN9
> insert into i18ntest(id, data) values('Euro symbol', '¤');
>
> then I would expect this to insert a euro symbol into the data column
> (code point 164 is Euro in ISO-8859-15). However, when I change back to
> UTF-8, the UTF-8 data is "¤", which is the currency symbol.

I have confirmed this.  It appears to have been a copy and paste mistake.
I will put the following patch into the next subrelease (7.3.1):

*** ../pg73branch/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c    Tue Oct 29 18:19:19 2002
--- src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c    Mon Dec  9 20:14:43 2002
***************
*** 98,104 ****
      {PG_LATIN8, LUmapISO8859_14, ULmapISO8859_14,
          sizeof(LUmapISO8859_14) / sizeof(pg_local_to_utf),
      sizeof(ULmapISO8859_14) / sizeof(pg_utf_to_local)}, /* ISO-8859-14 Latin 8 */
!     {PG_LATIN9, LUmapISO8859_2, ULmapISO8859_2,
          sizeof(LUmapISO8859_15) / sizeof(pg_local_to_utf),
      sizeof(ULmapISO8859_15) / sizeof(pg_utf_to_local)}, /* ISO-8859-15 Latin 9 */
      {PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,
--- 98,104 ----
      {PG_LATIN8, LUmapISO8859_14, ULmapISO8859_14,
          sizeof(LUmapISO8859_14) / sizeof(pg_local_to_utf),
      sizeof(ULmapISO8859_14) / sizeof(pg_utf_to_local)}, /* ISO-8859-14 Latin 8 */
!     {PG_LATIN9, LUmapISO8859_15, ULmapISO8859_15,
          sizeof(LUmapISO8859_15) / sizeof(pg_local_to_utf),
      sizeof(ULmapISO8859_15) / sizeof(pg_utf_to_local)}, /* ISO-8859-15 Latin 9 */
      {PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,

--
Peter Eisentraut   peter_e@gmx.net
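
The patch explains both symptoms in the report: the LATIN9 entry pointed at the ISO-8859-2 maps, and byte 164 is the currency sign in ISO-8859-2. The following is a rough sketch of that effect, using Python's standard codecs in place of the LUmap/ULmap arrays. The BUGGY and FIXED dictionaries and both function names are hypothetical stand-ins, not PostgreSQL's data structures, and the sketch ignores the mismatch between the ISO-8859-2 maps and the ISO-8859-15 sizes in the unpatched entry.

```python
# Hypothetical stand-in for the conversion table: client encoding name -> codec
# used in both directions. The buggy table wires LATIN9 to the Latin-2 maps.
BUGGY = {"LATIN1": "iso8859_1", "LATIN9": "iso8859_2"}
FIXED = {"LATIN1": "iso8859_1", "LATIN9": "iso8859_15"}


def local_to_utf8(table, client_encoding, data):
    """Convert client bytes to UTF-8 using the map registered for the encoding."""
    return data.decode(table[client_encoding]).encode("utf-8")


def utf8_to_local(table, client_encoding, data):
    """Convert UTF-8 bytes to the client encoding; None if there is no mapping."""
    try:
        return data.decode("utf-8").encode(table[client_encoding])
    except UnicodeEncodeError:
        return None


# INSERT with client encoding LATIN9 and a literal byte 164:
print(local_to_utf8(BUGGY, "LATIN9", b"\xa4").hex())   # c2a4   (currency sign)
print(local_to_utf8(FIXED, "LATIN9", b"\xa4").hex())   # e282ac (euro sign)

# SELECT of a stored euro sign with client encoding LATIN9:
print(utf8_to_local(BUGGY, "LATIN9", b"\xe2\x82\xac"))  # None
print(utf8_to_local(FIXED, "LATIN9", b"\xe2\x82\xac"))  # b'\xa4'
```

With the wrong map, the insert stores the currency sign and the select finds no mapping for 0xe282ac, which matches the two behaviours described in the report.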

Re: Bug #837: Unable to use LATIN9 (=ISO-8859-15) encoding

From
Steve Haslam
Date:
On Fri, Dec 06, 2002 at 12:20:54AM +0100, Peter Eisentraut wrote:
> > From PSQL, if I perform:
> > \encoding LATIN9
> > insert into i18ntest(id, data) values('Euro symbol', '¤');
> >
> > then I would expect this to insert a euro symbol into the data column
>
> '¤' means '¤', not anything else.  Maybe you want to try '\244'
> (octal).

That was a literal character 164 that the browser seems to have munged when
uploading the form (the script is also available in raw form using the URI
at the top, http://araqnid.ddts.net/~steve/i18ntest.sql)

SRH
--
Steve Haslam      Reading, UK                           araqnid@innocent.com
Debian GNU/Linux Maintainer                               araqnid@debian.org
                     Currently for sale: http://www.arise.demon.co.uk/my_cv/
almost called it today, turned to face the void, numb with the suffering
and the question- "Why am I?"                                  [queensrÿche]