Re: character conversion problem about UTF-8-->SHIFT_JIS_2004 - Mailing list pgsql-general

From Tatsuo Ishii
Subject Re: character conversion problem about UTF-8-->SHIFT_JIS_2004
Date
Msg-id 20080218.094314.38331655.t-ishii@sraoss.co.jp
In response to Re: character conversion problem about UTF-8-->SHIFT_JIS_2004  ("bh yuan" <bhyuan@gmail.com>)
List pgsql-general
> Thanks for your reply.
> I think there was no error in 7.4.3, only a warning.

That means the character in question was ignored in 7.4, i.e. the
character was skipped. I'm not sure that's actually what you want.
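
To see which characters would be silently dropped, the hex values in the
error messages can be unpacked into Unicode code points. The following is
only a minimal sketch: the function name utf8_packed_to_codepoint is made up
for illustration (it is not part of PostgreSQL), and it assumes the input is
a well-formed UTF-8 sequence of 1 to 4 bytes written as a single hex number.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not PostgreSQL code.
 * Unpack a UTF-8 sequence written as one hex number (the form used in the
 * "character 0x... of encoding" messages) into its Unicode code point.
 * Assumes a well-formed 1- to 4-byte sequence; no validation is done.
 */
uint32_t utf8_packed_to_codepoint(uint32_t packed)
{
    uint32_t b0, b1, b2, b3;

    if (packed <= 0x7F)             /* 1 byte: plain ASCII */
        return packed;

    if (packed <= 0xFFFF)           /* 2 bytes: 110xxxxx 10xxxxxx */
    {
        b0 = (packed >> 8) & 0xFF;
        b1 = packed & 0xFF;
        return ((b0 & 0x1F) << 6) | (b1 & 0x3F);
    }

    if (packed <= 0xFFFFFF)         /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    {
        b0 = (packed >> 16) & 0xFF;
        b1 = (packed >> 8) & 0xFF;
        b2 = packed & 0xFF;
        return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
    }

    /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    b0 = (packed >> 24) & 0xFF;
    b1 = (packed >> 16) & 0xFF;
    b2 = (packed >> 8) & 0xFF;
    b3 = packed & 0xFF;
    return ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12) |
           ((b2 & 0x3F) << 6) | (b3 & 0x3F);
}
```

Applied to the values reported in this thread, 0xc2a0 unpacks to U+00A0
(no-break space), 0xe29890 to U+2610, 0xe998b3 to U+9633, and 0xe9ab99 to
U+9AD9. Skipping them on output means that text disappears from the export.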

> I used the old PostgreSQL 7.4.3 for 3 years with a
> UTF-8 encoded web-based frontend.
> Without a strict encoding check, users could freely input not only SJIS
> characters but any UTF-8 character.
> With 7.4.3, I could export the data as SJIS without error,
> like this:
> --
> set client_encoding='SJIS';
> select xxx from xxx ...
> --
> But after I upgraded the database from 7.4.3 to 8.3, I get these errors:
> --
> ERROR:  character 0xc2a0 of encoding "UTF8" has no equivalent in "SJIS"
> ERROR:  character 0xe29890 of encoding "UTF8" has no equivalent in "SJIS"
> ERROR:  character 0xe998b3 of encoding "UTF8" has no equivalent in "SJIS"
> --
> I tried to modify the conversion map utf8_to_sjis.map,
> but users have also entered some Chinese characters into the database,
> so I had to give up.
> So I patched
> /postgresql-8.3.0/src/backend/utils/mb/conv.c to avoid the problem.

You are risking SQL injection attacks with this modification.

> function UtfToLocal  . line 468
>             /*old code
>             if (p == NULL)
>                 report_untranslatable_char(PG_UTF8, encoding,
>                                            (const char *) (utf - l), len);
>             code = p->code;
>             */
>             if (p == NULL) {
>                 //WARNING not ERROR
>                 ereport(WARNING,
>                         (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
>                           errmsg("Ignoring : character 0x%s of encoding \"%s\" has no equivalent in \"%s\"",
>                                  utf,
>                                  pg_enc2name_tbl[PG_UTF8].name,
>                                  pg_enc2name_tbl[encoding].name)));
>                 continue;
> --
> I do not know whether this is right, even though it compiles and
> installs correctly.
> Can anybody help me check the file, or suggest any ideas?
>
> Three source files are attached:
> conv.8.3.c -- 8.3 original conv.c file
> conv.7.4.3.c -- 7.4.3 original conv.c file
> conv.c    -- patched file
>
> Thanks
>
> 2008/2/15, Tatsuo Ishii <ishii@postgresql.org>:
> > I don't see anything strange.
> >
> > There has been no mapping from UTF-8 0xc2a0 to SJIS in PostgreSQL
> > since day one. That means you should get the error on 7.4.3 as
> > well as on 8.3. Are you sure that you don't get the error on 7.4.3?
> > --
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> >
> > > SHIFT_JIS_2004 is different from SJIS.
> > > But when I use SJIS, I get the same problem,
> > > so I tried SHIFT_JIS_2004.
> > >
> > > => set client_encoding='SJIS';
> > > SET
> > > => select * from tablexx;
> > > ERROR:  character 0xc2a0 of encoding "UTF8" has no equivalent in "SJIS"
> > >
> > > too confused...
> > >
> > > Thanks
> > >
> > >
> > > 2008/2/13, Tatsuo Ishii <ishii@postgresql.org>:
> > > > > hi
> > > > >
> > > > > I have used PostgreSQL 7.4.3 with PHP for more than 3 years.
> > > > > Now I want to move my database to PostgreSQL 8.3.
> > > > > But I get this problem:
> > > > > ----------------------------------------------------------
> > > > > ERROR: character 0xe9ab99 of encoding "UTF8" has no equivalent in "SJIS"
> > > > > ERROR: character 0xe9ab99 of encoding "UTF8" has no equivalent in
> > > > > "SHIFT_JIS_2004"
> > > > > ----------------------------------------------------------
> > > > > The database is encoded in UTF-8.
> > > > > To export data as a .csv file,
> > > > > I use set client_encoding='SJIS' on the client.
> > > > > With PostgreSQL 7.4.3 there was no problem,
> > > > > but after I changed to PostgreSQL 8.3, the error occurred.
> > > > >
> > > > > Can I ignore the error message?
> > > > > Or is there any other method to solve this problem?
> > > >
> > > > First of all, you should be aware that SHIFT_JIS_2004 is a completely
> > > > different beast from SJIS. If you want to continue to use SJIS data as
> > > > you did on 7.4, you must use SJIS, not SHIFT_JIS_2004, on 8.3. Or do
> > > > you have any particular reason to use SHIFT_JIS_2004?
> > > >
> > > > BTW,
> > > >
> > > > > ERROR: character 0xe9ab99 of encoding "UTF8" has no equivalent in "SJIS"
> > > >
> > > > I don't see this error message with PostgreSQL 8.3.0 running on a
> > > > Linux box. I can store UTF-8 0xe9ab99 (== U+9AD9) and retrieve it from
> > > > the SJIS client side (0xe9ab99 corresponds to 0xfbfc). Actually we can
> > > > confirm this by looking at line 6914 in
> > > > src/backend/utils/mb/Unicode/utf8_to_sjis.map:
> > > >
> > > >  {0xe9ab99, 0xfbfc},
> > > >
> > > > Note that the left side is the value for UTF-8 and the right side the
> > > > value for SJIS. I recommend that you double-check your PostgreSQL 8.3
> > > > installation.
> > > >
> > > > For your convenience, I have attached a dump containing a table
> > > > (called "t1") which has the UTF-8 character in question.
> > > >
> > > > $ createdb -E UTF_8 test
> > > > $ gunzip -c /tmp/t1.dump.gz|psql test
> > > > $ psql -c "set client_encoding to SJIS;select * from t1" test
> > > > --
> > > > Tatsuo Ishii
> > > > SRA OSS, Inc. Japan
> > > >
> > > >
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 6: explain analyze is your friend
> >

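The map check described in the quoted message (a {UTF-8, SJIS} pair such as
{0xe9ab99, 0xfbfc}) can be illustrated with a small lookup. This is a sketch
under stated assumptions, not the actual conv.c code: the struct, the
two-entry table, and the function lookup_sjis are hypothetical. Only the
0xe9ab99 entry comes from this thread; the U+3042 entry was added so the
table has more than one row. Returning 0 for "no mapping" is a choice made
for this sketch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one row of a UTF-8 to SJIS map. */
typedef struct
{
    uint32_t utf;   /* UTF-8 bytes packed into one integer */
    uint32_t code;  /* SJIS bytes packed the same way */
} demo_utf_to_local;

/* Must stay sorted by the utf field, because bsearch() requires it. */
static const demo_utf_to_local demo_map[] = {
    {0xe38182, 0x82a0},     /* U+3042, added for illustration */
    {0xe9ab99, 0xfbfc},     /* U+9AD9, the entry quoted in this thread */
};

/* bsearch() comparator: key is a packed UTF-8 value, elem is a map row. */
static int compare_utf(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *) key;
    uint32_t e = ((const demo_utf_to_local *) elem)->utf;

    return (k > e) - (k < e);
}

/*
 * Return the packed SJIS code for a packed UTF-8 value,
 * or 0 if the demo table has no mapping for it.
 */
uint32_t lookup_sjis(uint32_t packed_utf8)
{
    const demo_utf_to_local *p;

    p = bsearch(&packed_utf8, demo_map,
                sizeof(demo_map) / sizeof(demo_map[0]),
                sizeof(demo_map[0]), compare_utf);

    return (p == NULL) ? 0 : p->code;
}
```

With this table, lookup_sjis(0xe9ab99) returns 0xfbfc, while
lookup_sjis(0xc2a0) returns 0 because there is no entry for it. That
"no entry" case is the one where 8.3 raises the error.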