Re: character conversion problem about UTF-8-->SHIFT_JIS_2004 - Mailing list pgsql-general

From Tatsuo Ishii
Subject Re: character conversion problem about UTF-8-->SHIFT_JIS_2004
Msg-id 20080213.234805.84360360.t-ishii@sraoss.co.jp
In response to character conversion problem about UTF-8-->SHIFT_JIS_2004  ("bh yuan" <bhyuan@gmail.com>)
List pgsql-general
> hi
>
> I have used PostgreSQL 7.4.3 with PHP for more than 3 years.
> Now I want to migrate my database to PostgreSQL 8.3,
> but I get the following errors:
> ----------------------------------------------------------
> ERROR: character 0xe9ab99 of encoding "UTF8" has no equivalent in "SJIS"
> ERROR: character 0xe9ab99 of encoding "UTF8" has no equivalent in
> "SHIFT_JIS_2004"
> ----------------------------------------------------------
> The database is encoded in UTF-8.
> To export data as a .csv file,
> I use  set client_encoding='SJIS'  on the client.
> With PostgreSQL 7.4.3 no problem occurred,
> but after I changed to PostgreSQL 8.3 the error appeared.
>
> Can I ignore the error message,
> or is there any other way to solve this problem?

First of all, you should be aware that SHIFT_JIS_2004 is a completely
different beast from SJIS. If you want to keep using the SJIS data you
had in 7.4, you must use SJIS, not SHIFT_JIS_2004, on 8.3. Or do you
have any particular reason to use SHIFT_JIS_2004?
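
As a rough illustration of how much the Shift_JIS variants can differ,
the sketch below asks which of several codecs can represent the
character from the error message. It assumes Python's bundled codecs
(shift_jis, cp932, shift_jis_2004), which are not PostgreSQL's
conversion tables, so the results only hint at what the server will do.
The helper name probe is made up for this example.

```python
def probe(ch, codecs=("shift_jis", "cp932", "shift_jis_2004")):
    """Return {codec name: hex string of encoded bytes, or None if unencodable}.

    Assumption: these are Python's own codecs, not PostgreSQL's maps.
    """
    result = {}
    for name in codecs:
        try:
            result[name] = ch.encode(name).hex()
        except UnicodeEncodeError:
            # This codec has no equivalent for the character.
            result[name] = None
    return result


# The UTF-8 byte sequence reported in the error message.
ch = bytes.fromhex("e9ab99").decode("utf-8")

for name, encoded in probe(ch).items():
    print(name, encoded if encoded is not None else "no equivalent")
```

Whether a character converts depends entirely on which variant is
selected, which is why the encoding name given to client_encoding
matters.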

BTW,

> ERROR: character 0xe9ab99 of encoding "UTF8" has no equivalent in "SJIS"

I don't see this error message with PostgreSQL 8.3.0 running on a
Linux box. I can store UTF-8 0xe9ab99 (== U+9AD9) and retrieve it from
the SJIS client side (0xe9ab99 corresponds to 0xfbfc). Actually we can
confirm this by looking at line 6914 in
src/backend/utils/mb/Unicode/utf8_to_sjis.map:

  {0xe9ab99, 0xfbfc},

Note that the left side is the UTF-8 value and the right side is the
SJIS value. I recommend that you double-check your PostgreSQL 8.3
installation.
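
The map entry quoted above can also be sanity-checked outside the
server by decoding both sides to Unicode and comparing them. This is
only a sketch: it assumes Python's cp932 codec is close enough to
PostgreSQL's SJIS table for this one character, and check_map_entry is
a name invented for the example.

```python
def to_bytes(value):
    """Turn an integer such as 0xE9AB99 into its big-endian byte string."""
    length = (value.bit_length() + 7) // 8
    return value.to_bytes(length, "big")


def check_map_entry(utf8_value, sjis_value, sjis_codec="cp932"):
    """Decode both sides of a {utf8, sjis} pair and compare the characters.

    Assumption: sjis_codec is Python's codec, not PostgreSQL's own map.
    Returns (sides_match, code_point_of_the_utf8_side).
    """
    from_utf8 = to_bytes(utf8_value).decode("utf-8")
    from_sjis = to_bytes(sjis_value).decode(sjis_codec)
    return from_utf8 == from_sjis, "U+%04X" % ord(from_utf8)


print(check_map_entry(0xE9AB99, 0xFBFC))  # (True, 'U+9AD9')
```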

For your convenience, I have attached a dump containing a table
(called "t1") which has the UTF-8 character in question.

$ createdb -E UTF_8 test
$ gunzip -c /tmp/t1.dump.gz|psql test
$ psql -c "set client_encoding to SJIS;select * from t1" test
--
Tatsuo Ishii
SRA OSS, Inc. Japan
