Thread: encoding again

encoding again

From
Kathy Zhu
Date:
Hi, sorry that this email is a little bit long, but it is actually not :-))


**** I have a database 'unidb' created with -E UNICODE.

$ psql -l
        List of databases
   Name    |  Owner  | Encoding
-----------+---------+-----------
 unidb     | kathy   | UNICODE


**** I input Chinese data in unicode form. E.g.
logging-threshold=\u65e5\u5fd7\u9608\u503c
polling_setting_error=\u8bbe\u7f6e\u8f6e\u8be2\u95f4\u9694\u65f6\u51fa\u9519

unidb=# show client_encoding;
NOTICE:  Current client encoding is 'UNICODE'
SHOW VARIABLE

unidb=# select * from testbytes;
          name           |          value
-------------------------+-------------------------
 logging_setting_error   | 设置日志阈值时出错
 polling_setting_error   | 设置轮询间隔时出错


**** When I retrieved the data, I did:

unidb=# set client_encoding to 'EUC_CN';
unidb=# show client_encoding;
NOTICE:  Current client encoding is 'EUC_CN'
SHOW VARIABLE

unidb=# select * from testbytes order by value;
          name           |          value
-------------------------+-------------------------
 logging_setting_error   | ־ֵʱ
 polling_setting_error   | ѯʱ


Three problems here:
1) the sorting is based on unicode value, not EUC_CN encoding value.
2) I wrote the ResultSet to a file by using OutputStreamWriter(file, "EUC_CN"). The
file is not readable from the browser with any charset setting.
3) Changing client_encoding from UNICODE to EUC_CN actually alters/loses the data, if
you compare the above "select *" statements.

I wonder why this happens. According to the docs, automatic encoding conversion
between UNICODE and EUC_CN is supported.

Any help is highly appreciated.
thanks,
kathy


Re: encoding again

From
Peter Eisentraut
Date:
Kathy Zhu writes:

> 1) the sorting is based on unicode value, not EUC_CN encoding value.

The sorting is always based on the server encoding.  There is no way to
change that.
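
To illustrate why the order can differ between encodings, here is a minimal Java sketch. It assumes the strings are compared byte by byte (as they would be under the C locale; other locales may behave differently), and it assumes the JDK ships the GB2312 charset, which is Java's name for EUC_CN. The class and method names are made up for this example; the two sample words are substrings of the data in the original message.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingSortSketch {
    // Compare two byte arrays as unsigned bytes, the way memcmp would.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Return a sorted copy, ordered by each string's bytes in the given charset.
    static String[] sortByBytes(String[] values, Charset cs) {
        String[] copy = values.clone();
        Arrays.sort(copy, (x, y) -> compareBytes(x.getBytes(cs), y.getBytes(cs)));
        return copy;
    }

    public static void main(String[] args) {
        // Sample words taken from the thread's data (U+65E5 U+5FD7 and U+8F6E U+8BE2).
        String[] values = { "\u8f6e\u8be2", "\u65e5\u5fd7" };
        // Assumption: "GB2312" is available in this JDK (it is Java's EUC_CN).
        Charset eucCn = Charset.forName("GB2312");
        System.out.println("UTF-8 byte order:  " + Arrays.toString(sortByBytes(values, StandardCharsets.UTF_8)));
        System.out.println("EUC_CN byte order: " + Arrays.toString(sortByBytes(values, eucCn)));
    }
}
```

Sorted by UTF-8 bytes, 日志 comes before 轮询; sorted by EUC_CN bytes, the order is reversed, because the first level of GB2312 is laid out in pinyin order rather than in Unicode code point order.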

> 2) I wrote the ResultSet to a file by using OutputStreamWriter(file, "EUC_CN"). The
> file is not readable from the browser with any charset setting.

That is a problem in whatever client interface that is (Java?) or your
browser.
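
One way to narrow this down is to check the file independently of the browser. The following is a rough sketch, not the original poster's code: it writes a string through OutputStreamWriter with the GB2312 charset (Java's EUC_CN, assumed to be available in the JDK) and reads the bytes back. The class name, method names, and temp-file prefix are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class Gb2312FileSketch {
    // Assumption: the JDK provides GB2312, which Java treats as EUC_CN.
    static final Charset GB2312 = Charset.forName("GB2312");

    // Write the text to the file, encoding characters as EUC_CN bytes.
    static void write(Path file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(Files.newOutputStream(file), GB2312)) {
            w.write(text);
        }
    }

    // Read the whole file back, decoding its bytes as EUC_CN.
    static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), GB2312);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("euc_cn_sketch", ".txt");
        String text = "\u65e5\u5fd7\u9608\u503c"; // sample value from the thread
        write(file, text);
        System.out.println(Files.size(file) + " bytes, round trip ok: " + text.equals(read(file)));
        Files.delete(file);
    }
}
```

If the round trip succeeds, the file itself is valid EUC_CN and the remaining question is how the browser is told which charset to use. This only helps if the Java strings are already correct before they are written; if they arrive from the driver garbled, the file will be garbled too.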

> 3) Changing client_encoding from UNICODE to EUC_CN actually alters/loses the data, if
> you compare the above "select *" statements.

You're going to have to be a bit more specific, because many of us can't
identify the characters or see what is wrong with them.

Also, try a more recent PostgreSQL version, such as 7.3.4.

--
Peter Eisentraut   peter_e@gmx.net


Re: encoding again

From
Kathy Zhu
Date:
Thanks for your reply!

I am using 7.3.1.

3) To be more specific about the data change/loss after conversion:

Input data in Unicode, with client_encoding set to UNICODE:
logging-threshold=\u65e5\u5fd7\u9608\u503c
polling_setting_error=\u8bbe\u7f6e\u8f6e\u8be2\u95f4\u9694\u65f6\u51fa\u9519


Retrieved data, with client_encoding changed to EUC_CN:
logging-threshold=\uFFFD\uFFFD\u05BE\uFFFD\uFFFD\u05B5
polling_setting_error=\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\uFFFD\u046F\uFFFD\uFFFD\uFFFD\u02B1\uFFFD\uFFFD

thanks,
kathy
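
One possible reading of the retrieved escapes, offered as a guess and not confirmed anywhere in this thread: the server may have converted the data to EUC_CN correctly, and the bytes were then decoded as UTF-8 somewhere on the client side. The Java sketch below tries that hypothesis on the logging-threshold value. It assumes the JDK provides the GB2312 charset (Java's EUC_CN); the class and method names are invented for the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MisdecodeSketch {
    // Encode text as EUC_CN bytes, then (wrongly) decode those bytes as UTF-8.
    // Invalid UTF-8 sequences are replaced with U+FFFD by the JDK decoder.
    static String encodeThenMisdecode(String text) {
        byte[] eucCn = text.getBytes(Charset.forName("GB2312"));
        return new String(eucCn, StandardCharsets.UTF_8);
    }

    // Render every UTF-16 code unit as a backslash-u escape with four hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u65e5\u5fd7\u9608\u503c"; // logging-threshold input value
        System.out.println(escape(encodeThenMisdecode(original)));
    }
}
```

On a current JDK this should print the same six escapes reported for logging-threshold (two U+FFFD, U+05BE, two U+FFFD, U+05B5), which would suggest the data is not lost in the database but mis-decoded on the way to the Java strings. The number of U+FFFD replacements can vary between decoder implementations, so the longer polling_setting_error value may not match exactly.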

> Date: Wed, 10 Sep 2003 00:57:30 +0200 (CEST)
> From: Peter Eisentraut <peter_e@gmx.net>
> To: Kathy Zhu <Kathy.Zhu@sun.com>
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] encoding again