Thread: UTF-8 encoding problem

UTF-8 encoding problem

From
bhyuan
Date:
hi

I use UTF-8 as the server character encoding
and SJIS as the client character encoding.
For some reason, some non-SJIS characters were inserted into the database.
When I run
set client_encoding='SJIS';
select * from xxx;
I get this error message:
Native Error: ERROR: character 0xc2a0 of encoding "UTF8" has no equivalent in "SJIS"

I just want to ignore the non-SJIS characters and continue without any
errors.
I use PostgreSQL 8.1, and according to the documentation it is supposed to report an error in this case:
-----------------------------
If the conversion of a particular character is not possible -- suppose you chose EUC_JP for the server and LATIN1 for
the client, then some Japanese characters do not have a representation in LATIN1 -- then an error is reported.
-----------------------------

Can I suppress this error by changing a setting in the config file?

Thanks for any idea.

bhyuan


Re: UTF-8 encoding problem

From
Peter Eisentraut
Date:
Am Donnerstag, 16. August 2007 08:40 schrieb bhyuan:
> Can I suppress this error by changing a setting in the config file?

No, there are no provisions for that.  Some errors of this type used to be
ignored, but that led to SQL injection-like security issues, so you don't
want that.
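
One possible client-side workaround, which nobody in this thread proposed and which is offered only as a sketch: leave client_encoding at UTF8 so that the server performs no conversion, and do the lossy conversion to Shift_JIS in the application instead. The Python below assumes the application already holds the row text as Unicode strings; the helper names to_sjis_lossy and unmappable_chars are hypothetical.

```python
def to_sjis_lossy(text, placeholder=False):
    """Encode text as Shift_JIS.

    Characters with no Shift_JIS equivalent are dropped, or replaced
    with '?' when placeholder is True.
    """
    errors = "replace" if placeholder else "ignore"
    return text.encode("shift_jis", errors=errors)


def unmappable_chars(text):
    """Return the characters in text that cannot be encoded as Shift_JIS."""
    bad = []
    for ch in text:
        try:
            ch.encode("shift_jis")
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# 0xc2a0 from the error message is the UTF-8 encoding of U+00A0 (no-break space).
nbsp = b"\xc2\xa0".decode("utf-8")
row = "abc" + nbsp + "def"  # made-up sample value, not data from the thread

print(unmappable_chars(row))
print(to_sjis_lossy(row))
print(to_sjis_lossy(row, placeholder=True))
```

Running unmappable_chars over the stored values first shows which rows contain the offending characters, so the data can be cleaned rather than silently truncated on every read.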

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: UTF-8 encoding problem

From
bhyuan
Date:
Thanks for your reply.

SQL injection-like security issues may well occur,
but I find that different versions of PostgreSQL give different results.

For example, with this SQL:
set client_encoding='SJIS';
select '\xc3\xaa',* from xxx;

On V7.4 @RH3 I got
\xc3\xaa

On V8.1.2 @RH4 I got
    (blank)

On V8.1.4 @FreeBSD6 I got
ERROR:  character 0xc3aa of encoding "UTF8" has no equivalent in "SJIS"

The documentation also differs between versions:
Version 8.1
http://www.postgresql.org/docs/8.1/interactive/multibyte.html#AEN22591
------------------------------
If the conversion of a particular character is not possible -- suppose you chose EUC_JP for the server and LATIN1 for
the client, then some Japanese characters do not have a representation in LATIN1 -- then an error is reported.
------------------------------

Version 7.4
http://www.postgresql.org/docs/7.4/interactive/multibyte.html#AEN18371
------------------------------
If the conversion of a particular character is not possible -- suppose you chose EUC_JP for the server and LATIN1 for
the client, then some Japanese characters cannot be converted to LATIN1 -- it is transformed to its hexadecimal byte
values in parentheses, e.g., (826C).
------------------------------


I am confused. I just want to get the correct SQL result even if some characters
were not encoded correctly,
just as V8.1.2 @RH4 does, where the invalid character was ignored.
....


On Thu, 16 Aug 2007 11:03:39 +0200
Peter Eisentraut <peter_e@gmx.net> wrote:

> Am Donnerstag, 16. August 2007 08:40 schrieb bhyuan:
> > Can I suppress this error by changing a setting in the config file?
>
> No, there are no provisions for that.  Some errors of this type used to be
> ignored, but that led to SQL injection-like security issues, so you don't
> want that.
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/
--
bhyuan <bhyuan@gmail.com>


Re: UTF-8 encoding problem

From
Peter Eisentraut
Date:
Am Donnerstag, 16. August 2007 15:21 schrieb bhyuan:
> SQL injection-like security issues may well occur,
> but I find that different versions of PostgreSQL give different results.

That just shows that some versions are more broken than others.  But there was
a lot of thought put into the current behavior, so it won't be changed back
without sufficient cause.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/