Thread: Postgres Encoding conversion problem

Postgres Encoding conversion problem

From
Clemens Schwaighofer
Date:
Hi,

I sometimes have a problem with conversion of encodings, e.g. from UTF-8
to Shift_JIS:

ERROR:  character 0xf0a0aeb7 of encoding "UTF8" has no equivalent in "SJIS"

I have no idea what character this is, I cannot view it in my browser, etc.

If I run the conversion through PHP with mb_convert_encoding it works;
perhaps it is ignoring the character.

Is there a way to do a similar thing, like ignoring this character in
postgres too?

--
[ Clemens Schwaighofer                      -----=====:::::~ ]
[ IT Engineer/Manager, TEQUILA\ Japan IT Group               ]
[                6-17-2 Ginza Chuo-ku, Tokyo 104-8167, JAPAN ]
[ Tel: +81-(0)3-3545-7703            Fax: +81-(0)3-3545-7343 ]
[ http://www.tequila.co.jp                                   ]

Re: Postgres Encoding conversion problem

From
"Albe Laurenz"
Date:
Clemens Schwaighofer wrote:
> I sometimes have a problem with conversion of encodings eg from UTF-8
> to Shift_JIS:
>
> ERROR:  character 0xf0a0aeb7 of encoding "UTF8" has no
> equivalent in "SJIS"
>
> I have no idea what character this is, I cannot view it in my
> browser, etc.

It translates to Unicode 10BB7, which is not defined.
I guess that is not intended; can you guess what the character(s) should be?

> If I run the conversion through PHP with mb_convert_encoding it works,
> perhaps it is ignoring the character.
>
> Is there a way to do a similar thing, like ignoring this character in
> postgres too?

As far as I know, no.
You'll have to fix the data before you import them.

Yours,
Laurenz Albe

Re: Postgres Encoding conversion problem

From
Clemens Schwaighofer
Date:
On 04/22/2008 05:37 PM, Albe Laurenz wrote:
> Clemens Schwaighofer wrote:
>> I sometimes have a problem with conversion of encodings eg from UTF-8
>> to Shift_JIS:
>>
>> ERROR:  character 0xf0a0aeb7 of encoding "UTF8" has no
>> equivalent in "SJIS"
>>
>> I have no idea what character this is, I cannot view it in my
>> browser, etc.
>
> It translates to Unicode 10BB7, which is not defined.
> I guess that is not intended; can you guess what the character(s) should be?

To be honest, no idea. It's some Chinese character; I have no idea how the
user input this, because this is a Japanese page.

I actually found the character, but only my Mac OS X can show it. It
looks similar to a Japanese character used for a name, but how the
Chinese one got selected is a mystery to me ...

>> If I run the conversion through PHP with mb_convert_encoding it works,
>> perhaps it is ignoring the character.
>>
>> Is there a way to do a similar thing, like ignoring this character in
>> postgres too?
>
> As far as I know, no.
> You'll have to fix the data before you import them.

Well, the web page and data are in UTF-8, so I never see this issue there.
To catch it I would have to write a method that detects characters that
are illegal in Shift_JIS, and that's difficult.
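
Detecting such characters need not be hard if a scripting language with a
Shift_JIS codec is at hand. The following is a minimal sketch in Python
(an assumption; the thread itself contains no code). find_unencodable is a
hypothetical helper name, and Python's "shift_jis" codec may not match
PostgreSQL's SJIS conversion table character for character, so the result
is best treated as an approximation.

```python
def find_unencodable(text, encoding="shift_jis"):
    """Return (index, character, code point) for every character in text
    that the given codec cannot encode."""
    bad = []
    for index, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # No equivalent in the target encoding: record it for cleanup.
            bad.append((index, ch, f"U+{ord(ch):04X}"))
    return bad


# Illustrative input: ASCII text with the character from the error message.
sample = "abc\U00020BB7def"
for index, ch, code_point in find_unencodable(sample):
    print(index, code_point)
```

Checking one character at a time is slow for large exports but keeps the
position of each offending character, which helps when fixing the data.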

The reporting is only done in CSV ... so I am not sure it is worth
spending too much time on this.

Thanks for the tip.

Re: Postgres Encoding conversion problem

From
"Albe Laurenz"
Date:
Clemens Schwaighofer wrote:
>>> I sometimes have a problem with conversion of encodings eg from UTF-8
>>> to Shift_JIS:
>>>
>>> ERROR:  character 0xf0a0aeb7 of encoding "UTF8" has no
>>> equivalent in "SJIS"
>>>
>>> I have no idea what character this is, I cannot view it in my
>>> browser, etc.
>>
>> It translates to Unicode 10BB7, which is not defined.
>> I guess that is not intended; can you guess what the character(s) should be?
>
> To be honest, no idea. It's some Chinese character; I have no idea how the
> user input this, because this is a Japanese page.
>
> I actually found the character, but only my Mac OS X can show it. It
> looks similar to a Japanese character used for a name, but how the
> Chinese one got selected is a mystery to me ...

Are you sure that your Mac OS X computer interprets the character as
UTF-8?

Yours,
Laurenz Albe

Re: Postgres Encoding conversion problem

From
Clemens Schwaighofer
Date:
On 04/22/2008 07:30 PM, Albe Laurenz wrote:
> Clemens Schwaighofer wrote:
>>>> I sometimes have a problem with conversion of encodings eg from UTF-8
>>>> to Shift_JIS:
>>>>
>>>> ERROR:  character 0xf0a0aeb7 of encoding "UTF8" has no
>>>> equivalent in "SJIS"
>>>>
>>>> I have no idea what character this is, I cannot view it in my
>>>> browser, etc.
>>> It translates to Unicode 10BB7, which is not defined.
>>> I guess that is not intended; can you guess what the character(s) should be?
>> To be honest, no idea. It's some Chinese character; I have no idea how the
>> user input this, because this is a Japanese page.
>>
>> I actually found the character, but only my Mac OS X can show it. It
>> looks similar to a Japanese character used for a name, but how the
>> Chinese one got selected is a mystery to me ...
>
> Are you sure that your Mac OS X computer interprets the character as
> UTF-8?

Of that I cannot be sure; I just searched through a page that has a
complete list. OS X can render it, Linux cannot, and I have not tried
Windows.

Re: Postgres Encoding conversion problem

From
Michael Fuhr
Date:
On Tue, Apr 22, 2008 at 10:37:59AM +0200, Albe Laurenz wrote:
> Clemens Schwaighofer wrote:
> > I sometimes have a problem with conversion of encodings eg from UTF-8
> > to Shift_JIS:
> >
> > ERROR:  character 0xf0a0aeb7 of encoding "UTF8" has no
> > equivalent in "SJIS"
> >
> > I have no idea what character this is, I cannot view it in my
> > browser, etc.
>
> It translates to Unicode 10BB7, which is not defined.

Actually it's <U+20BB7 CJK UNIFIED IDEOGRAPH-20BB7>.

http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=20BB7
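
As a check on the arithmetic, the byte sequence from the error message can
be decoded by hand. The following is a minimal Python sketch (Python is an
assumption here, and the variable names are illustrative) that strips the
UTF-8 marker bits from the four bytes and reassembles the code point, then
compares the result with the built-in decoder.

```python
import unicodedata

# The four bytes reported in the PostgreSQL error message.
raw = bytes.fromhex("f0a0aeb7")

# A 4-byte UTF-8 sequence has the bit layout
#   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
# so the lead byte contributes 3 payload bits and each
# continuation byte contributes 6.
b0, b1, b2, b3 = raw
code_point = (
    ((b0 & 0x07) << 18)
    | ((b1 & 0x3F) << 12)
    | ((b2 & 0x3F) << 6)
    | (b3 & 0x3F)
)

print(f"U+{code_point:04X}")

# Cross-check against Python's own UTF-8 decoder.
decoded = raw.decode("utf-8")
assert ord(decoded) == code_point
print(unicodedata.name(decoded, "<no name available>"))
```

The payload bits work out to 0x20000 + 0xB80 + 0x37 = 0x20BB7, which
matches the code point given above.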

--
Michael Fuhr

Re: Postgres Encoding conversion problem

From
"Albe Laurenz"
Date:
Michael Fuhr wrote:
>>> I sometimes have a problem with conversion of encodings eg from UTF-8
>>> to Shift_JIS:
>>>
>>> ERROR:  character 0xf0a0aeb7 of encoding "UTF8" has no
>>> equivalent in "SJIS"
>>>
>>> I have no idea what character this is, I cannot view it in my
>>> browser, etc.
>>
>> It translates to Unicode 10BB7, which is not defined.
>
> Actually it's <U+20BB7 CJK UNIFIED IDEOGRAPH-20BB7>.

Oops, you're correct. Made an error in my calculations. Thanks.

So that explains the problem.
Still, to handle it, the offending character needs to be changed before
converting to SJIS.
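
One way to change the offending characters before conversion, sketched in
Python under the assumption that the cleanup happens in the application
that writes the output rather than inside PostgreSQL. to_sjis_lossy is a
hypothetical helper name, and Python's "shift_jis" codec is only a
stand-in for PostgreSQL's SJIS conversion; the two may differ for some
characters.

```python
def to_sjis_lossy(text, drop=False):
    """Encode text as Shift_JIS without failing on unmappable characters.

    With drop=False each unmappable character becomes "?", so the loss
    stays visible in the output; with drop=True it is removed silently.
    """
    handler = "ignore" if drop else "replace"
    return text.encode("shift_jis", errors=handler)


# Illustrative input: ASCII text followed by U+20BB7.
sample = "abc\U00020BB7"
print(to_sjis_lossy(sample))
print(to_sjis_lossy(sample, drop=True))
```

Replacing rather than dropping is usually the safer default for names,
because a silently shortened name is harder to notice in a report.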

Yours,
Laurenz Albe

Re: Postgres Encoding conversion problem

From
Clemens Schwaighofer
Date:
On 04/23/2008 04:33 PM, Albe Laurenz wrote:
> Michael Fuhr wrote:
>>>> I sometimes have a problem with conversion of encodings eg from UTF-8
>>>> to Shift_JIS:
>>>>
>>>> ERROR:  character 0xf0a0aeb7 of encoding "UTF8" has no
>>>> equivalent in "SJIS"
>>>>
>>>> I have no idea what character this is, I cannot view it in my
>>>> browser, etc.
>>> It translates to Unicode 10BB7, which is not defined.
>> Actually it's <U+20BB7 CJK UNIFIED IDEOGRAPH-20BB7>.
>
> Oops, you're correct. Made an error in my calculations. Thanks.
>
> So that explains the problem.
> Still, to handle it, the offending character needs to be changed before
> converting to SJIS.

I probably won't get around writing a script that cleans up the data
before writing it out. *sigh* Or I export the data in UTF-8 ...
