Re: Character Decoding Problems - Mailing list pgsql-jdbc

From Evan Tsue
Subject Re: Character Decoding Problems
Date
Msg-id 40249E56-CD41-11D7-A787-000A95A08104@windsormgmt.com
In response to Re: Character Decoding Problems  ("zy7111" <zy7111@mail.china.com>)
Responses Re: Character Decoding Problems
List pgsql-jdbc
OK, I've sat down with the problem a little more.  It now seems to me
that the decodeUTF8 method is doing the decoding correctly.  It places
the result of translating from UTF-8 to UTF-16 in the char[] l_cdata
variable.  It then creates a new String by calling

    new String(l_cdata, 0, j)

I believe that the variable j is the length of the filled-in portion of
the l_cdata array.  l_cdata is a class variable that is reused between
method calls (the decodeUTF8 method is synchronized).
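
To make the pattern described above concrete, here is a minimal sketch of
a decoder of that shape: a shared char[] scratch buffer, a synchronized
method, and a final new String(buffer, 0, j).  This is not the driver's
actual code.  The class name Utf8DecodeSketch, the field name cdata, the
buffer sizing, and the error handling are all assumptions made for
illustration, and it is written against a current JDK (StandardCharsets)
rather than a 1.4 JVM.  It only handles 1- to 3-byte sequences and does
not validate continuation bytes.

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeSketch {
    // Shared scratch buffer, reused between calls; this is why the method is synchronized.
    // (Hypothetical stand-in for the l_cdata variable mentioned in the message.)
    private static char[] cdata = new char[64];

    /** Decodes 1- to 3-byte UTF-8 sequences. Illustration only, not the driver's code. */
    public static synchronized String decodeUTF8(byte[] data, int offset, int length) {
        if (cdata.length < length) {
            cdata = new char[length]; // one char per input byte is always enough here
        }
        int i = offset;
        int end = offset + length;
        int j = 0; // number of chars filled in so far
        while (i < end) {
            int b = data[i++] & 0xFF;
            if (b < 0x80) {
                // 0xxxxxxx: plain ASCII
                cdata[j++] = (char) b;
            } else if ((b & 0xE0) == 0xC0) {
                // 110xxxxx 10xxxxxx: two-byte sequence (e.g. Arabic, Cyrillic)
                int b2 = data[i++] & 0x3F;
                cdata[j++] = (char) (((b & 0x1F) << 6) | b2);
            } else if ((b & 0xF0) == 0xE0) {
                // 1110xxxx 10xxxxxx 10xxxxxx: three-byte sequence (e.g. CJK)
                int b2 = data[i++] & 0x3F;
                int b3 = data[i++] & 0x3F;
                cdata[j++] = (char) (((b & 0x0F) << 12) | (b2 << 6) | b3);
            } else {
                throw new IllegalArgumentException(
                        "unsupported UTF-8 lead byte at index " + (i - 1));
            }
        }
        // String(char[], int, int) copies the chars, so later reuse of cdata
        // does not change Strings that were already returned.
        return new String(cdata, 0, j);
    }

    public static void main(String[] args) {
        byte[] raw = "\u0645\u0631\u062D\u0628\u0627 abc \u4E2D\u6587"
                .getBytes(StandardCharsets.UTF_8);
        String viaLoop = decodeUTF8(raw, 0, raw.length);
        String viaCtor = new String(raw, StandardCharsets.UTF_8);
        System.out.println(viaLoop.equals(viaCtor)); // prints "true"
    }
}
```

In this sketch the hand-written loop and the String constructor agree, and
because the String constructor copies the characters, reusing the buffer
by itself should not corrupt earlier results.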

This String construction step seems to be where the problem is, though
I haven't figured out why yet.  I also have the same problem when
running on FreeBSD (using the FreeBSD 1.4 JVM).
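
One way to narrow this down might be to print the decoded String as code
points instead of as text, so that the result does not depend on how
standard output is encoded.  The helper below is a hypothetical sketch
(the class name CodePointDump and the method toHex are made up for this
example).  If the dump shows U+003F, the String itself contains question
marks; if it shows the expected code points, the damage happens later,
on output.

```java
public class CodePointDump {
    /** Renders each UTF-16 code unit as U+XXXX, separated by spaces. */
    public static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a', ARABIC LETTER MEEM, and a literal question mark
        System.out.println(toHex("a\u0645?")); // prints "U+0061 U+0645 U+003F"
    }
}
```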

Evan


On Tuesday, Aug 12, 2003, at 21:28 US/Eastern, zy7111 wrote:

> I use pg73jdbc3.jar as JDBC driver. It works fine.
>
>> Yes, it should work in 7.2.2.  The decodeUTF8 method wasn't introduced
>> until later.  From the comments in the code, it seems that the reason
>> for its inclusion was for performance.
>>
>> Evan
>>
>> On Tuesday, Aug 12, 2003, at 08:34 US/Eastern, <zy7111@mail.china.com>
>> wrote:
>>
>>> I can insert Chinese text into postgresql 7.2.2 and retrieve it
>>> successfully.  Both operations go through JDBC.
>>> It seems you insert text using psql and retrieve using JDBC.
>>>
>>> ----- Original Message -----
>>> From: "Evan Tsue" <evan@windsormgmt.com>
>>> To: <pgsql-jdbc@postgresql.org>
>>> Sent: Tuesday, August 12, 2003 1:38 PM
>>> Subject: [JDBC] Character Decoding Problems
>>>
>>>
>>>> Hi,
>>>>
>>>> I've been having problems decoding non-Latin characters using the
>>>> Postgres JDBC driver.  Here's the situation:  I'm using postgres 7.3.2
>>>> and I've created a test database using 'createdb -E UNICODE testdb' to
>>>> ensure that I really am using the UNICODE character set.  Using psql, I
>>>> created a table using the following command: 'CREATE TABLE messages
>>>> (message_uid SERIAL PRIMARY KEY, message_text VARCHAR(255))' to test
>>>> character encoding and decoding.  At that point, I inserted a message
>>>> that was in English.  I also inserted a message that was in Arabic.  I
>>>> did a select on that table using psql and the values came back
>>>> perfectly (I'm using MacOS X, so the characters are displayed
>>>> correctly).
>>>>
>>>> Next, I did a select on the same table via JDBC.  All I had the
>>>> program do was select on the table and print the results out to
>>>> standard output.  The message in English was displayed perfectly.
>>>> However, the message that was in Arabic was displayed as a series of
>>>> question marks and spaces.
>>>>
>>>> I eventually navigated my way through the JDBC driver source to find
>>>> that the problem is in the decodeUTF8 method in the
>>>> org.postgresql.core.Encoding class.  Apparently, it doesn't seem to be
>>>> working properly for non-Western characters.  I replaced the call to
>>>> that method with a call to the java.lang.String constructor and now
>>>> everything works perfectly.
>>>>
>>>> In addition to Arabic, I took a random sample of Chinese, Japanese,
>>>> Russian and Korean text and inserted it into the database.  Using the
>>>> original driver, I get the question marks.  But, when I used the String
>>>> constructor, everything comes out fine.
>>>>
>>>> Could someone please either fix the Encoding.decodeUTF8 method or
>>>> replace the call to that with a call to the String constructor?
>>>>
>>>> Thanks,
>>>> Evan
>>>>
>>>>
>>>> ---------------------------(end of
>>>> broadcast)---------------------------
>>>> TIP 8: explain analyze is your friend
>>>>
>>>
>>
>>
>

