Re: Character Decoding Problems - Mailing list pgsql-jdbc

From Barry Lind
Subject Re: Character Decoding Problems
Msg-id 3F3A82AA.7070906@xythos.com
In response to Re: Character Decoding Problems  (Evan Tsue <evan@windsormgmt.com>)
List pgsql-jdbc
Evan,

A call to getBytes() without specifying a character set uses the JVM's
default encoding.  How the JVM determines that default is, I think,
platform dependent.  In my environments the default JVM encoding is
LATIN1.
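
To make the default-encoding point concrete, here is a minimal sketch. The
class and method names are hypothetical (they are not part of the driver),
and it assumes Java 7 or later for StandardCharsets. It prints the default
charset that getBytes() would use, then compares an explicit UTF-8 round
trip with a LATIN1 (ISO-8859-1) round trip of a short Arabic string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the PostgreSQL JDBC driver.
public class DefaultEncodingDemo {

    // Encode with an explicit charset, then decode with the same one.
    public static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Four Arabic letters, written as escapes so the source file's
        // own encoding does not matter.
        String arabic = "\u0633\u0644\u0627\u0645";

        // The charset that getBytes() with no argument uses on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("getBytes() length: " + arabic.getBytes().length);

        // UTF-8 can represent the text, so the round trip is lossless.
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic));

        // ISO-8859-1 (LATIN1) cannot; each unmappable char becomes '?'.
        System.out.println(roundTrip(arabic, StandardCharsets.ISO_8859_1));
    }
}
```

With a LATIN1 default, the no-argument getBytes() behaves like the last
case, which would be consistent with the question marks described in the
original report. Passing the charset explicitly avoids depending on the
platform default.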

thanks,
--Barry


Evan Tsue wrote:
> Ok, I think I've figured out the problem.  I retract my statement that
> the decodeUTF8 method is incorrectly implemented.
>
> I'm still not exactly sure what the problem is.  When I do a
> getBytes("UTF16") on the string I get back from the JDBC query,
> everything looks ok.  However, when I do getBytes() it seems to default
> to some other encoding.  Does anyone know what the deal is with this?
>
> The issue that still remains is why the new String(...) constructor
> works for me whereas the decodeUTF8 method does not.
>
> Btw, thanks for everybody's help so far.
>
> Evan
>
> On Tuesday, Aug 12, 2003, at 23:50 US/Eastern, Evan Tsue wrote:
>
>> Ok, I've sat down with the problem a little bit more.  It now seems to
>> me that the decodeUTF8 method is doing the decoding correctly.  It
>> places the result of translating from UTF-8 to UTF-16 in the char[]
>> l_cdata variable.  It then creates a new String by calling
>>
>>     new String(l_cdata, 0, j)
>>
>> I believe that the variable j is the length of the filled-in portion of
>> the l_cdata array.  l_cdata is a class variable that is reused between
>> method calls (the decodeUTF8 method is synchronized).
>>
>> This seems to be the problem.  I haven't figured out why yet.  I also
>> have the same problem when running on FreeBSD (using the FreeBSD 1.4
>> JVM).
>>
>> Evan
>>
>>
>> On Tuesday, Aug 12, 2003, at 21:28 US/Eastern, zy7111 wrote:
>>
>>> I use pg73jdbc3.jar as JDBC driver. It works fine.
>>>
>>>> Yes, it should work in 7.2.2.  The decodeUTF8 method wasn't introduced
>>>> until later.  From the comments in the code, it seems that the reason
>>>> for its inclusion was for performance.
>>>>
>>>> Evan
>>>>
>>>> On Tuesday, Aug 12, 2003, at 08:34 US/Eastern, <zy7111@mail.china.com>
>>>> wrote:
>>>>
>>>>> I can insert and retrieve Chinese text in PostgreSQL 7.2.2
>>>>> successfully, with both operations going through JDBC.
>>>>> It seems you inserted the text using psql and retrieved it using JDBC.
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Evan Tsue" <evan@windsormgmt.com>
>>>>> To: <pgsql-jdbc@postgresql.org>
>>>>> Sent: Tuesday, August 12, 2003 1:38 PM
>>>>> Subject: [JDBC] Character Decoding Problems
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've been having problems decoding non-Latin characters using the
>>>>>> Postgres JDBC driver.  Here's the situation:  I'm using postgres
>>>>>> 7.3.2 and I've created a test database using 'createdb -E UNICODE
>>>>>> testdb' to ensure that I really am using the UNICODE character set.
>>>>>> Using psql, I created a table using the following command: 'CREATE
>>>>>> TABLE messages (message_uid SERIAL PRIMARY KEY, message_text
>>>>>> VARCHAR(255))' to test character encoding and decoding.  At that
>>>>>> point, I inserted a message that was in English.  I also inserted a
>>>>>> message that was in Arabic.  I did a select on that table using psql
>>>>>> and the values came back perfectly (I'm using MacOS X, so the
>>>>>> characters are displayed correctly).
>>>>>>
>>>>>> Next, I did a select on the same table via JDBC.  All I had the
>>>>>> program do was select on the table and print the results out to
>>>>>> standard output.  The message in English was displayed perfectly.
>>>>>> However, the message that was in Arabic was displayed as a series of
>>>>>> question marks and spaces.
>>>>>>
>>>>>> I eventually navigated my way through the JDBC driver source to find
>>>>>> that the problem is in the decodeUTF8 method in the
>>>>>> org.postgresql.core.Encoding class.  Apparently, it doesn't seem to
>>>>>> be working properly for non-Western characters.  I replaced the call
>>>>>> to that method with a call to the java.lang.String constructor and
>>>>>> now everything works perfectly.
>>>>>>
>>>>>> In addition to Arabic, I took a random sample of Chinese, Japanese,
>>>>>> Russian and Korean text and inserted it into the database.  Using the
>>>>>> original driver, I get the question marks.  But, when I used the
>>>>>> String constructor, everything comes out fine.
>>>>>>
>>>>>> Could someone please either fix the Encoding.decodeUTF8 method or
>>>>>> replace the call to that with a call to the String constructor?
>>>>>>
>>>>>> Thanks,
>>>>>> Evan
>>>>>>
>>>>>>
>>>>>> ---------------------------(end of
>>>>>> broadcast)---------------------------
>>>>>> TIP 8: explain analyze is your friend
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>