Re: Character Decoding Problems - Mailing list pgsql-jdbc
From | Barry Lind |
---|---|
Subject | Re: Character Decoding Problems |
Date | |
Msg-id | 3F3A82AA.7070906@xythos.com |
In response to | Re: Character Decoding Problems (Evan Tsue <evan@windsormgmt.com>) |
List | pgsql-jdbc |
Evan,

A call to getBytes() without specifying a character set will use the default encoding for the JVM. I think how the JVM determines its default encoding is platform dependent. In my environments the default JVM encoding is LATIN1.

thanks,
--Barry

Evan Tsue wrote:

> Ok, I think I've figured out the problem. I retract my statement that the decodeUTF8 method is incorrectly implemented.
>
> I'm still not exactly sure what the problem is. When I do a getBytes("UTF16") on the string I get back from the JDBC query, everything looks ok. However, when I do getBytes() it seems to default to some other encoding. Does anyone know what the deal is with this?
>
> The issue that still remains is why does the new String(...) method work for me whereas the decodeUTF8 method does not?
>
> Btw, thanks for everybody's help so far.
>
> Evan
>
> On Tuesday, Aug 12, 2003, at 23:50 US/Eastern, Evan Tsue wrote:
>
>> Ok, I've sat down with the problem a little bit more. It now seems to me that the decodeUTF8 method is doing the encoding correctly. It places the result from translating from UTF-8 to UTF-16 in the char[] l_cdata variable. It then creates a new String by calling
>>
>> new String(l_cdata, 0, j)
>>
>> I believe that the variable j is the length of the filled in portion of the l_cdata array. l_cdata is a class variable that is reused between method calls (the decodeUTF8 method is synchronized).
>>
>> This seems to be the problem. I haven't figured out why yet. I also have the same problem when running on FreeBSD (using the FreeBSD 1.4 JVM).
>>
>> Evan
>>
>> On Tuesday, Aug 12, 2003, at 21:28 US/Eastern, zy7111 wrote:
>>
>>> I use pg73jdbc3.jar as JDBC driver. It works fine.
>>>
>>>> Yes, it should work in 7.2.2. The decodeUTF8 method wasn't introduced until later. From the comments in the code, it seems that the reason for its inclusion was for performance.
>>>>
>>>> Evan
>>>>
>>>> On Tuesday, Aug 12, 2003, at 08:34 US/Eastern, <zy7111@mail.china.com> wrote:
>>>>
>>>>> I can insert and retrieve Chinese into postgresql 7.2.2 successfully. Both operations were done through JDBC.
>>>>> It seems you insert text using psql and retrieve using JDBC.
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Evan Tsue" <evan@windsormgmt.com>
>>>>> To: <pgsql-jdbc@postgresql.org>
>>>>> Sent: Tuesday, August 12, 2003 1:38 PM
>>>>> Subject: [JDBC] Character Decoding Problems
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've been having problems decoding non-Latin characters using the Postgres JDBC driver. Here's the situation: I'm using postgres 7.3.2 and I've created a test database using 'createdb -E UNICODE testdb' to ensure that I really am using the UNICODE character set. Using psql, I created a table using the following command: 'CREATE TABLE messages (message_uid SERIAL PRIMARY KEY, message_text VARCHAR(255))' to test character encoding and decoding. At that point, I inserted a message that was in English. I also inserted a message that was in Arabic. I did a select on that table using psql and the values came back perfectly (I'm using MacOS X, so the characters are displayed correctly).
>>>>>>
>>>>>> Next, I did a select on the same table via JDBC. All I had the program do was select on the table and print the results out to standard output. The message in English was displayed perfectly. However, the message that was in Arabic was displayed as a series of question marks and spaces.
>>>>>>
>>>>>> I eventually navigated my way through the JDBC driver source to find that the problem is in the decodeUTF8 method in the org.postgresql.core.Encoding class. Apparently, it doesn't seem to be working properly for non-Western characters. I replaced the call to that method with a call to the java.lang.String constructor and now everything works perfectly.
>>>>>>
>>>>>> In addition to Arabic, I took a random sample of Chinese, Japanese, Russian and Korean text and inserted it into the database. Using the original driver, I get the question marks. But, when I used the String constructor, everything comes out fine.
>>>>>>
>>>>>> Could someone please either fix the Encoding.decodeUTF8 method or replace the call to that with a call to the String constructor?
>>>>>>
>>>>>> Thanks,
>>>>>> Evan
>>>>>>
>>>>>> ---------------------------(end of broadcast)---------------------------
>>>>>> TIP 8: explain analyze is your friend
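Barry's point about getBytes() can be illustrated with a small sketch. It is not taken from the driver source. The class name EncodingDemo, its helper methods, and the sample Arabic string are assumptions made for illustration, and it uses java.nio.charset.StandardCharsets, which needs Java 7 or later and was not available on the 1.4 JVMs discussed in the thread. It shows that encoding to a charset that cannot represent the text (such as LATIN1) yields '?' bytes, which is one possible source of question marks in output, while an explicit UTF-8 round trip through the String constructor preserves the text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the PostgreSQL JDBC driver.
final class EncodingDemo {

    // Encode with an explicit charset so the result does not depend on the platform default.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // Decode UTF-8 bytes with the String constructor, similar in spirit to Evan's workaround.
    static String decodeUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample Arabic text (an assumption; any non-Latin text would do).
        String arabic = "\u0645\u0631\u062D\u0628\u0627";

        // The default charset is chosen by the JVM and varies by platform and locale.
        System.out.println("default charset: " + Charset.defaultCharset());

        // getBytes() with no argument uses that default, so its result is platform dependent.
        byte[] platform = arabic.getBytes();
        System.out.println("getBytes() length: " + platform.length);

        // LATIN1 cannot represent Arabic, so every character becomes the byte for '?'.
        byte[] latin1 = encode(arabic, StandardCharsets.ISO_8859_1);
        System.out.println("LATIN1 bytes decoded: " + new String(latin1, StandardCharsets.ISO_8859_1));

        // An explicit UTF-8 round trip keeps the original characters.
        byte[] utf8 = encode(arabic, StandardCharsets.UTF_8);
        System.out.println("UTF-8 round trip ok: " + arabic.equals(decodeUtf8(utf8)));
    }
}
```

When comparing bytes across machines, passing the charset explicitly to getBytes(...) and to the String constructor avoids depending on whatever default the JVM picked.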