Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility - Mailing list pgsql-jdbc
From | Achilleus Mantzios |
---|---|
Subject | Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility |
Date | |
Msg-id | Pine.LNX.4.44.0302051116200.6193-100000@matrix.gatewaynet.com |
In response to | Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility (Achilleus Mantzios <achill@matrix.gatewaynet.com>) |
List | pgsql-jdbc |
On Wed, 5 Feb 2003, Achilleus Mantzios wrote:

> On Tue, 4 Feb 2003, Barry Lind wrote:
>
> > Achilleus,
> >
> > What is the character set of your database?  My guess is that it is
> > SQL_ASCII which is a 7bit character set.  If you are storing ISO-8859-7
> > data you should have that as your database character set.  All reports
>
> Yes it is SQL_ASCII. (BTW 8bit chars are stored just fine).
> If you read the code, you will see that the driver for all 7.3 versions
> forces UTF-8 client encoding.
>
> From AbstractJdbc1Connection.java I read:
>
> //We also set the client encoding so that the driver only needs
> //to deal with utf8.  We can only do this in 7.3 because multibyte
> //support is now always included
>
> So what happens is that the database converts from
> sql_ascii -> utf-8 (client encoding),
> and then the driver from utf-8 -> Unicode (with line 164 in
> Encoding.java).
>
> So, if you store in the database the chars 0xA0 0x0A
> you have a test case!
> (the Encoding.decodeUTF8 method throws the indicated Exception).
>
> Don't be misled by me saying that I had 8bit chars (Greek)
> in 7.2.3. (The Exception problem was on pure ASCII data, the users rarely
> enter Greek data either way).
>
> Now the real problems are
> a) Greek chars, mainly my fault but backwards compatibility problem.
> In 7.2.3 the server returned SQL_ASCII chars, interpreted these
> as Greek UTF8 chars and returned valid Greek Java Unicode strings
> and everybody was happy.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Excuse me, I was wrong. What happened is that I inserted 8bit ASCII
chars (not Greek UTF8) from Java, and the data were stored as SQL_ASCII.
Then in my JSP I just read those chars back, and because my servlet
container encoding was ISO-8859-1, no conversion was done. Because my
page's charset was set to ISO-8859-7, the browser displayed the Greek
chars correctly.
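The pass-through described in the correction above can be illustrated with a small Java sketch. It is not taken from the driver or from any code in this thread: the class name `PassThroughDemo`, both helper methods, and the sample bytes are hypothetical, and it assumes the JVM ships the ISO-8859-7 charset (common, but not guaranteed by the Java specification). It only shows that ISO-8859-1 maps every byte to one char and back, so 8bit data survives untouched, and that the same bytes read as Greek once they are interpreted as ISO-8859-7.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PassThroughDemo {
    // Hypothetical helper: decode bytes as ISO-8859-1 and encode them again.
    // ISO-8859-1 maps each byte 0x00..0xFF to exactly one char, so this is lossless.
    static byte[] roundTripLatin1(byte[] raw) {
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);
        return asLatin1.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: interpret the same bytes as ISO-8859-7 (Greek),
    // which is what a browser does when the page charset says ISO-8859-7.
    // Assumption: the running JVM provides the ISO-8859-7 charset.
    static String decodeGreek(byte[] raw) {
        return new String(raw, Charset.forName("ISO-8859-7"));
    }

    public static void main(String[] args) {
        // Sample data (an assumption): the ISO-8859-7 bytes for alpha, beta, gamma.
        byte[] stored = {(byte) 0xE1, (byte) 0xE2, (byte) 0xE3};

        byte[] sentToBrowser = roundTripLatin1(stored);
        System.out.println("bytes unchanged: " + Arrays.equals(stored, sentToBrowser));
        System.out.println("as ISO-8859-7:   " + decodeGreek(sentToBrowser));
    }
}
```

The point of the sketch is that no step in this chain ever has to know the data is Greek. Once something in the chain insists on a specific multibyte encoding such as UTF-8, that stops being true.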
>
> Now in 7.3.1 the server tried to convert SQL_ASCII to UTF-8 and hence
> the problem
>
> b) NOT GREEK RELATED!
> With database_encoding set to SQL_ASCII, the server converts these weird
> 2 chars (0xA0 0x0A) to UTF-8, and then the driver simply fails.
>
> I think you should deal with problem b).
> To create a test case is easy.
> Create a SQL_ASCII database, then insert these 2 chars in a text column
> (having typed these two chars with some utility like khexedit),
> and then out.println this string.
>
> > of problems I have seen in this regard were because the database
> > character set didn't match the character set of the actual data.  This
> > is important because the jdbc driver needs to convert the data to java
> > unicode, and if the database character set is incorrectly defined it
> > cannot do this correctly.
> >
> > If this isn't your problem, please submit a test case that shows your
> > problem so that we can look into it.
> >
> > thanks,
> > --Barry
> >
> >
> > Achilleus Mantzios wrote:
> > > Hi, I encountered 2 problems regarding the 7.3.1 jdbc driver.
> > >
> > > 1) The new 7.3.1 assumes data is stored in UNICODE in the database
> > > (which is most likely reloaded from a 7.2.x dump)
> > > For instance, in my case all text data in my 7.2.3 were
> > > ISO-8859-7 (Greek) (8bit ASCII compatible).
> > > I was not able to read these data correctly since the driver
> > > assumed I stored them in utf-8.
> > >
> > > 2) When the contents of a varchar or text field are the
> > > ASCII 0xA0 0x0A (which for some reason IE strangely produces)
> > > the driver throws a java.lang.ArrayIndexOutOfBoundsException:
> > >
> > > 2003-01-27 11:50:55,665 ERROR [STDERR] java.lang.ArrayIndexOutOfBoundsException
> > > 2003-01-27 11:50:55,666 ERROR [STDERR] at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:259)
> > > 2003-01-27 11:50:55,667 ERROR [STDERR] at org.postgresql.core.Encoding.decode(Encoding.java:165)
> > > 2003-01-27 11:50:55,667 ERROR [STDERR] at org.postgresql.core.Encoding.decode(Encoding.java:181)
> > > 2003-01-27 11:50:55,668 ERROR [STDERR] at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)
> > >
> > > In order to solve these 2 problems for my case, i.e. with no need
> > > for unicode support, I wrote this simple patch.
> > > (Note this patch is useful only for people who DON'T NEED
> > > multibyte support)
> > > --------------------------cut here------------------------------
> > > *** AbstractJdbc1Connection.java.orig   Tue Jan 28 09:42:54 2003
> > > --- AbstractJdbc1Connection.java        Tue Jan 28 09:50:09 2003
> > > ***************
> > > *** 372,382 ****
> > >               //support is now always included
> > >               if (haveMinimumServerVersion("7.3"))
> > >               {
> > >                       java.sql.ResultSet acRset =
> > > !                             ExecSQL("set client_encoding = 'UNICODE'; show autocommit");
> > >
> > >                       //set encoding to be unicode
> > > !                     encoding = Encoding.getEncoding("UNICODE", null);
> > >
> > >                       if (!acRset.next())
> > >                       {
> > > --- 372,384 ----
> > >               //support is now always included
> > >               if (haveMinimumServerVersion("7.3"))
> > >               {
> > > +                     // java.sql.ResultSet acRset =
> > > +                     // ExecSQL("set client_encoding = 'UNICODE'; show autocommit");
> > >                       java.sql.ResultSet acRset =
> > > !                             ExecSQL("show autocommit");
> > >
> > >                       //set encoding to be unicode
> > > !                     // encoding = Encoding.getEncoding("UNICODE", null);
> > >
> > >                       if (!acRset.next())
> > >                       {
> > > -------------------cut here-------------------------------------------
> > > ==================================================================
> > > Achilleus Mantzios
> > > S/W Engineer
> > > IT dept
> > > Dynacom Tankers Mngmt
> > > Nikis 4, Glyfada
> > > Athens 16610
> > > Greece
> > > tel:    +30-10-8981112
> > > fax:    +30-10-8981877
> > > email:  achill@matrix.gatewaynet.com
> > >         mantzios@softlab.ece.ntua.gr

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr
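For the test case in problem b), the following Java sketch checks whether a byte sequence is well-formed UTF-8. It does not reproduce the driver's `Encoding.decodeUTF8`; the class name `StrictUtf8Check` and the method `isValidUtf8` are hypothetical, and the sketch assumes the two bytes reach the decoder unchanged, which the thread does not confirm. It uses the standard `java.nio.charset.CharsetDecoder` with malformed input reported as an error instead of silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8Check {
    // Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Malformed sequence, e.g. a continuation byte with no lead byte.
            return false;
        }
    }

    public static void main(String[] args) {
        // The two bytes from the report. 0xA0 has the bit pattern 10xxxxxx,
        // which UTF-8 only allows after a lead byte.
        byte[] reported = {(byte) 0xA0, 0x0A};
        // The same character (U+00A0) properly encoded as UTF-8, then a line feed.
        byte[] encoded = {(byte) 0xC2, (byte) 0xA0, 0x0A};

        System.out.println("0xA0 0x0A valid UTF-8?      " + isValidUtf8(reported));
        System.out.println("0xC2 0xA0 0x0A valid UTF-8? " + isValidUtf8(encoded));
    }
}
```

A decoder that trusts its input and indexes ahead for continuation bytes can run past the end of its buffer on data like this. A check of this kind, or a decoder that reports malformed input, turns that into a clear encoding error instead.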