Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility - Mailing list pgsql-jdbc

From Achilleus Mantzios
Subject Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility
Date
Msg-id Pine.LNX.4.44.0302051027200.1908-100000@matrix.gatewaynet.com
In response to Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility  (Barry Lind <blind@xythos.com>)
Responses Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility  (Achilleus Mantzios <achill@matrix.gatewaynet.com>)
Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility  (Barry Lind <blind@xythos.com>)
List pgsql-jdbc
On Tue, 4 Feb 2003, Barry Lind wrote:

> Achilleus,
>
> What is the character set of your database?  My guess is that it is
> SQLASCII which is a 7bit character set.  If you are storing ISO-8859-7
> data you should have that as your database character set.  All reports

Yes, it is SQL_ASCII. (BTW, 8-bit chars are stored just fine.)
If you read the code, you will see that for all 7.3 versions the driver
forces the client encoding to UTF-8.

In AbstractJdbc1Connection.java I read:

//We also set the client encoding so that the driver only needs
//to deal with utf8.  We can only do this in 7.3 because multibyte
//support is now always included

So what happens is that the database converts from
SQL_ASCII -> UTF-8 (the client encoding),
and then the driver converts from UTF-8 -> Unicode (line 164 in
Encoding.java).

So, if you store the chars 0xA0 0x0A in the database,
you have a test case!
(The Encoding.decodeUTF8 method throws the indicated exception.)
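
To see why these two bytes are awkward for any UTF-8 decoder, here is a minimal sketch using only the standard java.nio.charset API. It is not the driver's Encoding.decodeUTF8 code; the class name Utf8Check and the method isValidUtf8 are made up for illustration, and it assumes the bytes reach the decoder unchanged. 0xA0 is a UTF-8 continuation byte, so it cannot start a character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDBC driver.
class Utf8Check {
    // Returns true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The two bytes from the test case: a lone continuation byte, then LF.
        byte[] problem = {(byte) 0xA0, 0x0A};
        // U+00A0 properly encoded as UTF-8 (0xC2 0xA0), then LF.
        byte[] encoded = {(byte) 0xC2, (byte) 0xA0, 0x0A};
        System.out.println(isValidUtf8(problem)); // prints false
        System.out.println(isValidUtf8(encoded)); // prints true
    }
}
```

A decoder that assumes well-formed input and does not check for this case can read past the end of its buffer or produce garbage instead of reporting an error.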

Don't be misled by my saying that I had 8-bit chars (Greek)
in 7.2.3. (The exception occurred on pure ASCII data; the users rarely
enter Greek data anyway.)

Now the real problems are:
a) Greek chars: mainly my fault, but still a backwards compatibility problem.
 In 7.2.3 the server returned SQL_ASCII chars, the driver interpreted these
 as Greek UTF8 chars and returned valid Greek Java Unicode strings,
 and everybody was happy.

 Now in 7.3.1 the server tries to convert SQL_ASCII to UTF-8, hence
 the problem.

b) NOT GREEK RELATED!
 With database_encoding set to SQL_ASCII, the server converts these weird
 2 chars (0xA0 0x0A) to UTF-8, and then the driver simply fails.
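
Problem a) can be illustrated in plain Java, independent of the driver. This is a minimal sketch, assuming the stored bytes are ISO-8859-7 and arrive at the client unchanged; the class GreekDecodeDemo and its decode helper are hypothetical names. The same bytes yield correct Greek text when decoded as ISO-8859-7 and replacement characters when decoded as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDBC driver.
class GreekDecodeDemo {
    // Decodes raw bytes with the given charset; malformed input
    // is replaced with U+FFFD by the String constructor.
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        // Alpha, beta, gamma as stored by an ISO-8859-7 client.
        byte[] greek = {(byte) 0xE1, (byte) 0xE2, (byte) 0xE3};
        Charset iso7 = Charset.forName("ISO-8859-7");

        String right = decode(greek, iso7);
        String wrong = decode(greek, StandardCharsets.UTF_8);

        System.out.println(right.equals("\u03B1\u03B2\u03B3")); // prints true
        System.out.println(wrong.indexOf('\uFFFD') >= 0);       // prints true
    }
}
```

The point of the sketch is only that the decoder must be told the charset the data was actually written in; a SQL_ASCII database carries no such information.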

I think you should deal with problem b).
Creating a test case is easy:
create a SQL_ASCII database, insert these 2 chars into a text column
(having typed them with some utility like khexedit),
and then out.println the string.
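
The reading half of that test case might look like the sketch below. Everything specific in it is an assumption for illustration: the JDBC URL, the credentials, the table enc_test and its text column val are hypothetical, and the row is assumed to have been inserted beforehand as described above. The toHex helper is also made up; it just makes non-printable chars visible.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical test client; requires a PostgreSQL JDBC driver on the classpath.
class ReadProblemRow {
    // Renders each char of the string as a 4-digit hex code unit.
    static String toHex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URL, user, password, table and column names.
        String url = "jdbc:postgresql://localhost/asciitest";
        try (Connection con = DriverManager.getConnection(url, "test", "test");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT val FROM enc_test")) {
            while (rs.next()) {
                // getString is the call that appears in the reported stack trace.
                String s = rs.getString(1);
                if (s != null) {
                    System.out.println(s + " [" + toHex(s) + "]");
                }
            }
        }
    }
}
```

Whether the exception reproduces will depend on the driver and server versions in use.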


> of problems I have seen in this regards were because the database
> character set didn't match the character set of the actual data.  This
> is important because the jdbc driver needs to convert the data to java
> unicode, and if the database character set is incorrectly defined it
> cannot do this correctly.
>
> If this isn't your problem, please submit a test case that shows your
> problem so that we can look into it.
>
> thanks,
> --Barry
>
>
> Achilleus Mantzios wrote:
> > Hi i encountered 2 problems regarding the 7.3.1 jdbc driver.
> >
> > 1) The new 7.3.1 assumes data is stored in UNICODE in the database
> > (which is most likely reloaded from a 7.2.x dump)
> > For instance, in my case all text data in my 7.2.3 were
> > ISO-8859-7 (Greek) (8bit ASCII compatible).
> > I was not able to read these data correctly since the driver
> > assumed i stored them in utf-8.
> >
> > 2) When the contents of a varchar or text field are the
> > ASCII 0xA0 0x0A (which for some reason IE strangely produces)
> > the driver throws an java.lang.ArrayIndexOutOfBoundsException :
> >
> > 2003-01-27 11:50:55,665 ERROR [STDERR]
> > java.lang.ArrayIndexOutOfBoundsException
> > 2003-01-27 11:50:55,666 ERROR [STDERR]  at
> > org.postgresql.core.Encoding.decodeUTF8(Encoding.java:259)
> > 2003-01-27 11:50:55,667 ERROR [STDERR]  at
> > org.postgresql.core.Encoding.decode(Encoding.java:165)
> > 2003-01-27 11:50:55,667 ERROR [STDERR]  at
> > org.postgresql.core.Encoding.decode(Encoding.java:181)
> > 2003-01-27 11:50:55,668 ERROR [STDERR]  at
> > org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)
> >
> > In order to solve these 2 problems for my case, i.e. with no need
> > for Unicode support, I wrote this simple patch.
> > (Note: this patch is useful only for people who DON'T NEED
> > multibyte support.)
> > --------------------------cut here------------------------------
> > *** AbstractJdbc1Connection.java.orig    Tue Jan 28 09:42:54 2003
> > --- AbstractJdbc1Connection.java    Tue Jan 28 09:50:09 2003
> > ***************
> > *** 372,382 ****
> >           //support is now always included
> >           if (haveMinimumServerVersion("7.3"))
> >           {
> >               java.sql.ResultSet acRset =
> > !                 ExecSQL("set client_encoding = 'UNICODE'; show autocommit");
> >
> >               //set encoding to be unicode
> > !             encoding = Encoding.getEncoding("UNICODE", null);
> >
> >               if (!acRset.next())
> >               {
> > --- 372,384 ----
> >           //support is now always included
> >           if (haveMinimumServerVersion("7.3"))
> >           {
> > + //            java.sql.ResultSet acRset =
> > + //                ExecSQL("set client_encoding = 'UNICODE'; show autocommit");
> >               java.sql.ResultSet acRset =
> > !                 ExecSQL("show autocommit");
> >
> >               //set encoding to be unicode
> > ! //            encoding = Encoding.getEncoding("UNICODE", null);
> >
> >               if (!acRset.next())
> >               {
> > -------------------cut here-------------------------------------------
> > ==================================================================
> > Achilleus Mantzios
> > S/W Engineer
> > IT dept
> > Dynacom Tankers Mngmt
> > Nikis 4, Glyfada
> > Athens 16610
> > Greece
> > tel:    +30-10-8981112
> > fax:    +30-10-8981877
> > email:  achill@matrix.gatewaynet.com
> >         mantzios@softlab.ece.ntua.gr
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
> >
>
>
>

==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr

