Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility - Mailing list pgsql-jdbc

From Achilleus Mantzios
Subject Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility
Date
Msg-id Pine.LNX.4.44.0302051116200.6193-100000@matrix.gatewaynet.com
In response to Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility  (Achilleus Mantzios <achill@matrix.gatewaynet.com>)
List pgsql-jdbc
On Wed, 5 Feb 2003, Achilleus Mantzios wrote:

> On Tue, 4 Feb 2003, Barry Lind wrote:
>
> > Achilleus,
> >
> > What is the character set of your database?  My guess is that it is
> > SQL_ASCII, which is a 7-bit character set.  If you are storing ISO-8859-7
> > data you should have that as your database character set.  All reports
>
> Yes it is SQL_ASCII. (BTW 8bit chars are stored just fine).
> If you read the code, you will see that the driver for all 7.3 versions
> forces UTF-8 client encoding.
>
> From AbstractJdbc1Connection.java I read:
>
> //We also set the client encoding so that the driver only needs
> //to deal with utf8.  We can only do this in 7.3 because multibyte
> //support is now always included
>
> So what happens is that the database converts from
> SQL_ASCII -> UTF-8 (client encoding),
> and then the driver converts from UTF-8 -> Unicode (at line 164 in
> Encoding.java).
>
> So, if you store the chars 0xA0 0x0A in the database,
> you have a test case!
> (The Encoding.decodeUTF8 method throws the indicated exception.)
>
> Don't be misled by my saying that I had 8-bit chars (Greek)
> in 7.2.3. (The exception problem was on pure ASCII data; the users rarely
> enter Greek data either way.)
>
> Now the real problems are:
> a) Greek chars: mainly my fault, but a backwards compatibility problem.
>  In 7.2.3 the server returned SQL_ASCII chars, interpreted these
>  as Greek UTF-8 chars and returned valid Greek Java Unicode strings,
>  and everybody was happy.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Excuse me, I was wrong.
What happened is that I inserted 8-bit ASCII chars from Java
(not Greek UTF-8), and the data were stored as SQL_ASCII.
Then in my JSP I just read those chars, and because my
servlet container encoding was ISO-8859-1, no conversion was done.
Because my page's charset was set to ISO-8859-7,
the browser displayed the Greek chars correctly.
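
The pass-through described above can be sketched in Java. This is only an illustration, not code from my application or from the driver: the class and method names are made up, and the three sample bytes are the ISO-8859-7 codes for alpha, beta, gamma. It shows why nothing was lost: ISO-8859-1 maps every byte value to the code point with the same number, so decoding and re-encoding with it leaves the bytes untouched, and the browser then reads them as ISO-8859-7.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1PassThrough {

    // Hypothetical stand-in for "read from the database, write to the page"
    // when both sides treat the data as ISO-8859-1: the bytes come back unchanged.
    static byte[] roundTripLatin1(byte[] stored) {
        String asLatin1 = new String(stored, StandardCharsets.ISO_8859_1);
        return asLatin1.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Hypothetical stand-in for the browser, which decodes the page bytes
    // with the charset declared by the page (ISO-8859-7 here).
    static String renderAsGreek(byte[] pageBytes) {
        return new String(pageBytes, Charset.forName("ISO-8859-7"));
    }

    public static void main(String[] args) {
        // Sample data (an assumption for the demo): alpha, beta, gamma in ISO-8859-7.
        byte[] stored = {(byte) 0xE1, (byte) 0xE2, (byte) 0xE3};

        byte[] sent = roundTripLatin1(stored);
        System.out.println("bytes unchanged: " + Arrays.equals(stored, sent));
        System.out.println("browser shows Greek: "
                + renderAsGreek(sent).equals("\u03b1\u03b2\u03b3"));
    }
}
```

The intermediate Java string in this sketch holds Latin-1 characters, not Greek ones. It only looks right once the bytes reach a consumer that decodes them as ISO-8859-7.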

>
>  Now in 7.3.1 the server tried to convert SQL_ASCII to UTF-8 and hence
>  the problem
>
> b) NOT GREEK RELATED!
>  With database_encoding set to SQL_ASCII, the server converts these weird
>  2 chars (0xA0 0x0A) to UTF-8, and then the driver simply fails.
>
> I think you should deal with problem b).
> Creating a test case is easy.
> Create a SQL_ASCII database, then insert these 2 chars in a text column
> (having typed these two chars with some utility like khexedit),
> and then print this string with out.println.
>
>
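
A minimal sketch of why these two bytes are trouble for any UTF-8 decoder, assuming they reach the decoder unchanged (I have not verified what the server actually sends). It does not use the driver's Encoding.decodeUTF8; it uses the JDK's own strict decoder, and the class and method names are made up for the illustration. 0xA0 is a UTF-8 continuation byte, so it cannot start a character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Check {

    // Returns true only if the bytes form well-formed UTF-8.
    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] problemBytes = {(byte) 0xA0, (byte) 0x0A}; // the two chars from the report
        byte[] plainAscii = "abc".getBytes(StandardCharsets.US_ASCII);

        System.out.println("0xA0 0x0A valid UTF-8? " + isValidUtf8(problemBytes));
        System.out.println("\"abc\" valid UTF-8? " + isValidUtf8(plainAscii));
    }
}
```

A decoder that validates its input like this can report the bad byte cleanly instead of running past the end of its buffer.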
> > of problems I have seen in this regards were because the database
> > character set didn't match the character set of the actual data.  This
> > is important because the jdbc driver needs to convert the data to java
> > unicode, and if the database character set is incorrectly defined it
> > cannot do this correctly.
> >
> > If this isn't your problem, please submit a test case that shows your
> > problem so that we can look into it.
> >
> > thanks,
> > --Barry
> >
> >
> > Achilleus Mantzios wrote:
> > > Hi, I encountered 2 problems regarding the 7.3.1 jdbc driver.
> > >
> > > 1) The new 7.3.1 assumes data is stored in UNICODE in the database
> > > (which is most likely reloaded from a 7.2.x dump).
> > > For instance, in my case all text data in my 7.2.3 were
> > > ISO-8859-7 (Greek) (8-bit, ASCII compatible).
> > > I was not able to read these data correctly since the driver
> > > assumed I stored them in UTF-8.
> > >
> > > 2) When the contents of a varchar or text field are the
> > > bytes 0xA0 0x0A (which for some reason IE strangely produces)
> > > the driver throws a java.lang.ArrayIndexOutOfBoundsException:
> > >
> > > 2003-01-27 11:50:55,665 ERROR [STDERR]
> > > java.lang.ArrayIndexOutOfBoundsException
> > > 2003-01-27 11:50:55,666 ERROR [STDERR]  at
> > > org.postgresql.core.Encoding.decodeUTF8(Encoding.java:259)
> > > 2003-01-27 11:50:55,667 ERROR [STDERR]  at
> > > org.postgresql.core.Encoding.decode(Encoding.java:165)
> > > 2003-01-27 11:50:55,667 ERROR [STDERR]  at
> > > org.postgresql.core.Encoding.decode(Encoding.java:181)
> > > 2003-01-27 11:50:55,668 ERROR [STDERR]  at
> > > org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)
> > >
> > > In order to solve these 2 problems for my case, i.e. with no need
> > > for Unicode support, I wrote this simple patch.
> > > (Note this patch is useful only for people who DON'T NEED
> > > multibyte support.)
> > > --------------------------cut here------------------------------
> > > *** AbstractJdbc1Connection.java.orig    Tue Jan 28 09:42:54 2003
> > > --- AbstractJdbc1Connection.java    Tue Jan 28 09:50:09 2003
> > > ***************
> > > *** 372,382 ****
> > >           //support is now always included
> > >           if (haveMinimumServerVersion("7.3"))
> > >           {
> > >               java.sql.ResultSet acRset =
> > > !                 ExecSQL("set client_encoding = 'UNICODE'; show autocommit");
> > >
> > >               //set encoding to be unicode
> > > !             encoding = Encoding.getEncoding("UNICODE", null);
> > >
> > >               if (!acRset.next())
> > >               {
> > > --- 372,384 ----
> > >           //support is now always included
> > >           if (haveMinimumServerVersion("7.3"))
> > >           {
> > > + //            java.sql.ResultSet acRset =
> > > + //                ExecSQL("set client_encoding = 'UNICODE'; show autocommit");
> > >               java.sql.ResultSet acRset =
> > > !                 ExecSQL("show autocommit");
> > >
> > >               //set encoding to be unicode
> > > ! //            encoding = Encoding.getEncoding("UNICODE", null);
> > >
> > >               if (!acRset.next())
> > >               {
> > > -------------------cut here-------------------------------------------
> > > ==================================================================
> > > Achilleus Mantzios
> > > S/W Engineer
> > > IT dept
> > > Dynacom Tankers Mngmt
> > > Nikis 4, Glyfada
> > > Athens 16610
> > > Greece
> > > tel:    +30-10-8981112
> > > fax:    +30-10-8981877
> > > email:  achill@matrix.gatewaynet.com
> > >         mantzios@softlab.ece.ntua.gr
> > >
> > >
> > >
> > >
> >
> >
> >
>


