Thread: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility

7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility

From
Achilleus Mantzios
Date:
Hi, I encountered two problems with the 7.3.1 JDBC driver.

1) The new 7.3.1 driver assumes that data in the database is stored
as UNICODE (the database most likely having been reloaded from a
7.2.x dump). For instance, in my case all text data in my 7.2.3
database was ISO-8859-7 (Greek), an 8-bit, ASCII-compatible encoding.
I was not able to read this data correctly, since the driver
assumed I had stored it in UTF-8.

2) When the contents of a varchar or text field are the
bytes 0xA0 0x0A (which for some reason IE strangely produces),
the driver throws a java.lang.ArrayIndexOutOfBoundsException:

2003-01-27 11:50:55,665 ERROR [STDERR] java.lang.ArrayIndexOutOfBoundsException
2003-01-27 11:50:55,666 ERROR [STDERR]  at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:259)
2003-01-27 11:50:55,667 ERROR [STDERR]  at org.postgresql.core.Encoding.decode(Encoding.java:165)
2003-01-27 11:50:55,667 ERROR [STDERR]  at org.postgresql.core.Encoding.decode(Encoding.java:181)
2003-01-27 11:50:55,668 ERROR [STDERR]  at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)

In order to solve these two problems for my case, i.e. with no need
for Unicode support, I wrote this simple patch.
(Note: this patch is useful only for people who DON'T NEED
multibyte support.)
--------------------------cut here------------------------------
*** AbstractJdbc1Connection.java.orig    Tue Jan 28 09:42:54 2003
--- AbstractJdbc1Connection.java    Tue Jan 28 09:50:09 2003
***************
*** 372,382 ****
          //support is now always included
          if (haveMinimumServerVersion("7.3"))
          {
              java.sql.ResultSet acRset =
!                 ExecSQL("set client_encoding = 'UNICODE'; show autocommit");

              //set encoding to be unicode
!             encoding = Encoding.getEncoding("UNICODE", null);

              if (!acRset.next())
              {
--- 372,384 ----
          //support is now always included
          if (haveMinimumServerVersion("7.3"))
          {
+ //            java.sql.ResultSet acRset =
+ //                ExecSQL("set client_encoding = 'UNICODE'; show autocommit");
              java.sql.ResultSet acRset =
!                 ExecSQL("show autocommit");

              //set encoding to be unicode
! //            encoding = Encoding.getEncoding("UNICODE", null);

              if (!acRset.next())
              {
-------------------cut here-------------------------------------------
==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr
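
Problem 1 above can be illustrated without a database. The following is
a minimal sketch, not driver code: the class name GreekBytesDemo and the
three sample bytes are assumptions chosen for illustration. It decodes
the same ISO-8859-7 bytes once with the correct charset and once as
UTF-8, which is roughly what a client that assumes UTF-8 would do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class GreekBytesDemo {
    // ISO-8859-7 is looked up by name; it is not a StandardCharsets constant.
    static final Charset GREEK = Charset.forName("ISO-8859-7");

    // Decode raw bytes with the given charset. new String(byte[], Charset)
    // never throws; malformed input is replaced with U+FFFD.
    static String decode(byte[] stored, Charset charset) {
        return new String(stored, charset);
    }

    public static void main(String[] args) {
        // alpha, beta, gamma as single bytes in ISO-8859-7 (sample data)
        byte[] stored = {(byte) 0xE1, (byte) 0xE2, (byte) 0xE3};
        System.out.println("as ISO-8859-7: " + decode(stored, GREEK));
        System.out.println("as UTF-8:      " + decode(stored, StandardCharsets.UTF_8));
    }
}
```

Decoded as ISO-8859-7 the bytes give the three Greek letters; decoded as
UTF-8 they are malformed and come back as replacement characters.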



Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility

From
Achilleus Mantzios
Date:
On Tue, 28 Jan 2003, Achilleus Mantzios wrote:

I found another guy having the same problem (problem #2)
using PostgreSQL 7.3.1 with Resin.
http://www.caucho.com/support/ejb-interest/0212/0049.html

He suggested that the garbled input was due to the "smart quotes"
that some MS products insert.
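
One possible reading of the two bytes, offered here as an assumption
rather than something the thread confirms: in ISO-8859-1, 0xA0 is the
non-breaking space and 0x0A is a line feed. The sketch below (the class
name NbspBytesDemo is made up for illustration) shows how the pair reads
in ISO-8859-1 and what the same text looks like once properly encoded
as UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class NbspBytesDemo {
    // Interpret raw bytes as ISO-8859-1; every byte maps to one char.
    static String asLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Encode text as UTF-8.
    static byte[] asUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated upper-case hex.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xA0, 0x0A};
        String text = asLatin1(raw); // U+00A0 followed by a line feed
        System.out.println("raw bytes:  " + hex(raw));
        System.out.println("UTF-8 form: " + hex(asUtf8(text)));
    }
}
```

In valid UTF-8 the non-breaking space takes two bytes (C2 A0), so a
bare A0 byte cannot start a character.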


Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility

From
Barry Lind
Date:
Achilleus,

What is the character set of your database?  My guess is that it is
SQL_ASCII, which is a 7-bit character set.  If you are storing ISO-8859-7
data, you should have that as your database character set.  All reports
of problems I have seen in this regard were because the database
character set didn't match the character set of the actual data.  This
is important because the JDBC driver needs to convert the data to Java
Unicode, and if the database character set is incorrectly defined it
cannot do this correctly.

If this isn't your problem, please submit a test case that shows your
problem so that we can look into it.

thanks,
--Barry






Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility

From
Achilleus Mantzios
Date:
On Tue, 4 Feb 2003, Barry Lind wrote:

> Achilleus,
>
> What is the character set of your database?  My guess is that it is
> SQLASCII which is a 7bit character set.  If you are storing ISO-8859-7
> data you should have that as your database character set.  All reports

Yes, it is SQL_ASCII. (BTW, 8-bit chars are stored just fine.)
If you read the code, you will see that the driver forces the UTF-8
client encoding for all 7.3 versions.

From AbstractJdbc1Connection.java I read:

//We also set the client encoding so that the driver only needs
//to deal with utf8.  We can only do this in 7.3 because multibyte
//support is now always included

So what happens is that the database converts from
SQL_ASCII -> UTF-8 (the client encoding),
and then the driver converts from UTF-8 -> Unicode (at line 164 in
Encoding.java).

So, if you store the chars 0xA0 0x0A in the database,
you have a test case!
(The Encoding.decodeUTF8 method throws the indicated exception.)

Don't be misled by my saying that I had 8-bit chars (Greek)
in 7.2.3. (The exception problem was on pure ASCII data; the users rarely
enter Greek data either way.)

Now the real problems are:
a) Greek chars: mainly my fault, but a backwards compatibility problem.
 In 7.2.3 the server returned SQL_ASCII chars, which were interpreted
 as Greek UTF-8 chars and returned as valid Greek Java Unicode strings,
 and everybody was happy.

 Now in 7.3.1 the server tries to convert SQL_ASCII to UTF-8, and hence
 the problem.

b) NOT GREEK RELATED!
 With database_encoding set to SQL_ASCII, the server converts these weird
 2 chars (0xA0 0x0A) to UTF-8, and then the driver simply fails.

I think you should deal with problem b).
Creating a test case is easy:
create a SQL_ASCII database, insert these 2 chars into a text column
(having typed them with some utility like khexedit),
and then out.println this string.
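
The kind of failure described above can be sketched without a database.
The code below is NOT the driver's Encoding.decodeUTF8; it is a
hypothetical decoder (class NaiveUtf8Demo, an assumption made for
illustration) that trusts each lead byte and reads the following bytes
without a bounds or validity check. It shows how a stray 8-bit byte can
either be decoded into the wrong character or run off the end of the
array.

```java
final class NaiveUtf8Demo {
    // Hypothetical decoder: assumes well-formed UTF-8 of at most 3 bytes
    // per character and never validates its input.
    static String naiveDecode(byte[] data) {
        char[] out = new char[data.length];
        int i = 0;
        int j = 0;
        while (i < data.length) {
            int z = data[i] & 0xFF;
            if (z < 0x80) {
                // plain 7-bit ASCII
                out[j++] = (char) z;
                i += 1;
            } else if (z >= 0xE0) {
                // treated as a 3-byte sequence, no bounds check
                int y = data[i + 1] & 0xFF;
                int x = data[i + 2] & 0xFF;
                out[j++] = (char) (((z & 0x0F) << 12) | ((y & 0x3F) << 6) | (x & 0x3F));
                i += 3;
            } else {
                // everything else is treated as a 2-byte sequence, no bounds check
                int y = data[i + 1] & 0xFF;
                out[j++] = (char) (((z & 0x1F) << 6) | (y & 0x3F));
                i += 2;
            }
        }
        return new String(out, 0, j);
    }

    public static void main(String[] args) {
        // 0xA0 0x0A is silently mis-decoded into a single char.
        System.out.println(naiveDecode(new byte[] {(byte) 0xA0, 0x0A}).length());
        try {
            // A stray 0xA0 as the last byte reads past the end of the array.
            naiveDecode(new byte[] {0x78, (byte) 0xA0});
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("ArrayIndexOutOfBoundsException");
        }
    }
}
```

In this sketch, whether the result is an exception or silently wrong
text depends only on where the stray byte falls in the value.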


Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility

From
Achilleus Mantzios
Date:
On Wed, 5 Feb 2003, Achilleus Mantzios wrote:
> Now the real problems are
> a) Greek chars, mainly my fault but backwards compatibility problem.
>  In 7.2.3 the server returned SQL_ASCII chars, interpreted these
>  as greek UTF8 chars and returned valid greek java unicode strings
>  and everybody was happy.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Excuse me, I was wrong.
What happened is that I inserted 8-bit chars from Java
(not Greek UTF-8), and the data was stored as SQL_ASCII.
Then, in my JSP, I just read those chars back, and because my
servlet container encoding was ISO-8859-1 no conversion was done;
and because my page's charset was set to ISO-8859-7,
the browser displayed the Greek chars correctly.
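
The pass-through described here can be sketched in a few lines. This is
an illustration under assumptions (the class name Latin1PassThroughDemo
and the sample bytes are made up), not the servlet container's code:
ISO-8859-1 maps every byte to exactly one char and back, so bytes that
are really ISO-8859-7 survive the round trip unchanged and display as
Greek once the final consumer interprets them as ISO-8859-7.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Latin1PassThroughDemo {
    // Read stored bytes as ISO-8859-1 and write them back as ISO-8859-1.
    // The round trip is lossless for all 256 byte values.
    static byte[] throughLatin1(byte[] stored) {
        String inJava = new String(stored, StandardCharsets.ISO_8859_1);
        return inJava.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Interpret bytes the way a browser would for the given page charset.
    static String renderAs(byte[] data, String charsetName) {
        return new String(data, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // alpha, beta, gamma as single bytes in ISO-8859-7 (sample data)
        byte[] stored = {(byte) 0xE1, (byte) 0xE2, (byte) 0xE3};
        byte[] sent = throughLatin1(stored);
        System.out.println(renderAs(sent, "ISO-8859-7"));
    }
}
```

Nothing in this chain ever decodes the bytes as UTF-8, which is why it
worked before the client encoding was forced to UNICODE.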


Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility

From
Barry Lind
Date:

Achilleus Mantzios wrote:
> b) NOT GREEK RELATED!
>  With database_encoding set to SQL_ASCII, the server converts these wierd
>  2 chars (0xA0 0x0A) to UTF-8, and then the driver simply fails.
>
> I think you should deal with problem b).
> To create a test case is easy.
> Create a SQL_ASCII database, then insert these 2 chars in a text column
> (having typed these two chars with some utility like khexedit),
> and then out.println this string.
>

Achilleus,

I want to understand what you mean by 'deal with the problem'.  Since
0xA0 and 0x0A are invalid SQL_ASCII characters, the only thing I can
think of is to produce a better exception in this case.  So instead of
the current ArrayIndexOutOfBoundsException, this case would throw a
SQLException with a message something like:  "Invalid characters were
found.  This is most likely caused by stored data containing characters
that are invalid for the character set the database was created in.  The
most common example of this is storing 8-bit data in a SQL_ASCII database."

thanks,
--Barry
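
A minimal sketch of the "better exception" idea, assuming a standalone
helper rather than the driver's own decoder (the class StrictUtf8Demo
and method decodeOrThrow are hypothetical names). It uses the standard
java.nio.charset.CharsetDecoder in REPORT mode, so malformed input is
detected and turned into an SQLException carrying the message proposed
above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.sql.SQLException;

final class StrictUtf8Demo {
    static final String MESSAGE =
        "Invalid characters were found. This is most likely caused by stored "
        + "data containing characters that are invalid for the character set "
        + "the database was created in.";

    // Decode strictly as UTF-8; report malformed input instead of guessing.
    static String decodeOrThrow(byte[] data) throws SQLException {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
        } catch (CharacterCodingException e) {
            // Keep the original cause for debugging.
            throw new SQLException(MESSAGE, e);
        }
    }

    public static void main(String[] args) {
        try {
            decodeOrThrow(new byte[] {(byte) 0xA0, 0x0A});
        } catch (SQLException e) {
            System.out.println("SQLException: " + e.getMessage());
        }
    }
}
```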




Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility

From
Achilleus Mantzios
Date:
On Wed, 5 Feb 2003, Barry Lind wrote:
> Achilleus,
>
> I want to understand what you mean by 'deal with the problem'.  Since

What I mean is simply that either we don't allow these chars
to be inserted (in the setString methods, maybe) and leave the
decodeUTF8 method as is, or we allow them to be inserted
and then convert them to the traditional '?' char.

Thanks
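
The second option, replacing undecodable bytes with '?', could look
roughly like the sketch below. It is an assumption about how one might
do it with the standard java.nio.charset.CharsetDecoder, not a change
to the driver; the class name LenientUtf8Demo is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class LenientUtf8Demo {
    // Decode as UTF-8, substituting '?' for every malformed byte sequence.
    static String decodeLenient(byte[] data) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith("?")
                .decode(ByteBuffer.wrap(data))
                .toString();
        } catch (CharacterCodingException e) {
            // Not expected: REPLACE mode does not report coding errors.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = decodeLenient(new byte[] {(byte) 0xA0, 0x0A});
        System.out.println(text.charAt(0)); // the stray 0xA0 became '?'
    }
}
```

The trade-off is that the original byte is lost on read, so this hides
bad data rather than reporting it.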

> 0xA0 and 0x0A are invalid SQL_ASCII characters, the only thing I can
> think of is to produce a better exception in this case.  So instead of
> the current ArrayIndexOutOfBounds exception, this case would throw a SQL
> Exception with a message something like:  "Invalid characters were
> found.  This is most likely caused by stored data containing characters
> that are invalid for the character set the database was created in.  The
> most common example of this is storing 8bit data in a SQL_ASCII database."
>
> thanks,
> --Barry
>
>
>


Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility

From
Dave Cramer
Date:
This doesn't really solve the problem. The driver isn't the only way to
get information into the database. The driver should be able to handle
anything that it receives gracefully, though.

Dave
On Fri, 2003-02-07 at 06:56, Achilleus Mantzios wrote:
> What i mean, is simply that either we dont allow these chars
> to get inserted (setString methods maybe), and we let the
> decodeUTF-8 method as is, or allow them to get inserted
> and then convert them to the traditional '?' char.
--
Dave Cramer <Dave@micro-automation.net>


emacs behave like pgjindent?

From
Michael Adler
Date:
Has anyone crafted a mode-hook so that emacs behaves roughly like
pgjindent?  The default JDE style is quite different. If no one has done
the leg work, I may take a stab.

-Mike

Re: emacs behave like pgjindent?

From
Dave Cramer
Date:
You're welcome to go ahead, but I'm not sure how many folks use emacs. I
confess to using JBuilder, or vi.

Dave
On Fri, 2003-02-07 at 08:50, Michael Adler wrote:
> Has anyone crafted a mode-hook so that emacs behaves roughly like
> pgjindent?  The default JDE style is quite different. If no one has done
> the leg work, I may take a stab.
>
> -Mike
--
Dave Cramer <Dave@micro-automation.net>


Re: emacs behave like pgjindent?

From
Michael Adler
Date:
If anyone's interested, this does a decent job. The difference I saw was
that emacs will still let a blank line have a tab on it. pgjindent will
trim it off with entab.

(defun my-jde-mode-hook()
  ;; attempt to match PostgreSQL's pgjindent style
  (setq tab-width 4)
  (setq indent-tabs-mode t)
  (c-set-offset 'substatement-open 0)
  )
(add-hook 'jde-mode-hook 'my-jde-mode-hook)



- Mike Adler

On Fri, 7 Feb 2003, Dave Cramer wrote:
> Date: 07 Feb 2003 09:20:03 -0500
> From: Dave Cramer <Dave@micro-automation.net>
> To: Michael Adler <adler@glimpser.org>
> Cc: "pgsql-jdbc@postgresql.org" <pgsql-jdbc@postgresql.org>,
>      Barry Lind <blind@xythos.com>
> Subject: Re: [JDBC] emacs behave like pgjindent?
>
> You're welcome to go ahead, but I'm not sure how many folks use emacs. I
> confess to using JBuilder, or vi


new class layout to support COPY protocol

From
Michael Adler
Date:
I'm working on supporting the COPY protocol (again). Unless people are
unsatisfied with the largeobject way of accessing pg-specific
functionality, I'll adopt their way of doing things. For example:

org.postgresql.copy.CopyManager copyMgr;
copyMgr = ((org.postgresql.PGConnection)con).getCopyAPI();
copyMgr.copyOut("tablename", outputStream);
copyMgr.copyIn("tablename", inputStream);

I have working code with unit tests, but it still needs polishing. I
simply wanted to know if this class layout would be met with approval.

- Mike Adler


patch for COPY

From
Michael Adler
Date:
I've attached a tar file that includes a context diff and two additional
files.  This should provide COPY capabilities for the JDBC driver. I can
write up the Docbook documentation if the patch (or some version of it) is
to be incorporated.

I will be unable to respond to comments next week, but I can respond this
weekend and the following weekend and thereafter.

- Mike Adler

Attachment

Re: patch for COPY

From
Kris Jurka
Date:
One of the failings of the copy protocol is that on error basically the
connection is hosed.  Is it possible to reset the connection state on
error for the user?

Also are there plans to support other elements of the COPY syntax?  For
example NULL AS, OIDS, and column lists.

Kris Jurka

On Fri, 7 Feb 2003, Michael Adler wrote:

>
> I've attached a tar file that includes a context diff and two additional
> files.  This should provide COPY capabilities for the JDBC driver. I can
> write up the Docbook documentation if the patch (or some version of it) is
> to be incorporated.
>
> I will be unable to respond to comments next week, but I can respond this
> weekend and the following weekend and thereafter.
>
> - Mike Adler


Re: patch for COPY

From
Michael Adler
Date:

On Fri, 7 Feb 2003, Kris Jurka wrote:
> One of the failings of the copy protocol is that on error basically the
> connection is hosed.  Is it possible to reset the connection state on
> error for the user?

Assuming the rest of the driver can support this behavior, I'm guessing that
we should make this optional.

> Also are there plans to support other elements of the COPY syntax?  For
> example NULL AS, OIDS, and column lists.

Yes. My current thinking is to provide a method that takes an arbitrary
COPY command. This also gives us backwards compatibility since the command
syntax has changed from 7.2 to 7.3.

Mike Adler

Re: patch for COPY

From
Kris Jurka
Date:

On Sat, 8 Feb 2003, Michael Adler wrote:
>
> On Fri, 7 Feb 2003, Kris Jurka wrote:
> > One of the failings of the copy protocol is that on error basically the
> > connection is hosed.  Is it possible to reset the connection state on
> > error for the user?
>
> Assuming the rest of the driver can support this behavior, I'm guessing that
> we should make this optional.

That's the question.  Can we reset the driver to a close enough state that
it is transparent to the user?  With normal JDBC access the user will
expect to get an SQLException, call connection.rollback(), and continue as
usual.  This could be tricky.

> > Also are there plans to support other elements of the COPY syntax?  For
> > example NULL AS, OIDS, and column lists.
>
> Yes. My current thinking is to provide a method that takes an arbitrary
> COPY command. This also gives us backwards compatibility since the command
> syntax has changed from 7.2 to 7.3.

What is the expected use case for a copyIn?  Is an InputStream a
reasonable means for input?  Would defining a CopyInputSource interface
for a user's class to implement be useful?  The JDBC driver could then
pull data directly from the user's representation without first
persisting it to an intermediate InputStream.
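
A minimal sketch of what such an interface might look like. Only the name
CopyInputSource comes from the question above; the nextRow() method, the
RowListSource class, and the drain() helper are hypothetical and exist in
no driver. drain() stands in for the driver side and simply collects the
bytes it would send to the backend. Values are not escaped here, which a
real implementation would have to do.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

class CopySourceDemo {

    // Hypothetical interface: the driver would call nextRow() until it returns null.
    interface CopyInputSource {
        // Next row in COPY text format without the trailing newline, or null when done.
        String nextRow();
    }

    // Example user class that exposes in-memory rows directly, with no intermediate stream.
    static class RowListSource implements CopyInputSource {
        private final Iterator<String[]> rows;

        RowListSource(List<String[]> rows) {
            this.rows = rows.iterator();
        }

        @Override
        public String nextRow() {
            if (!rows.hasNext()) {
                return null;
            }
            // Tab is COPY's default column delimiter; no escaping is done in this sketch.
            return String.join("\t", rows.next());
        }
    }

    // Stand-in for the driver: pulls rows from the source and collects the wire bytes.
    static byte[] drain(CopyInputSource source) {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        String row;
        while ((row = source.nextRow()) != null) {
            byte[] bytes = (row + "\n").getBytes(StandardCharsets.UTF_8);
            wire.write(bytes, 0, bytes.length);
        }
        return wire.toByteArray();
    }
}
```

The trade-off is the one raised in the question: the user's class is pulled
row by row, at the cost of one more type for the driver to support.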

Kris Jurka



Re: patch for COPY

From
Michael Adler
Date:
On Sun, 9 Feb 2003, Kris Jurka wrote:
>
> On Sat, 8 Feb 2003, Michael Adler wrote:
> >
> > On Fri, 7 Feb 2003, Kris Jurka wrote:
> > > One of the failings of the copy protocol is that on error basically the
> > > connection is hosed.  Is it possible to reset the connection state on
> > > error for the user?
> >
> > Assuming the rest of the driver can support this behavior, I'm guessing that
> > we should make this optional.
>
> That's the question.  Can we reset the driver to a close enough state that
> it is transparent to the user?  With normal JDBC access the user will
> expect to get an SQLException, call connection.rollback(), and continue as
> usual.  This could be tricky.
>

If we take libpq as the standard for what's practical to achieve with the
FE/BE protocol, I don't think we'll be able to maintain much. libpq simply
closes and reopens the connection. (The following test was run against a
7.2 installation.)

testdb=# set datestyle to German;
SET VARIABLE
testdb=# show datestyle;
NOTICE:  DateStyle is German with European conventions
SHOW VARIABLE
testdb=# \i isf
psql:isf:1: ERROR:  copy: line 1, pg_atoi: error in "T": can't parse "T"
psql:isf:1: lost synchronization with server, resetting connection
testdb=#
testdb=# show datestyle;
NOTICE:  DateStyle is ISO with US (NonEuropean) conventions
SHOW VARIABLE

I wonder if the best we can do is to establish a fresh connection and
begin a transaction. If they call rollback, it will rollback nothing, but
at least it behaves outwardly in a uniform fashion.

> What is the expected use case for a copyIn?  Is an InputStream a
> reasonable means for input?  Would defining a CopyInputSource interface
> for a user's class to implement be useful?  The JDBC driver could then
> pull data directly from the user's representation without first
> persisting it to an intermediate InputStream.

For my needs, an InputStream is reasonable.

FileInputStream fis = new FileInputStream("dumpfile");
copyIn("destination_table", fis);

Whether someone else finds that insufficient is another matter.

Personally, I think that eschewing java.io would increase the complexity
of the driver without a demonstrated need for the functionality. It's
likely that I lack the imagination to see how useful such a feature would
be. I will leave the decision to someone with more experience on this
project.

If a user has particular needs and is concerned with memory footprint, I
would recommend the Piped(Input/Output)Streams.
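
A minimal sketch of the piped-stream approach, using only java.io. The
consume() method is a stand-in for copyIn(table, inputStream) and merely
counts the bytes it reads; the class and method names are made up for
illustration. Rows are generated on a producer thread and never held in
memory all at once, since the pipe buffer is bounded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class PipedCopyDemo {

    // Stand-in for copyIn(table, in): reads the stream to the end and counts bytes.
    static long consume(InputStream in) throws IOException {
        long total = 0;
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // Writes `rows` tab-separated lines on a producer thread and streams them through a pipe.
    static long streamRows(int rows) {
        try (PipedInputStream in = new PipedInputStream(64 * 1024)) {
            PipedOutputStream out = new PipedOutputStream(in);
            Thread producer = new Thread(() -> {
                // Closing the output side signals end-of-stream to the reader.
                try (PipedOutputStream o = out) {
                    for (int i = 0; i < rows; i++) {
                        o.write((i + "\trow\n").getBytes(StandardCharsets.US_ASCII));
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();
            long total = consume(in);   // the reader must run on a different thread than the writer
            producer.join();
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With the proposed API, the call to consume(in) would be replaced by the
copyIn call, and the producer thread would write the application's rows.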

Mike Adler

Re: patch for COPY

From
Tom Lane
Date:
Michael Adler <adler@glimpser.org> writes:
> On Fri, 7 Feb 2003, Kris Jurka wrote:
>>> One of the failings of the copy protocol is that on error basically the
>>> connection is hosed.  Is it possible to reset the connection state on
>>> error for the user?

> If we take libpq as the standard for what's practical to achieve with the
> FE/BE protocol, I don't think we'll be able to maintain much. libpq simply
> closes and reopens the connection. (The following test was run against a
> 7.2 installation.)

It might be best to just leave this as an open problem until the COPY
protocol is fixed.  Making COPY able to recover from errors is one of
the "must fix" items for the next FE/BE protocol revision.  There had
been talk of doing this for 7.4, but given the lack of progress so far
I wouldn't want to promise results for 7.4.  Maybe 7.5 though.  We have
enough accumulated reasons for protocol changes that I think it's
getting to be a high-priority issue.

            regards, tom lane

revised patch for COPY

From
Michael Adler
Date:
Here's another version of a patch that gives you COPY capabilities. The
difference is that in addition to the simple and default:

copyOut("tablename",outputStream);

you can also access other COPY features by supplying your own COPY query:

copyOutQuery("COPY " + tablename + " WITH OIDS TO STDOUT DELIMITERS '\t'"
    + " WITH NULL AS '\\N'", outputStream);
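
Since the command syntax differs between 7.2 and 7.3, a caller might
assemble the query string with a small helper. The following is a sketch
only: the CopyCommands class and its methods are hypothetical and not part
of the patch, and the two syntaxes are written as I understand them from
the 7.2 and 7.3 COPY reference pages. The table name is inserted as-is, so
it must come from a trusted source.

```java
class CopyCommands {

    // Quotes a value as an SQL string literal by doubling embedded single quotes.
    // Backslashes pass through untouched; the caller must account for how the server treats them.
    static String literal(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // 7.2-style: COPY table [WITH OIDS] TO STDOUT [USING DELIMITERS 'd'] [WITH NULL AS 'n']
    static String copyOut72(String table, boolean oids, String delimiter, String nullAs) {
        StringBuilder sql = new StringBuilder("COPY ").append(table);
        if (oids) {
            sql.append(" WITH OIDS");
        }
        sql.append(" TO STDOUT");
        if (delimiter != null) {
            sql.append(" USING DELIMITERS ").append(literal(delimiter));
        }
        if (nullAs != null) {
            sql.append(" WITH NULL AS ").append(literal(nullAs));
        }
        return sql.toString();
    }

    // 7.3-style: COPY table TO STDOUT [WITH [OIDS] [DELIMITER 'd'] [NULL 'n']]
    static String copyOut73(String table, boolean oids, String delimiter, String nullAs) {
        StringBuilder opts = new StringBuilder();
        if (oids) {
            opts.append(" OIDS");
        }
        if (delimiter != null) {
            opts.append(" DELIMITER ").append(literal(delimiter));
        }
        if (nullAs != null) {
            opts.append(" NULL ").append(literal(nullAs));
        }
        String sql = "COPY " + table + " TO STDOUT";
        // Emit the WITH keyword only when at least one option is present.
        return opts.length() == 0 ? sql : sql + " WITH" + opts;
    }
}
```

The resulting string would then be passed as the first argument to
copyOutQuery, with the variant chosen according to the server version.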

This feature speeds up my application 40x and I bet it will be useful to
others as well. I wrote it to integrate cleanly into the driver, so
please let me know if it's not appropriate for the main project.

Comments?

Mike Adler

Attachment