Thread: using 8 bit ascii

using 8 bit ascii

From
"Jason Tesser"
Date:
I have a Postgres database (version 7.4.2) that is using ASCII character
233, which is an 8-bit ASCII character. I also use JBoss. My problem is
that when I try to retrieve a result set that has a record with one of
the 8-bit ASCII characters, I get a message from JBoss (see the error
message below).

My question: is there a way to configure the Postgres JDBC driver to
allow for this range of characters?

2004-10-26 16:54:51,167 ERROR [STDERR] org.postgresql.util.PSQLException: Invalid character data was found.
This is most likely caused by stored data containing characters that are
invalid for the character set the database was created in.  The most
common example of this is storing 8bit data in a SQL_ASCII database.
2004-10-26 16:54:51,167 ERROR [STDERR]  at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:287)
2004-10-26 16:54:51,167 ERROR [STDERR]  at org.postgresql.core.Encoding.decode(Encoding.java:182)
2004-10-26 16:54:51,167 ERROR [STDERR]  at org.postgresql.core.Encoding.decode(Encoding.java:198)
2004-10-26 16:54:51,167 ERROR [STDERR]  at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:201)
2004-10-26 16:54:51,167 ERROR [STDERR]  at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:475)
2004-10-26 16:54:51,168 ERROR [STDERR]  at payroll.DeptWorkers.loadWorkers(DeptWorkers.java:52)
2004-10-26 16:54:51,168 ERROR [STDERR]  at org.apache.jsp.manager_jsp._jspService(manager_jsp.jav


Re: using 8 bit ascii

From
Anders Hermansen
Date:
Hello Jason,

ASCII is only 7-bit: values 0 to 127.

ISO-8859-1 is an example of an 8-bit character set (values 0 to 255).
233 is é in ISO-8859-1 (Latin-1).

You should create the database with an encoding that can handle 8-bit
characters, e.g. ISO-8859-1 (PostgreSQL: LATIN1) or UTF-8 (PostgreSQL:
UNICODE).
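
To make the point about value 233 concrete, here is a minimal Java sketch
(not taken from the thread; the class and method names are made up for
illustration). It checks whether 233 fits in 7-bit ASCII and decodes the
same single byte under ISO-8859-1 and under US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Byte233Demo {
    // True only for values in the 7-bit ASCII range, 0 to 127.
    static boolean isAscii(int value) {
        return value >= 0 && value <= 127;
    }

    // Decode a single byte value using the given charset.
    static char decodeAs(int value, Charset charset) {
        return new String(new byte[] {(byte) value}, charset).charAt(0);
    }

    public static void main(String[] args) {
        // 233 is outside the ASCII range.
        System.out.println(isAscii(233)); // false
        // In ISO-8859-1 the byte maps directly to U+00E9 (é).
        System.out.println((int) decodeAs(233, StandardCharsets.ISO_8859_1)); // 233
        // US-ASCII has no mapping, so Java substitutes U+FFFD.
        System.out.println((int) decodeAs(233, StandardCharsets.US_ASCII)); // 65533
    }
}
```

The code points are printed as numbers so the output does not depend on
the terminal's own encoding.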


Anders


Re: using 8 bit ascii

From
"Jason Tesser"
Date:
OK, I tried Unicode, but the data won't load: it says it cannot support
the Unicode values I am inserting. I tried converting the data as a text
file and everything. Nothing has worked there. With ODBC, using Access
for example, I can pull the 8-bit characters out just fine from the same
database. So why can't I do the same with the Postgres JDBC driver?
I understand that ASCII is 7-bit, but these are extended ASCII. I will
try Latin-1.

> -----Original Message-----
> From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-
> owner@postgresql.org] On Behalf Of Anders Hermansen
> Sent: Wednesday, October 27, 2004 8:12 AM
> To: pgsql-jdbc@postgresql.org
> Subject: Re: [JDBC] using 8 bit ascii
>
> Hello Jason,
>
> ASCII is only 7-bit. Values 0 to 127.
>
> ISO-8859-1 is an example of a character set with 8-bits (0 to 255).
> 233 is é in ISO-8859-1 (Latin-1).
>
> You should create the database with an encoding which can handle 8-bit
> characters. I.e. ISO-8859-1 (Postgresql: Latin-1) or UTF-8 (Postgresql:
> UNICODE)
>
>
> Anders



Re: using 8 bit ascii

From
Anders Hermansen
Date:
Is it the JDBC driver or another interface that says it cannot support
the Unicode values when you insert? What is the exact error message you
get?

If you use, for example, ODBC and insert Latin-1 characters into a
Unicode database, things will go wrong. You can issue the following
statement:
SET CLIENT_ENCODING TO 'LATIN1';
This tells PostgreSQL to expect Latin-1 characters from the client.
PostgreSQL will then automatically convert to the correct character set
if necessary.
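
As a rough illustration of what such a conversion amounts to, the Java
sketch below transcodes Latin-1 bytes to UTF-8. This is not PostgreSQL's
code, and the class and method names are invented for the example; it
only shows that the single byte 0xE9 becomes the two bytes C3 A9.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a Latin-1 to UTF-8 conversion.
class TranscodeSketch {
    // Interpret the input as ISO-8859-1 and re-encode it as UTF-8.
    static byte[] latin1ToUtf8(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as space-separated uppercase hex for display.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "café" as a Latin-1 client would send it: é is the single byte 0xE9.
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(hex(latin1ToUtf8(latin1))); // 63 61 66 C3 A9
    }
}
```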

The JDBC driver always operates in UNICODE mode, so it should not have
any problems with either LATIN1 or UNICODE databases.

I use the JDBC driver with both LATIN1 and UNICODE databases with no
problems. I also use psql for some scripts, but I have an ISO-8859-1
terminal, so I execute the above statement first.
(Actually, I have "\encoding LATIN1" in my .psqlrc file.)


Anders


Re: using 8 bit ascii

From
Oliver Jowett
Date:
Jason Tesser wrote:

> 2004-10-26 16:54:51,167 ERROR [STDERR]
> org.postgresql.util.PSQLException: Invalid character data was found.
> This is most likely caused by stored data containing characters that are
> invalid for the character set the database was created in.  The most
> common example of this is storing 8bit data in a SQL_ASCII database.

As the error says, this problem usually arises from storing 8-bit data
in a SQL_ASCII database.

The JDBC driver always sets client_encoding = UNICODE and expects the
data arriving from the server to be UTF8 ("unicode") encoded. When you
have a SQL_ASCII database, the server has no information as to how to
translate characters above 127 into corresponding unicode values, so it
just passes them straight out. Then JDBC complains about invalid unicode
sequences.
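
The failure can be reproduced outside the driver. The Java sketch below
is not the driver's own decoding code (the class and method names are
hypothetical); it uses the standard CharsetDecoder in strict mode to show
that a bare 0xE9 byte, as passed through by a SQL_ASCII server, is not a
valid UTF-8 sequence, while the two-byte form C3 A9 is.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a strict UTF-8 check; not the driver's code.
class StrictUtf8Check {
    // Returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Malformed input, e.g. a lone high byte from Latin-1 data.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] rawLatin1 = {'c', 'a', 'f', (byte) 0xE9};
        byte[] properUtf8 = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};
        System.out.println(isValidUtf8(rawLatin1));  // false
        System.out.println(isValidUtf8(properUtf8)); // true
    }
}
```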

It's not just a case of somehow making the JDBC driver accept those
sequences; the driver really does need them translated to unicode as
Java's internal string format uses a unicode representation. To do this
translation, you need information about the actual encoding the data is
using. For post-7.2 servers, the JDBC driver chooses to let the server
deal with this, so you need to get the encoding information right on the
database side.

So you will need to recreate your database using an appropriate encoding
that reflects the data stored in it. Presumably those high-ascii
sequences already in the database are *not* unicode; they're probably
ISO-8859-1 or something similar? In that case you can probably dump and
reload into a database created with the LATIN1 encoding.
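
Before picking the target encoding, it may help to test that presumption
against the dumped data. The Java sketch below is only a heuristic, and
the class, enum, and method names are made up. It sorts a byte array into
pure ASCII, well-formed UTF-8, or neither. "Neither" is consistent with a
single-byte encoding such as ISO-8859-1, but it does not prove which one.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical heuristic for inspecting dumped data; not a PostgreSQL tool.
class DumpEncodingGuess {
    enum Kind { ASCII_ONLY, VALID_UTF8, NOT_UTF8 }

    static Kind classify(byte[] data) {
        // Step 1: any byte with the high bit set is outside 7-bit ASCII.
        boolean hasHighByte = false;
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                hasHighByte = true;
                break;
            }
        }
        if (!hasHighByte) {
            return Kind.ASCII_ONLY;
        }
        // Step 2: try a strict UTF-8 decode of the whole input.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return Kind.VALID_UTF8;
        } catch (CharacterCodingException e) {
            return Kind.NOT_UTF8;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new byte[] {'a', 'b', 'c'}));                 // ASCII_ONLY
        System.out.println(classify(new byte[] {(byte) 0xC3, (byte) 0xA9}));      // VALID_UTF8
        System.out.println(classify(new byte[] {'c', 'a', 'f', (byte) 0xE9}));    // NOT_UTF8
    }
}
```

In practice the bytes would come from the dump file; that I/O is left out
to keep the sketch short.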

-O