Thread: Character encoding problem

Character encoding problem

From
Boris Klug
Date:
Hello!

We have been using PostgreSQL databases with Java for quite a long time. We
have now upgraded to PostgreSQL 7.3.1 and have a problem:

We have two PostgreSQL databases on two different servers, both running Debian
Linux 2.4.18. Java is SDK 1.4.

I created the databases using the standard encoding, which leads to SQL_ASCII
encoding. Then I created a table with a varchar field and inserted text with
German umlauts:
    create table umlauttest (txt varchar(50));
    insert into umlauttest values('üäö ÜÄÖ ß');

Using psql, I can verify that the umlauts are stored correctly in the database.
Using pg73b1jdbc2 (build 104), I can retrieve the umlauts on one machine but
only get question marks on the other. Why?
Using pg73rc1jdbc2 (build 106), I get an ArrayIndexOutOfBounds error on both
machines. Using pg73jdbc2 (build 108), I get the following exception:
"Invalid character data was found.  This is most likely caused by stored data
containing characters that are invalid for the character set the database was
created in.  The most common example of this is storing 8bit data in a
SQL_ASCII database."
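
The exception text points at a byte-level mismatch. As a minimal sketch (not
the driver's actual code), the following Java program shows why bytes written
in a single-byte encoding cannot be decoded strictly as UTF-8 or ASCII. It
assumes the umlauts were stored as ISO-8859-1 (LATIN1) bytes, which the thread
does not confirm; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Decode strictly: return null instead of silently substituting
    // replacement characters when the bytes are invalid for the charset.
    static String strictDecode(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The test string from the insert statement, written with Unicode
        // escapes so the source file's own encoding does not matter.
        String text = "\u00fc\u00e4\u00f6 \u00dc\u00c4\u00d6 \u00df";

        // Assumption: the client wrote one byte per character (LATIN1).
        byte[] stored = text.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("valid as UTF-8:      "
                + (strictDecode(stored, StandardCharsets.UTF_8) != null));
        System.out.println("valid as US-ASCII:   "
                + (strictDecode(stored, StandardCharsets.US_ASCII) != null));
        System.out.println("valid as ISO-8859-1: "
                + text.equals(strictDecode(stored, StandardCharsets.ISO_8859_1)));
    }
}
```

Only the decoder that matches the encoding used when writing recovers the
original text, which is consistent with the "8bit data in a SQL_ASCII
database" hint in the message.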

OK, so I created the database using "initdb -E LATIN1", which did not help.
Using "initdb -E UNICODE" gives even more problems: I wasn't able to insert
umlauts using psql...

So I am now totally confused about character encoding in PostgreSQL and in JDBC...
Can you help?



--
Dipl. Inform. Boris Klug, control IT GmbH, Germany

Re: Character encoding problem

From
Barry Lind
Date:
Boris,

What problems do you have when your database encoding is latin1?  Can
you send a simple test program that demonstrates the problem?  I can
successfully store and retrieve latin1 data here on my test system.
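
A test program of the kind requested here could look roughly like the sketch
below. It is not taken from the thread: the class name, the helper
hexCodePoints, and the command-line arguments are hypothetical, and only the
table umlauttest with its txt column comes from the original report. It
inserts the umlaut string through JDBC and prints every stored row as Unicode
code points, so a wrong decoding is visible even on a terminal that cannot
display umlauts.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class UmlautRoundTrip {

    // Render a string as code points, e.g. "U+00FC U+00DF", so the result
    // does not depend on the console encoding.
    static String hexCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.err.println("usage: UmlautRoundTrip <jdbc-url> <user> <password>");
            return;
        }
        // Older drivers may need Class.forName("org.postgresql.Driver") first.
        String text = "\u00fc\u00e4\u00f6 \u00dc\u00c4\u00d6 \u00df";
        try (Connection con = DriverManager.getConnection(args[0], args[1], args[2])) {
            // Write the test string through the driver.
            try (PreparedStatement ins =
                         con.prepareStatement("insert into umlauttest values (?)")) {
                ins.setString(1, text);
                ins.executeUpdate();
            }
            // Read back every row and show what the driver decoded.
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("select txt from umlauttest")) {
                System.out.println("expected: " + hexCodePoints(text));
                while (rs.next()) {
                    System.out.println("got:      " + hexCodePoints(rs.getString(1)));
                }
            }
        }
    }
}
```

Running it once per driver build and once per machine would show which
combination returns different code points from the ones that were sent.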

thanks,
--Barry

Re: Character encoding problem

From
Boris Klug
Date:
Hello!

> > OK, so I created the database using "initdb -E LATIN1", which did not help.
> > Using "initdb -E UNICODE" gives even more problems: I wasn't able to
> > insert umlauts using psql...

OK, I retested everything; here is a quick summary:

1) Using "initdb -E LATIN1" when creating a database.

Works great, that was the trick: I can now use pg73b1jdbc2, pg73rc1jdbc2 or
pg73jdbc2.


2) Using "initdb" (leads to encoding "SQL_ASCII"):

I can retrieve umlauts when I use the pg73b1jdbc2 driver with the option
"charSet=ISO_8859_1" in the JDBC connection URL. That is what we have used
until now - I think we have to switch to LATIN1 encoding when we migrate to
7.3.1 (from 7.2).
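
For reference, the connection URL option mentioned in 2) can be put together
as in the following sketch. The host, database name, and the helper
withCharSet are hypothetical; only the option "charSet=ISO_8859_1" comes from
the setup described above, and whether a given driver build honours it should
be checked against that driver's documentation.

```java
class JdbcUrlSketch {

    // Append the charSet option to a JDBC URL, using '?' for the first
    // parameter and '&' if the URL already carries parameters.
    static String withCharSet(String baseUrl, String charSet) {
        char separator = baseUrl.indexOf('?') >= 0 ? '&' : '?';
        return baseUrl + separator + "charSet=" + charSet;
    }

    public static void main(String[] args) {
        // Hypothetical host and database name, for illustration only.
        String url = withCharSet("jdbc:postgresql://localhost/test", "ISO_8859_1");
        System.out.println(url);
        // The resulting string is what would be passed to
        // java.sql.DriverManager.getConnection(url, user, password).
    }
}
```

With a LATIN1 database as in 1), the plain URL without this option was
sufficient in the tests described above.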

I was using PostgreSQL 7.3.1, running on Linux with JDK 1.4.

--
Dipl. Inform. Boris Klug, control IT GmbH, Germany