UNICODE encoding and jdbc related issues - Mailing list pgsql-jdbc

From Chris Kratz
Subject UNICODE encoding and jdbc related issues
Msg-id 200504061318.44803.chris.kratz@vistashare.com
Responses Re: UNICODE encoding and jdbc related issues
List pgsql-jdbc
Hello All,

I posted this on the general mailing list several days ago without a response
so am posting here. Does anyone here have any wisdom or experience they don't
mind sharing?

Our production database was created with the default SQL_ASCII encoding.  It
appears that some of our users have entered characters above 127 (accented
vowels, etc.) into the system.  None of the tools we currently use had a
problem with this until recently; everything just worked.

I was testing some reporting tools this past weekend and have been playing
with JasperReports[1], a Java-based reporting tool that reads data from the
database via JDBC.  When I initially tried to generate reports, the JDBC
connection would fail with the following message:

org.postgresql.util.PSQLException: Invalid character data was found.
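
For reference, here is roughly the kind of read that triggers it (a minimal
sketch; the table and column names are just placeholders, not our real schema):

    import java.sql.*;

    public class Repro {
        public static void main(String[] args) throws Exception {
            Class.forName("org.postgresql.Driver");
            Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "password");
            Statement st = con.createStatement();
            ResultSet rs = st.executeQuery("SELECT text_field FROM some_table");
            while (rs.next()) {
                // the PSQLException above shows up as soon as a row containing
                // a byte value over 127 is read back as a String
                System.out.println(rs.getString(1));
            }
            con.close();
        }
    }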

Googling eventually turned up a message on the pgsql-jdbc list detailing the
problem[2].  Basically, Java works in Unicode internally, and the driver
cannot convert these bytes above 127 to Unicode because they are not valid
UTF-8.
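
If I understand it correctly, the driver asks the server for UTF-8 and then
decodes the result bytes as UTF-8 on the Java side, so a stray Latin-1 byte
blows up.  A small standalone sketch of that decode step (not the driver's
actual code, just the same idea):

    import java.nio.ByteBuffer;
    import java.nio.charset.*;

    public class DecodeDemo {
        public static void main(String[] args) throws Exception {
            // 0xE9 is e-acute in ISO-8859-1, but by itself it is not a valid
            // UTF-8 sequence, so a strict UTF-8 decode rejects it.
            byte[] raw = { (byte) 0xE9 };
            CharsetDecoder utf8 = Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT);
            try {
                utf8.decode(ByteBuffer.wrap(raw));
            } catch (CharacterCodingException e) {
                System.out.println("cannot decode as UTF-8: " + e);
            }
            // the same byte decodes fine if treated as ISO-8859-1
            System.out.println(new String(raw, "ISO-8859-1"));
        }
    }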

After some more googling, I found that if I took a recent database dump, ran
it through iconv[3], and then recreated the database with UNICODE encoding,
everything worked.
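
For what it's worth, the same conversion the iconv step does can also be
written in Java; a rough sketch, assuming the stray bytes really are
ISO-8859-1, using the same file names as the iconv command below:

    import java.io.*;

    public class Latin1ToUtf8 {
        // rough equivalent of: iconv -f iso8859-1 -t utf-8 < in > out
        public static void main(String[] args) throws IOException {
            Reader in = new BufferedReader(new InputStreamReader(
                    new FileInputStream("dbsnapshot.dumpall"), "ISO-8859-1"));
            Writer out = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream("dump-utf-8.dumpall"), "UTF-8"));
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.close();
            in.close();
        }
    }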

1. Is there any way to do an iconv-type translation inline in a SQL statement?
i.e. select translate(text_field, unicode) from table...  By the way, set
client_encoding=UNICODE does not help in this situation; in fact, the
PostgreSQL JDBC driver already seems to set it automatically.

2. I'm really not sure I want to change the encoding of our main database to
Unicode.  Is there a performance loss when going to a UNICODE database
encoding?  What about sorting behavior, etc.?  I'm really worried about
unintended side effects of moving from SQL_ASCII to UNICODE.

3. Is there any other way around this issue?  Or are we living dangerously by
trying to store non-ASCII data in a database created with SQL_ASCII encoding?

4. Has anyone else gone through a conversion like this?  Are there any
gotchas we should look out for?

Thanks for your time,

-Chris

We are using PostgreSQL 7.4.5 on Linux.

[1] http://jasperreports.sourceforge.net/
[2] http://archives.postgresql.org/pgsql-jdbc/2004-10/msg00280.php
[3] iconv -f iso8859-1 -t utf-8 < dbsnapshot.dumpall > dump-utf-8.dumpall
--
Chris Kratz
