Re: UNICODE encoding and jdbc related issues - Mailing list pgsql-jdbc

From Igor Postelnik
Subject Re: UNICODE encoding and jdbc related issues
Date
Msg-id 46F30BC04EC6364695BC07D4A57AAD2C7C7EAB@auscorpex-1.austin.messageone.com
Whole thread Raw
In response to UNICODE encoding and jdbc related issues  (Chris Kratz <chris.kratz@vistashare.com>)
Responses Re: UNICODE encoding and jdbc related issues
List pgsql-jdbc
> > 2. I'm really not sure I want to change the encoding of our main
> database to
> > Unicode.  Is there a performance loss when going to a UNICODE
database
> > encoding?  What about sorts, etc.  I'm really worried about
unintended
> side
> > effects of moving from SQL_ASCII to UNICODE.
>
> You don't need to use unicode, but you must select another encoding.
If
> you'd like to stick with a single byte encoding perhaps LATIN1 would
be
> appropriate for you.

I've asked this before on the performance list but didn't get any reply.
Is there substantial performance difference between using SQL_ASCII,
LATIN1, or UNICODE?

> The driver does SET client_encoding which does work for all real
server
> encodings.  The problem is that SQL_ASCII is not a real encoding.  It
> accepts any encoding and cannot do conversions to other encodings.
Your
> db right now could easily have a mix of encodings.

ISTM that when you create a database with SQL_ASCII encoding you decide
to deal with character set issues in the applications. Why is the JDBC
driver dictating how the application handles character set issues?

-Igor


> -----Original Message-----
> From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-
> owner@postgresql.org] On Behalf Of Kris Jurka
> Sent: Wednesday, April 06, 2005 1:23 PM
> To: Chris Kratz
> Cc: pgsql-jdbc@postgresql.org
> Subject: Re: [JDBC] UNICODE encoding and jdbc related issues
>
>
>
> On Wed, 6 Apr 2005, Chris Kratz wrote:
>
> > Our production database was created with the default SQL_ASCII
encoding.
> It
> > appears that some of our users have entered characters into the
system
> with
> > characters above 127 (accented vowels, etc).  None of the tools we
use
> > currently have had a problem with this behavior until recently,
> everything
> > just worked.
> >
> > I was testing some reporting tools this past weekend and have been
> playing
> > with Jasper reports[1] .  Jasper reports is a Java based reporting
tool
> that
> > reads data from the database via JDBC.  When I initially tried to
> generate
> > reports, the jdbc connection would crash with the following message:
> >
> > org.postgresql.util.PSQLException: Invalid character data was found.
> >
> > Googling eventually turned up a message on the pgsql-jdbc list
detailing
> the
> > problem[2].  Basically, java cannot convert these characters above
127
> into
> > unicode which is required by java.
> >
> > After some more googling, I found that if I took a recent database
dump
> and
> > then ran it through iconv[3] and then created the database with a
> unicode
> > encoding, everything worked.
> >
> > 1. Is there any way to do a iconv type translation inline in a sql
> statement?
> > ie select translate(text_field, unicode) from table....  Btw, set
> > client_encoding=UNICODE does not work in this situation.  In fact
the
> JDBC
> > driver for postgres seems to do this automatically.
>
> You can't do translation inline, how would a driver interpret the
results
> of SELECT translate(field1, unicode), translate(field2, latin1) ?
>
>
>
>
> > 3. Is there any other way around this issue?  Or are we living
> dangerously by
> > trying to store non-ascii data in a database created as ascii
encoded?
>
> You are living dangerously.
>
> > 4. Has anyone else gone through a conversion like this?  Are there
any
> >  gotchas we should look out for?
>
> The gotchas here are to make sure your other client tools still work
> against the new database.
>
> > [3] iconv -f iso8859-1 -t utf-8 < dbsnapshot.dumpall > dump-utf-
> 8.dumpall
>
> I see your data really is LATIN1.  Perhaps you should use that as your
db
> encoding.  That should keep your existing client tools happy as well
as
> the JDBC driver.
>
> Kris Jurka
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
majordomo@postgresql.org



pgsql-jdbc by date:

Previous
From: Kris Jurka
Date:
Subject: Re: UNICODE encoding and jdbc related issues
Next
From: Kris Jurka
Date:
Subject: Re: UNICODE encoding and jdbc related issues