Re: Character Encoding problem - Mailing list pgsql-jdbc

From antony baxter
Subject Re: Character Encoding problem
Date
Msg-id 3ee066b40804062034w338d5320s11df94cd126ab60e@mail.gmail.com
In response to Character Encoding problem  ("antony baxter" <antony.baxter@gmail.com>)
Responses Re: Character Encoding problem  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-jdbc
One thing I forgot to add: I also tried, e.g.:

  ps.setString(1, new String(Charset.forName("UTF-8").encode(myString).array(), "UTF-8"));

to be absolutely certain that I was passing UTF-8 to the database; this threw:

22047 [Thread-2] DEBUG com.test.database.postgresql.Dao  - PSQL Exception State: 22021
22047 [Thread-2] DEBUG com.test.database.postgresql.Dao  - PSQL Exception Message: invalid byte sequence for encoding "UTF8": 0x00
22051 [Thread-2] ERROR com.test.database.postgresql.Dao  - Error Storing Data:
org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
   at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1592)
   at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1327)
   at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192)
   at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451)
   at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:350)
   at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:343)
   at com.test.database.postgresql.Dao.store(Dao.java:197)
   ...

I presume that this is because the JDBC driver is expecting the JVM's
internal UTF-16 String representation?
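
For reference, here is a minimal sketch of another possible source of the 0x00 (an assumption, not something confirmed in this thread): ByteBuffer.array() returns the buffer's entire backing array, which can be longer than the encoded content, so any unused trailing bytes decode to U+0000 characters. The class and method names below are hypothetical, the buffer is built by hand so that the effect is deterministic, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {

    // Decodes the whole backing array and ignores the buffer's limit,
    // as the setString() attempt above does.
    static String decodeBackingArray(ByteBuffer buf) {
        return new String(buf.array(), StandardCharsets.UTF_8);
    }

    // Decodes only the bytes between the buffer's position and limit.
    static String decodeRemaining(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes); // duplicate() leaves the caller's position untouched
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hand-built buffer: 8 bytes of capacity, only 2 bytes of content.
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.put("hi".getBytes(StandardCharsets.UTF_8));
        buf.flip();

        System.out.println(decodeBackingArray(buf).length());          // 8
        System.out.println(decodeBackingArray(buf).indexOf('\u0000')); // 2
        System.out.println(decodeRemaining(buf).length());             // 2
    }
}
```

Whether Charset.encode() actually leaves spare capacity depends on the input and on the JVM, so this only shows the mechanism. In any case a Java String is already Unicode, so round-tripping it through UTF-8 bytes before setString() should not change what the driver sends.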

Ant



On Mon, Apr 7, 2008 at 8:29 AM, antony baxter <antony.baxter@gmail.com> wrote:
> Hi,
>
>  I'm having a character set problem, and I wonder if anyone here could
>  sanity check what I'm doing. It might well be that the problem lies
>  elsewhere.
>
>  My database was created with -E UNICODE, and when I do a \l in psql it
>  is listed as UTF8.
>
>  My Java application is receiving data over a socket which is encoded
>  in UTF8. I'm logging this and it is displaying e.g. Cyrillic or Greek
>  correctly (using OSX Terminal.app which supports UTF8, tailing the log
>  with 'less' and the environment variable LESSCHARSET=utf-8).
>
>  I'm persisting this data using the latest 8.3 JDBC drivers into
>  PostgreSQL 8.3.0. I'm not changing the client_encoding (I tried, but I
>  understand that the JDBC drivers set it to UNICODE anyway, and throw
>  an error if I attempt to change it to anything else). The data writes
>  fine, and if I then do a SELECT and a resultSet.getString(x) and write
>  the output to the log, everything still looks fine. I'm therefore
>  satisfied that Java + JDBC drivers + PostgreSQL are able to write &
>  read the data fine.  So far so good.
>
>  However, if using psql I try to look at the data, it is mangled. If I
>  try a manual UPDATE via psql using the data cut'n'pasted from my log,
>  and then look at the data, it reads correctly. Therefore I know that
>  psql is capable of reading and writing UTF8 data correctly. Also, the
>  client application that reads from my database is Perl, and this also
>  retrieves mangled data; we've tried writing and reading directly from
>  Perl, and in this case reviewing the data in psql looks normal.
>
>  The conclusion I've reached is that Java + JDBC is not actually
>  persisting the data in UTF-8; is that correct or am I wildly off base,
>  and if it is correct then is there anything I can do about it?!
>
>  Many thanks,
>
>  Ant.
>
