Re: Character Encoding problem - Mailing list pgsql-jdbc
From | antony baxter |
---|---|
Subject | Re: Character Encoding problem |
Date | |
Msg-id | 3ee066b40804062034w338d5320s11df94cd126ab60e@mail.gmail.com |
In response to | Character Encoding problem ("antony baxter" <antony.baxter@gmail.com>) |
Responses | Re: Character Encoding problem (Craig Ringer <craig@postnewspapers.com.au>) |
List | pgsql-jdbc |
One thing I forgot to add; I also tried e.g.:

    ps.setString(1, new String(Charset.forName("UTF-8").encode(myString).array(), "UTF-8"));

to be absolutely certain that I was passing UTF-8 to the database; this threw a

    22047 [Thread-2] DEBUG com.test.database.postgresql.Dao - PSQL Exception State: 22021
    22047 [Thread-2] DEBUG com.test.database.postgresql.Dao - PSQL Exception Message: invalid byte sequence for encoding "UTF8": 0x00
    22051 [Thread-2] ERROR com.test.database.postgresql.Dao - Error Storing Data:
    org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x00
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1592)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1327)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:192)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:350)
        at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:343)
        at com.test.database.postgresql.Dao.store(Dao.java:197)
        ...

I presume that this is because the JDBC driver is expecting the JVM's internal UTF16 String representation?

Ant

On Mon, Apr 7, 2008 at 8:29 AM, antony baxter <antony.baxter@gmail.com> wrote:
> Hi,
>
> I'm having a character set problem, and I wonder if anyone here could
> sanity check what I'm doing. It might well be that the problem lies
> elsewhere.
>
> My database was created with -E UNICODE, and when I do a \l in psql it
> is listed as UTF8.
>
> My Java application is receiving data over a socket which is encoded
> in UTF8. I'm logging this and it is displaying e.g. Cyrillic or Greek
> correctly (using OSX Terminal.app which supports UTF8, tailing the log
> with 'less' and the environment variable LESSCHARSET=utf-8).
>
> I'm persisting this data using the latest 8.3 JDBC drivers into
> PostgreSQL 8.3.0. I'm not changing the client_encoding (I tried, but I
> understand that the JDBC drivers set it to UNICODE anyway, and throw
> an error if I attempt to change it to anything else). The data writes
> fine, and if I then do a SELECT and a resultSet.getString(x) and write
> the output to the log, everything still looks fine. I'm therefore
> satisfied that Java + JDBC drivers + PostgreSQL are able to write &
> read the data fine. So far so good.
>
> However, if using psql I try to look at the data, it is mangled. If I
> try a manual UPDATE via psql using the data cut'n'pasted from my log,
> and then look at the data, it reads correctly. Therefore I know that
> psql is capable of reading and writing UTF8 data correctly. Also, the
> client application that reads from my database is Perl, and this also
> retrieves mangled data; we've tried writing and reading directly from
> Perl, and in this case reviewing the data in psql looks normal.
>
> The conclusion I've reached is that Java + JDBC is not actually
> persisting the data in UTF-8; is that correct or am I wildly off base,
> and if it is correct then is there anything I can do about it?!
>
> Many thanks,
>
> Ant.
>
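On the 0x00 error from the setString() attempt at the top of this message, one possible explanation (an assumption, not something confirmed in this thread) is that ByteBuffer.array() returns the buffer's whole backing array rather than only the bytes up to its limit. If the buffer returned by Charset.encode() has spare capacity, the unused bytes are zero, they decode to U+0000 characters, and PostgreSQL rejects NUL in text values. The sketch below reproduces that effect with a deliberately oversized buffer, because the capacity chosen by Charset.encode() is an implementation detail. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class BackingArrayDemo {
    // Hypothetical helper: stores the UTF-8 bytes of s in a buffer that has
    // 4 unused bytes of spare capacity (heap buffers are zero-filled).
    static ByteBuffer oversizedBuffer(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(utf8.length + 4);
        buf.put(utf8);
        buf.flip(); // position = 0, limit = utf8.length
        return buf;
    }

    // Same pattern as in the message: decode the entire backing array,
    // including any bytes beyond the buffer's limit.
    static String decodeWholeArray(ByteBuffer buf) {
        return new String(buf.array(), StandardCharsets.UTF_8);
    }

    // Decode only the bytes between position and limit.
    static String decodeValidBytes(ByteBuffer buf) {
        byte[] exact = new byte[buf.remaining()];
        buf.duplicate().get(exact); // duplicate() leaves buf's position untouched
        return new String(exact, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Cyrillic sample text
        ByteBuffer buf = oversizedBuffer(original);

        String padded = decodeWholeArray(buf);
        String exact = decodeValidBytes(buf);

        System.out.println("whole array contains NUL: " + (padded.indexOf('\0') >= 0));
        System.out.println("valid bytes round-trip:   " + exact.equals(original));
    }
}
```

If this is what happened, the round trip through bytes adds nothing in any case: a Java String passed to setString() is already a sequence of Unicode characters, and converting it to the connection's encoding is the driver's job.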