Re: Problem with accessing Russian UTF database - Mailing list pgsql-jdbc

From Oliver Jowett
Subject Re: Problem with accessing Russian UTF database
Msg-id 492C85C2.8040602@opencloud.com
In response to Problem with accessing Russian UTF database  ("Ronald Vyhmeister" <rvyhmeister@gmail.com>)
List pgsql-jdbc
Ronald Vyhmeister wrote:
> I'm having real trouble with the jdbc driver for postgres... I just
> installed the latest version...
>
> I have a database, UTF8 encoded, which has data in Russian.  I can view it
> beautifully using PGAdmin3 or any other ODBC connection.

Perhaps those clients are not actually interpreting the data as UTF8,
but as some other encoding. They would then appear to write data and
read it back OK, even though the stored bytes are not what you think
they are when interpreted as UTF8.

> String URLdb =
> "jdbc:postgresql://127.0.0.1:5432/oldzautest?user=noe&password=genesis&charSet=UNICODE";

You should not need "charSet=UNICODE", though I don't think it'll break
anything.

> <data>
>         <db_content>
>             <row>
>                     <contents content = "1"  />
>                     <contents content = "1"  />
>                     <contents content = "?????"  />
>                     <contents content = "????????"  />
>                     <contents content = "?????????"  />
>                     <contents content = "1965-03-10"  />
>                     <contents content = "1"  />
>             </row>
>         </db_content>
> </data>

Perhaps the problem is in the encoding you are using to write out that
XML fragment? Or in whatever tool you are using to view it?
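To illustrate how that could happen, here is a minimal sketch, assuming the XML is written through a java.io.Writer. The class name, the helper method and the sample value are made up for illustration; they are not taken from your code. If the Writer is created with a charset that cannot represent Cyrillic, each such character is silently replaced with '?', which would look just like the output you quoted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class XmlEncodingDemo {
    // Hypothetical helper: writes one <contents> element with the given
    // charset and returns the raw bytes that would end up in the file.
    static byte[] writeElement(String value, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write("<contents content = \"" + value + "\" />");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Four Cyrillic letters, written as escapes so the source file's
        // own encoding does not matter (sample data, not from the thread).
        String value = "\u0418\u0432\u0430\u043d";

        // US-ASCII cannot represent Cyrillic; the Writer substitutes '?'.
        byte[] ascii = writeElement(value, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));
        // prints: <contents content = "????" />

        // UTF-8 keeps the characters; decoding the bytes gives them back.
        byte[] utf8 = writeElement(value, StandardCharsets.UTF_8);
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(decoded.contains(value)); // prints: true
    }
}
```

If the file itself is fine, the same substitution can still happen in a console or viewer that does not use UTF-8, so it is worth checking the bytes on disk rather than what is shown on screen.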

> I've set the client_encoding to UTF8 on the server...  What am I doing
> wrong?  What am I missing?  I'd be thrilled to interact privately with
> someone who has solved what for now is a mystery to me.

You shouldn't need to touch client_encoding for JDBC to work (though
other clients might need it). The JDBC driver forces client_encoding to
UTF8 anyway on connection startup.

It may be useful to examine the actual values of the characters in the
String objects you are dealing with (e.g. print out (int)s.charAt(0)
etc) to check they contain the Unicode code points you were expecting.
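Something along these lines would do; it is only a sketch, and the class and method names are invented for the example. It prints each code point in hex, so the result does not depend on how your console renders the text. Russian letters should show up in the Cyrillic block (U+0400 to U+04FF); a literal question mark is U+003F.

```java
class CodePointDump {
    // Returns the code points of s as "U+XXXX" values separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample strings; in real code pass what ResultSet.getString() returned.
        System.out.println(dump("\u0418\u0432\u0430\u043d"));
        // prints: U+0418 U+0432 U+0430 U+043D
        System.out.println(dump("????"));
        // prints: U+003F U+003F U+003F U+003F
    }
}
```

If the Strings coming back from the driver dump as Cyrillic code points, the data was read correctly and the damage happens later, when the String is written out or displayed. If they already dump as U+003F, the question marks are what is actually stored in the database.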

In general the driver "just works" with UTF-8 encoded databases. It
deals in Unicode strings internally, so the only transcoding that goes
on is from UTF-8 to UTF-16, which is lossless. All the reported
problems we've seen in the (recent) past with this configuration have
been either problems with non-JDBC clients getting confused, or
problems with how the resulting String was displayed to the user, or
having non-Unicode garbage stored in the database in the first place.
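As a rough illustration of the lossless part, and of what a confused client can do, here is a small sketch using only the standard library. The names are invented, and this is not the driver's code. Java Strings are UTF-16 internally, so encoding to UTF-8 and decoding again gives back an equal String, while decoding the same bytes with the wrong charset does not.

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // UTF-16 String -> UTF-8 bytes -> UTF-16 String.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // What a client does if it wrongly treats UTF-8 bytes as ISO-8859-1.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String value = "\u0418\u0432\u0430\u043d"; // sample Cyrillic text

        System.out.println(roundTrip(value).equals(value)); // prints: true

        // Each of these Cyrillic letters is two bytes in UTF-8, so the
        // mis-decoded String has twice as many characters as the original.
        System.out.println(misdecode(value).length()); // prints: 8
    }
}
```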

-O
