Again about charset encoding and accents

From: Davide Romanini

Hi,

I've worked a bit on the accents problem (àòè ...) with the JDBC driver.
The situation is that I have a working database whose server encoding is
set to SQL_ASCII (and I cannot change it). The encoding on the server
side doesn't cause any problem: my accents are stored correctly, and
they are correctly retrieved from the database by any other software,
such as psql and the ODBC driver. The problems come with the JDBC
driver: if a row contains accents (for example 'La città di Forlì'), the
driver doesn't work and my example string becomes 'La citt?di Forl?'.
When I posted this problem, you suggested converting the database to
UNICODE, but I cannot do this, because it is a database that is
currently in production! On the other hand, I do not WANT to do this,
because the driver MUST also work on a SQL_ASCII database, just like
psql and the ODBC driver do!
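
Just to show exactly where it breaks, a retrieval as simple as this is
enough on my database (the table and column names are invented, and con
is an open connection to the SQL_ASCII database):

    // (uses java.sql.Statement and java.sql.ResultSet)
    // Hypothetical table/column; the point is what rs.getString() returns.
    Statement st = con.createStatement();
    ResultSet rs = st.executeQuery("SELECT name FROM cities WHERE id = 1");
    rs.next();
    // comes back as 'La citt?di Forl?' instead of 'La città di Forlì'
    System.out.println(rs.getString(1));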

So I played a bit with the source of the driver. First I tried the
external solutions: the charSet parameter in the connection string and
the CLIENT_ENCODING runtime variable in PostgreSQL (roughly as in the
snippet below), but none of them worked.
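
For the record, the two attempts looked more or less like this (the
connection details are invented, and I'm writing the charSet parameter
name from memory, so take it with a grain of salt):

    // (uses java.sql.Connection, DriverManager and Statement)
    // Attempt 1: ask the driver for a specific client-side encoding
    // through the charSet parameter in the connection URL.
    String url = "jdbc:postgresql://localhost/mydb?charSet=LATIN1";
    Connection con = DriverManager.getConnection(url, "user", "password");

    // Attempt 2: change the server-side runtime variable instead.
    Statement st = con.createStatement();
    st.execute("SET CLIENT_ENCODING TO 'LATIN1'");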

So I stepped into the source and found that my string gets corrupted
when the org.postgresql.core.Encoding.decodeUTF8(byte[], int, int)
method is invoked by rs.getString(). I don't know exactly why, but my
accented characters show up as negative numbers in the byte[] array, so
the method goes wrong when it does

    l_cdata[j++] = (char)data[i];

because of the test z < 0x80 (lines 249-252): in UTF-8 that test should
mean "single-byte ASCII character", but Java bytes are signed, so for my
accents z holds a negative number and it will always be < 0x80!
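
To show what I mean, here is a tiny standalone example of the
signed-byte behaviour. This is only an illustration I wrote to convince
myself, not code from the driver:

    public class SignedByteDemo {
        public static void main(String[] args) {
            // 0xE0 is 'à' in LATIN1; as a Java byte it arrives as -32,
            // because byte is a signed type.
            byte b = (byte) 0xE0;
            System.out.println(b < 0x80);  // true, even though 0xE0 is not ASCII
            System.out.println((char) b);  // sign-extended into a bogus character

            // Masking restores the unsigned value before the test.
            int z = b & 0xFF;              // 224
            System.out.println(z < 0x80);  // false, which is what a UTF-8
                                           // decoder should see here
        }
    }

So I suspect that masking the byte (something like data[i] & 0xFF)
before the comparison and the cast would be the proper fix, but I
haven't verified that against the driver source.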

Well, I'm not a Java guru, so I don't know how to modify the source to
get it working properly. Anyway, I solved my problem by simply
commenting out the whole body of decodeUTF8 and returning a new
String(data) instead, and magically it works perfectly :-). A rough
sketch of the hack is below.
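
In case it helps, this is more or less what my hacked method ends up
looking like (written from memory, so the modifiers may not match the
real method; the signature is the one I quoted above, and I use the
offset and length arguments rather than the whole array):

    // Replacement body for decodeUTF8: skip the UTF-8 decoding entirely
    // and let the JVM's default encoding interpret the raw bytes coming
    // from the backend.
    public String decodeUTF8(byte[] data, int offset, int length) {
        return new String(data, offset, length);
    }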

I hope you solve the problem because, as I already said in my first
post, it makes the driver absolutely unusable for serious work!

Bye, Romaz