Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars? - Mailing list pgsql-jdbc

From Barry Lind
Subject Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?
Date
Msg-id 3AF30113.5030109@xythos.com
In response to A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?  (Jani Averbach <jaa@cc.jyu.fi>)
Responses Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-jdbc
Tom,

I don't consider it a 'uselessly obstructionist policy' for the client to
use the encoding the server says it is using :-)  The jdbc code simply
issues a 'select getdatabaseencoding()' and uses the value the server
returns.  I would place the blame more on the server for lying to
the client :-)

I consider this a problem with the backend, in that it requires multibyte
support to be enabled to properly support even single-byte character
sets like LATIN1.  (True, it supports LATIN1 without multibyte, but it
doesn't correctly report to the client what character set the server is
using, so the client has no way of knowing whether it should use LATIN1,
LATIN2, or KOI8-R.  The character set of the data is an important piece
of information for a client, especially in Java, where some encoding must
be used to convert to UCS-2.)
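
To illustrate why the client needs to know the character set, here is a
minimal Java sketch.  It is not taken from the driver: the class and
method names are made up for this example, and it uses the
java.nio.charset API.  It decodes the same single high-bit byte under
three different encodings:

```java
import java.nio.charset.Charset;

public class DecodeDemo {
    // Decode raw bytes from the server using the named charset.
    // (Hypothetical helper, not part of the JDBC driver.)
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0xE8 };  // one byte with the high bit set
        String[] names = { "ISO-8859-1", "ISO-8859-2", "KOI8-R" };
        for (String name : names) {
            String s = decode(raw, name);
            // Print the Unicode code point the byte was mapped to.
            System.out.printf("%-10s -> U+%04X%n", name, (int) s.charAt(0));
        }
    }
}
```

The byte 0xE8 comes out as U+00E8 under ISO-8859-1 (LATIN1), U+010D under
ISO-8859-2 (LATIN2), and U+0425 under KOI8-R, so a wrong guess silently
produces the wrong text.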

Now, it is an easy change in the jdbc code to use LATIN1 when the server
reports SQL_ASCII, but I really dislike hardcoding support that only
works in English-speaking countries and Western Europe.  All this does
is narrow the problem from one that affects every non-English country to
one that affects countries that are neither English-speaking nor Western
European (e.g. Eastern Europe, Russia, etc.).

In the current jdbc code it is possible to override the character set
that is being used (by passing a 'charSet' parameter to the connection),
so it is possible to use a different encoding than the database is
reporting.
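
As a rough sketch of what passing that parameter could look like from
application code (the 'charSet' property name is the one mentioned above;
the class name, helper names, and the example value are assumptions made
for illustration, and the value is assumed to be an encoding name the JVM
recognizes):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class CharSetOverride {
    // Build connection properties that include the 'charSet' override.
    // (Hypothetical helper for this example.)
    static Properties buildProps(String user, String password, String charSet) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("charSet", charSet);  // overrides the encoding reported by the database
        return props;
    }

    // Open a connection using those properties; the caller supplies the JDBC URL.
    static Connection connect(String url, Properties props) throws SQLException {
        return DriverManager.getConnection(url, props);
    }
}
```

A caller with KOI8-R data in a database that reports SQL_ASCII would then
do something like connect(url, buildProps(user, password, "KOI8_R")).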

from Connection.java:
     //Set the encoding for this connection
     //Since the encoding could be specified or obtained from the DB we use the
     //following order:
     //  1.  passed as a property
     //  2.  value from DB if supported by current JVM
     //  3.  default for JVM (leave encoding null)
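
The order described in that comment could be sketched roughly as follows.
This is not the actual Connection.java code: the class, the method names,
and the table mapping database encoding names to Java charset names are
assumptions for illustration, and the sketch uses Charset.isSupported
from java.nio.charset to check what the JVM supports.

```java
import java.nio.charset.Charset;
import java.util.Properties;

public class EncodingChooser {
    // Returns the encoding to use, or null to mean "use the JVM default".
    static String chooseEncoding(Properties info, String dbEncoding) {
        // 1. An explicit 'charSet' connection property wins.
        String requested = info.getProperty("charSet");
        if (requested != null) {
            return requested;
        }
        // 2. Otherwise use the database's encoding, if this JVM supports it.
        String javaName = toJavaName(dbEncoding);
        if (javaName != null && Charset.isSupported(javaName)) {
            return javaName;
        }
        // 3. Otherwise leave the encoding null (JVM default).
        return null;
    }

    // Illustrative mapping only; SQL_ASCII is deliberately left unmapped here.
    static String toJavaName(String dbEncoding) {
        if (dbEncoding == null) return null;
        switch (dbEncoding) {
            case "LATIN1":  return "ISO-8859-1";
            case "LATIN2":  return "ISO-8859-2";
            case "KOI8":    return "KOI8-R";
            case "UNICODE": return "UTF-8";
            default:        return null;
        }
    }
}
```

With this shape, a 'charSet' property always takes precedence, so a user
whose server reports SQL_ASCII can still name the encoding the data is
really in.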

thanks,
--Barry


Tom Lane wrote:

> Tony Grant <tony@animaproductions.com> writes:
>
>> On 04 May 2001 10:29:50 -0400, Tom Lane wrote:
>>
>>> Does this happen with a non-multibyte-compiled database?  If so, I'd
>>> argue that's a serious bug in the JDBC code: it makes JDBC unusable
>>> for non-ASCII 8-bit character sets, unless one puts up with the overhead
>>> of MULTIBYTE support.
>>
>> I fought with this for a few days. The solution is to dump the database
>> and create a new database with the correct encoding.
>
>> MULTIBYTE is not necessary; I just set the type to LATIN1 and it works
>> fine.
>
>
> But a non-MULTIBYTE backend doesn't even have the concept of "setting
> the encoding" --- it will always just report SQL_ASCII.
>
> Perhaps what this really says is that it'd be better if the JDBC code
> assumed LATIN1 translations when the backend claims SQL_ASCII.
> Certainly, translating all high-bit-set characters to '?' is about as
> uselessly obstructionist a policy as I can think of...
>
>             regards, tom lane
>
>

