Thread: A bug fix for JDBC's getTables() in Postgresql 7.1

A bug fix for JDBC's getTables() in Postgresql 7.1

From

Panu Outinen

Date:

03 May 2001, 14:03:36

Hi !!

I was trying to get a very nice FREE graphical db tool called DbVisualizer
(http://www.ideit.com/products/dbvis/) to work with Postgresql and I found
out the following bug: if database has views then getTables() gets the null
pointer exception ('order by relname' makes the listing tree in
DbVisualizer a lot useful !!)

This patch should propably be applied to the the jdbc1's
DatabaseMetaData.java, too.

[/tmp/postgresql-7.1/src/interfaces/jdbc/org/postgresql/jdbc2]$
<ql/jdbc2]$ diff -u DatabaseMetaData.java.org DatabaseMetaData.java

--- DatabaseMetaData.java.org    Wed May 02 22:52:25 2001
+++ DatabaseMetaData.java    Wed May 02 23:07:19 2001
@@ -1666,7 +1666,7 @@
      // Now take the pattern into account
      sql.append(") and relname like '");
      sql.append(tableNamePattern.toLowerCase());
-    sql.append("'");
+    sql.append("' order by relname");

      // Now run the query
      r = connection.ExecSQL(sql.toString());
@@ -1697,6 +1697,9 @@
      case 'S':
          relKind = "SEQUENCE";
          break;
+    case 'v':
+        relKind = "VIEW";
+        break;
      default:
          relKind = null;
      }
@@ -1704,7 +1707,7 @@
      tuple[0] = null;        // Catalog name
      tuple[1] = null;        // Schema name
      tuple[2] = r.getBytes(1);    // Table name
-    tuple[3] = relKind.getBytes();    // Table type
+    tuple[3] = (relKind==null) ? null : relKind.getBytes();    // Table type
      tuple[4] = remarks;        // Remarks
      v.addElement(tuple);
        }


-----
http://www.ideit.com/products/dbvis/

...

DbVisualizer
Version: 2.0
Released: 2001-04-20


The #1 requested feature to ease editing table data is now supported!
The #2 requested feature to print graphs is now supported!
Read the complete change log for all new features and enhancements!


DbVisualizer is a cross platform database visualization and edit tool
relying 100% on the JDBC, Java Database Connectivity API's. DbVisualizer
enables simultaneous connections to many different databases through JDBC
drivers available from a variety of vendors. Just point and click to browse
the structure of the database, characteristics of tables, etc. No matter if
it's an enterprise database from Oracle or an open source product like
InstantDB!

And best of all -> it's FREE!
-----

A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Jani Averbach

Date:

03 May 2001, 16:04:34

Hi!

I have a problem like that:

Environment:

The database is postgresql v7.1, with locale-support
jdk is sun's jdk1.3.0_02
and jdbc is that one which comes with postgres (type 2).

Both database and jdbc driver has been build by myself.

OS: RH 6.2 based linux with 2.4.3 kernel, glibc 2.1.3.

The problem:

There is a database which contains fields (the field's type is 'text')
with scandinavian alphabet. (Especially ÖÄÅöäå (odiaeresis, adiaeresis,
aring, or in other words, oe, ae, and a with ring above it)).


The database has been installed, created and used under
LC_ALL="finnish" and LANG="fi_FI" environment variables in act.

Ok, the problem:

When I try to read those field, I get guestion marks instead of those
8-bit scandic chars.

I have been check my java programs and the database. (in fact, same
problem appears with postgres-7.1/src/interfaces/jdbc/example/psql.java).
In general, my java environment works fine with 8-bit chars, with psgl
(not the java one) there is everything very well with those fields with
8-bit chars.

So my question is, am I doing something wrong, or is there a bug in the
pgsql-jdbc?

If this is a bug or you need otherwise help or more information, please
let me know. I will try to help as much as possible to hunt this one
down.

If I am doing something stupid, I would very likely to know it...

BR, Jani

---
Jani Averbach

Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Barry Lind

Date:

03 May 2001, 20:10:53

Since Java uses unicode (ucs2) internally for all Strings, the jdbc code
always needs to do character set conversions for all strings it gets
from the database.  In the 7.0 drivers this was done by assuming the
database was using the same character set as the default on the client,
which is incorrect for a number of reasons.  In 7.1 the jdbc code asks
the database what character set it is using and does the conversion from
the server character set to the java unicode strings.

Now it turns out that Postgres is a little lax in its character set
support, so you can very easily insert char/varchar/text with values
that fall outside the range of valid values for a given character set
(and psql doesn't really care either).  However in java since we must do
character set conversion to unicode, it does make a difference and any
values that were inserted that are incorrect with regards to the
database character set will be reported as ?'s in java.

With regards to your specific problem, my guess is that you haven't
created you database with the proper character set for the data you are
storing in it.  I am guessing you simply used the default SQL Acsii
character set for your created database and therefore only the first 127
characters are defined.  Any characters above 127 will be returned by
java as ?'s.

If this is the case you will need to recreate your database with the
proper character set for the data you are storing in it and then
everything should be fine.

thanks,
--Barry

Jani Averbach wrote:

> Hi!
>
> I have a problem like that:
>
> Environment:
>
> The database is postgresql v7.1, with locale-support
> jdk is sun's jdk1.3.0_02
> and jdbc is that one which comes with postgres (type 2).
>
> Both database and jdbc driver has been build by myself.
>
> OS: RH 6.2 based linux with 2.4.3 kernel, glibc 2.1.3.
>
> The problem:
>
> There is a database which contains fields (the field's type is 'text')
> with scandinavian alphabet. (Especially ÖÄÅöäå (odiaeresis, adiaeresis,
> aring, or in other words, oe, ae, and a with ring above it)).
>
>
> The database has been installed, created and used under
> LC_ALL="finnish" and LANG="fi_FI" environment variables in act.
>
> Ok, the problem:
>
> When I try to read those field, I get guestion marks instead of those
> 8-bit scandic chars.
>
> I have been check my java programs and the database. (in fact, same
> problem appears with postgres-7.1/src/interfaces/jdbc/example/psql.java).
> In general, my java environment works fine with 8-bit chars, with psgl
> (not the java one) there is everything very well with those fields with
> 8-bit chars.
>
> So my question is, am I doing something wrong, or is there a bug in the
> pgsql-jdbc?
>
> If this is a bug or you need otherwise help or more information, please
> let me know. I will try to help as much as possible to hunt this one
> down.
>
> If I am doing something stupid, I would very likely to know it...
>
> BR, Jani
>
> ---
> Jani Averbach
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>
>

Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Jani Averbach

Date:

04 May 2001, 06:43:12

On Thu, 3 May 2001, Barry Lind wrote:

> With regards to your specific problem, my guess is that you haven't
> created you database with the proper character set for the data you are
> storing in it.  I am guessing you simply used the default SQL Acsii
> character set for your created database and therefore only the first 127
> characters are defined.  Any characters above 127 will be returned by
> java as ?'s.
>
> If this is the case you will need to recreate your database with the
> proper character set for the data you are storing in it and then
> everything should be fine.
>

Thanks, you are right!

The main problem was that I had not enabled the multibyte support for
database. (I believe fairytale and supposed that correct locale
setting will be enough.)

So my humble wish is that the instructions in the INSTALL file should be
corrected.
Because:

     --enable-multibyte

          Allows the use of multibyte character encodings. This is
          primarily for languages like Japanese, Korean, and Chinese. Read
          the Administrator's Guide for details.

I think that this is a little bit missleading.

There is correct information in the Administrator's Guide, so I should
have to read the Guide, but but... The world would be much better place,
if there is little mention in the installation instruction that this
concerns also 8-bit chars...

But anyway, it works now very fine, thanks!

BR, Jani

---
Jani Averbach

Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Tom Lane

Date:

04 May 2001, 10:53:50

Barry Lind <barry@xythos.com> writes:
> With regards to your specific problem, my guess is that you haven't
> created you database with the proper character set for the data you are
> storing in it.  I am guessing you simply used the default SQL Acsii
> character set for your created database and therefore only the first 127
> characters are defined.  Any characters above 127 will be returned by
> java as ?'s.

Does this happen with a non-multibyte-compiled database?  If so, I'd
argue that's a serious bug in the JDBC code: it makes JDBC unusable
for non-ASCII 8-bit character sets, unless one puts up with the overhead
of MULTIBYTE support.

            regards, tom lane

Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Tom Lane

Date:

04 May 2001, 12:08:27

Tony Grant <tony@animaproductions.com> writes:
> On 04 May 2001 10:29:50 -0400, Tom Lane wrote:
>> Does this happen with a non-multibyte-compiled database?  If so, I'd
>> argue that's a serious bug in the JDBC code: it makes JDBC unusable
>> for non-ASCII 8-bit character sets, unless one puts up with the overhead
>> of MULTIBYTE support.

> I fought with this for a few days. The solution is to dump the database
> and create a new database with the correct encoding.

> MULTIBYTE is not neccesary I just set the type to LATIN1 and it works
> fine.

But a non-MULTIBYTE backend doesn't even have the concept of "setting
the encoding" --- it will always just report SQL_ASCII.

Perhaps what this really says is that it'd be better if the JDBC code
assumed LATIN1 translations when the backend claims SQL_ASCII.
Certainly, translating all high-bit-set characters to '?' is about as
uselessly obstructionist a policy as I can think of...

            regards, tom lane

Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Tony Grant

Date:

04 May 2001, 12:24:27

On 04 May 2001 11:40:48 -0400, Tom Lane wrote:

> > I fought with this for a few days. The solution is to dump the database
> > and create a new database with the correct encoding.
>
> > MULTIBYTE is not neccesary I just set the type to LATIN1 and it works
> > fine.
>
> But a non-MULTIBYTE backend doesn't even have the concept of "setting
> the encoding" --- it will always just report SQL_ASCII.

OK I just read the configure script for my backend - you guessed it
multibyte support and locale support compiled in there... So createdb -E
LATIN1 works just fine =:-b

> Perhaps what this really says is that it'd be better if the JDBC code
> assumed LATIN1 translations when the backend claims SQL_ASCII.
> Certainly, translating all high-bit-set characters to '?' is about as
> uselessly obstructionist a policy as I can think of...


I will be adding this snippet to my doc on techdocs in the French
version. It will save somebody a lot of head scratching.

Cheers
Tony Grant


--
RedHat Linux on Sony Vaio C1XD/S
http://www.animaproductions.com/linux2.html

Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Tony Grant

Date:

04 May 2001, 12:24:48

On 04 May 2001 10:29:50 -0400, Tom Lane wrote:

> > With regards to your specific problem, my guess is that you haven't
> > created you database with the proper character set for the data you are
> > storing in it.  I am guessing you simply used the default SQL Acsii
> > character set for your created database and therefore only the first 127
> > characters are defined.  Any characters above 127 will be returned by
> > java as ?'s.
>
> Does this happen with a non-multibyte-compiled database?  If so, I'd
> argue that's a serious bug in the JDBC code: it makes JDBC unusable
> for non-ASCII 8-bit character sets, unless one puts up with the overhead
> of MULTIBYTE support.

I fought with this for a few days. The solution is to dump the database
and create a new database with the correct encoding.

MULTIBYTE is not neccesary I just set the type to LATIN1 and it works
fine.

Queries even work on accentuated caracters!!!

I have a demo database for those interested

Cheers

Tony Grant



--
RedHat Linux on Sony Vaio C1XD/S
http://www.animaproductions.com/linux2.html
Ultradev and PostgreSQL
http://www.animaproductions.com/ultra.html

Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Tony Grant

Date:

04 May 2001, 12:30:23

On 04 May 2001 11:40:48 -0400, Tom Lane wrote:

>
> But a non-MULTIBYTE backend doesn't even have the concept of "setting
> the encoding" --- it will always just report SQL_ASCII.

What kind of error message does "createdb -E LATIN1" give on a non
MULTIBYTE backend?

Maybe there needs to be a note somewhere informing people from Europe
that they too need MULTIBYTE as an option at compile time. i.e. In a
bright yellow box in the HTML docs...

And in the Reference manual and man pages the -E option for createdb
needs a note to specify that it applies to MULTIBYTE backends only.

Cheers

Tony Grant

--
RedHat Linux on Sony Vaio C1XD/S
http://www.animaproductions.com/linux2.html

Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Tom Lane

Date:

04 May 2001, 13:04:12

Tony Grant <tony@animaproductions.com> writes:
> What kind of error message does "createdb -E LATIN1" give on a non
> MULTIBYTE backend?

$ createdb -E LATIN1 foo
/home/postgres/testversion/bin/createdb[143]: /home/postgres/testversion/bin/pg_encoding:  not found.
createdb: "LATIN1" is not a valid encoding name
$

> Maybe there needs to be a note somewhere informing people from Europe
> that they too need MULTIBYTE as an option at compile time. i.e. In a
> bright yellow box in the HTML docs...

But they *should not* need it, if they only want to use an 8-bit character
set.  Locale support should be enough.  Or so I would think, anyway.
I have to admit I have not looked very closely at the functionality
that's enabled by MULTIBYTE; is any of it really needed to deal with
LATINn character sets?

            regards, tom lane

Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Barry Lind

Date:

04 May 2001, 15:57:22


Tony Grant wrote:

> On 04 May 2001 11:40:48 -0400, Tom Lane wrote:
>
>> But a non-MULTIBYTE backend doesn't even have the concept of "setting
>> the encoding" --- it will always just report SQL_ASCII.
>
>
> What kind of error message does "createdb -E LATIN1" give on a non
> MULTIBYTE backend?
>
> Maybe there needs to be a note somewhere informing people from Europe
> that they too need MULTIBYTE as an option at compile time. i.e. In a
> bright yellow box in the HTML docs...
>
> And in the Reference manual and man pages the -E option for createdb
> needs a note to specify that it applies to MULTIBYTE backends only.
>
> Cheers
>
> Tony Grant
>
The errors you get are:
from createdb-

$ createdb -E LATIN1 testdb
/usr/local/pgsql/bin/createdb: /usr/local/pgsql/bin/pg_encoding: No such
file or directory
createdb: "LATIN1" is not a valid encoding name

and from psql-

template1=# create database testdb with encoding = 'LATIN1';
ERROR:  Multi-byte support is not enabled

thanks,
--Barry

Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Barry Lind

Date:

04 May 2001, 17:34:26

Tom,

I don't consider it a 'uselessly obstrucionist policy' for the client to
use the encoding the server says it is using :-)  The jdbc code simply
issues a 'select getdatabaseencoding()' and uses the value the server
tells it to.  I would place the blame more on the server for lying to
the client :-)

I consider this a problem with the backend in that it requires multibyte
support to be enabled to handle supporting even single byte character
sets like LATIN1.  (True it supports LATIN1 without multibyte, but it
doesn't correctly report to the client what character set the server is
using, so the client has know way of knowing if it should use LATIN1,
LATIN2, or KOI8-R -- the character set of the data is an important piece
of information for a client especially in java where some encoding needs
to be used to convert to ucs2).

Now it is an easy change in the jdbc code to use LATIN1 when the server
reports SQL_ASCII, but I really dislike hardcoding support that only
works in english speaking countries and Western Europe.  All this does
is move the problem from being one that non-english countries have to
being one where it is a non-english and non-western european problem
(eg. Eastern Europe, Russia, etc.).

In the current jdbc code it is possible to override the character set
that is being used (by passing a 'charSet' parameter to the connection),
so it is possible to use a different encoding than the database is
reporting.

from Connection.java:
     //Set the encoding for this connection
     //Since the encoding could be specified or obtained from the DB we
use the
     //following order:
     //  1.  passed as a property
     //  2.  value from DB if supported by current JVM
     //  3.  default for JVM (leave encoding null)

thanks,
--Barry

Tom Lane wrote:

> Tony Grant <tony@animaproductions.com> writes:
>
>> On 04 May 2001 10:29:50 -0400, Tom Lane wrote:
>>
>>> Does this happen with a non-multibyte-compiled database?  If so, I'd
>>> argue that's a serious bug in the JDBC code: it makes JDBC unusable
>>> for non-ASCII 8-bit character sets, unless one puts up with the overhead
>>> of MULTIBYTE support.
>>
>> I fought with this for a few days. The solution is to dump the database
>> and create a new database with the correct encoding.
>
>> MULTIBYTE is not neccesary I just set the type to LATIN1 and it works
>> fine.
>
>
> But a non-MULTIBYTE backend doesn't even have the concept of "setting
> the encoding" --- it will always just report SQL_ASCII.
>
> Perhaps what this really says is that it'd be better if the JDBC code
> assumed LATIN1 translations when the backend claims SQL_ASCII.
> Certainly, translating all high-bit-set characters to '?' is about as
> uselessly obstructionist a policy as I can think of...
>
>             regards, tom lane
>
>

Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Tom Lane

Date:

04 May 2001, 17:42:44

Barry Lind <barry@xythos.com> writes:
> Now it is an easy change in the jdbc code to use LATIN1 when the server
> reports SQL_ASCII, but I really dislike hardcoding support that only
> works in english speaking countries and Western Europe.

What's wrong with that?  It won't be any more broken for people who are
not really using LATIN1, and it will be considerably less broken for
those who are.  Seems like a net win to me, even without making the
obvious point about where the majority of Postgres users are.

It probably would be a good idea to allow the backend to store an
indication of character set even when not compiled for MULTIBYTE,
but that's not the issue here.  To me, the issue is whether JDBC
makes a reasonable effort not to munge data when presented with
a backend that claims to be using SQL_ASCII (which, let me remind
you, is the default setting).  Converting high-bit-set characters
to '?' is almost certainly NOT what the user wants you to do.
Converting on the assumption of LATIN1 will make a lot of people
happy, and the people who aren't happy with it will certainly not
be happy with '?' conversion either.

> All this does
> is move the problem from being one that non-english countries have to
> being one where it is a non-english and non-western european problem
> (eg. Eastern Europe, Russia, etc.).

Nonsense.  The non-Western-European folks see broken behavior now
anyway, unless they compile with MULTIBYTE and set an appropriate
encoding.  How would this make their lives worse, or even different?

I'm merely suggesting that the default behavior could be made useful
to a larger set of people than it now is, without making things any
worse for those that it's not useful to.

        regards, tom lane

Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Barry Lind

Date:

04 May 2001, 21:27:25

I can see that I'm probably not going to win this argument, but I'll
take one more try. :-)

The basic issue I have it that the server is providing an API to the
client to get the character encoding for the database and that API can
report incorrect information to the client.

If multibyte isn't enabled, getdatabaseencoding() always returns
SQL_ASCII.  In my understanding SQL_ASCII =  7bit ascii (at least that
is what the code in backend/utils/mb/conv.c is assuming).  But in
reality SQL_ASCII means some unknown single byte character encoding.
But if multibyte is enabled then SQL_ASCII means 7bit ascii.  And as far
as I know there is no way for the client to know if multibyte is enabled
or not.

Thus I would be happy if getdatabaseencoding() returned 'UNKNOWN' or
something similar when in fact it doesn't know what the encoding is
(i.e. when not compiled with multibyte).  That way users of this
function on the client have a means of knowing does the server mean 7bit
ascii or not.  (Alternatively, having some other fuction like
getmultibyteenabled(Y/N) would work as well, because using that value
you can then determine whether or not to trust the value of
getdatabaseencoding).

I just don't like having an api that under some circumstances you can't
rely on its returned value as being correct.

thanks,
--Barry

PS.  Note that if multibyte is enabled, the functionality that is being
complained about here in the jdbc client is apparently ok for the server
to do.  If you insert a value into a text column on a SQL_ASCII database
with multibyte enabled and that value contains 8bit characters, those
8bit characters will be quietly replaced with a dummy character since
they are invalid for the SQL_ASCII 7bit character set.

Tom Lane wrote:

> Barry Lind <barry@xythos.com> writes:
>
>> Now it is an easy change in the jdbc code to use LATIN1 when the server
>> reports SQL_ASCII, but I really dislike hardcoding support that only
>> works in english speaking countries and Western Europe.
>
>
> What's wrong with that?  It won't be any more broken for people who are
> not really using LATIN1, and it will be considerably less broken for
> those who are.  Seems like a net win to me, even without making the
> obvious point about where the majority of Postgres users are.
>
> It probably would be a good idea to allow the backend to store an
> indication of character set even when not compiled for MULTIBYTE,
> but that's not the issue here.  To me, the issue is whether JDBC
> makes a reasonable effort not to munge data when presented with
> a backend that claims to be using SQL_ASCII (which, let me remind
> you, is the default setting).  Converting high-bit-set characters
> to '?' is almost certainly NOT what the user wants you to do.
> Converting on the assumption of LATIN1 will make a lot of people
> happy, and the people who aren't happy with it will certainly not
> be happy with '?' conversion either.
>
>> All this does
>> is move the problem from being one that non-english countries have to
>> being one where it is a non-english and non-western european problem
>> (eg. Eastern Europe, Russia, etc.).
>
>
> Nonsense.  The non-Western-European folks see broken behavior now
> anyway, unless they compile with MULTIBYTE and set an appropriate
> encoding.  How would this make their lives worse, or even different?
>
> I'm merely suggesting that the default behavior could be made useful
> to a larger set of people than it now is, without making things any
> worse for those that it's not useful to.
>
>         regards, tom lane
>
>

MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)

From

Tom Lane

Date:

05 May 2001, 11:36:22

[ thread renamed and cross-posted to pghackers, since this isn't only
about JDBC anymore ]

Barry Lind <barry@xythos.com> writes:
> The basic issue I have it that the server is providing an API to the
> client to get the character encoding for the database and that API can
> report incorrect information to the client.

I don't have any objection to changing the system so that even a
non-MULTIBYTE server can store and return encoding settings.
(Presumably it should only accept encoding settings that correspond
to single-byte encodings.)  That can't happen before 7.2, however,
as the necessary changes are a bit larger than I'd care to shoehorn
into a 7.1.* release.

> Thus I would be happy if getdatabaseencoding() returned 'UNKNOWN' or
> something similar when in fact it doesn't know what the encoding is
> (i.e. when not compiled with multibyte).

I have a philosophical difference with this: basically, I think that
since SQL_ASCII is the default value, you probably ought to assume that
it's not too trustworthy.  The software can *never* be said to KNOW what
the data encoding is; at most it knows what it's been told, and in the
case of a default it probably hasn't been told anything.  I'd argue that
SQL_ASCII should be interpreted in the way you are saying "UNKNOWN"
ought to be: ie, it's an unspecified 8-bit encoding (and from there
it's not much of a jump to deciding to treat it as LATIN1, if you're
forced to do conversion to Unicode or whatever).  Certainly, seeing
SQL_ASCII from the server is not license to throw away data, which is
what JDBC is doing now.

> PS.  Note that if multibyte is enabled, the functionality that is being
> complained about here in the jdbc client is apparently ok for the server
> to do.  If you insert a value into a text column on a SQL_ASCII database
> with multibyte enabled and that value contains 8bit characters, those
> 8bit characters will be quietly replaced with a dummy character since
> they are invalid for the SQL_ASCII 7bit character set.

I have not tried it, but if the backend does that then I'd argue that
that's a bug too.  To my mind, a MULTIBYTE backend operating in
SQL_ASCII encoding ought to behave the same as a non-MULTIBYTE backend:
transparent pass-through of characters with the high bit set.  But I'm
not a multibyte guru.  Comments anyone?

            regards, tom lane

Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

From

Tony Grant

Date:

07 May 2001, 04:57:51

On 04 May 2001 15:44:23 -0400, Tom Lane wrote:

Back from the weekend with sunburn (very important sign that it has stopped
raining here on the west of Europe!!!!)

>
> > All this does
> > is move the problem from being one that non-english countries have to
> > being one where it is a non-english and non-western european problem
> > (eg. Eastern Europe, Russia, etc.).
>
> Nonsense.  The non-Western-European folks see broken behavior now
> anyway, unless they compile with MULTIBYTE and set an appropriate
> encoding.  How would this make their lives worse, or even different?
>
> I'm merely suggesting that the default behavior could be made useful
> to a larger set of people than it now is, without making things any
> worse for those that it's not useful to.

This reminds me of e-mail software when I joined the net. 7 bit ASCII
only software made the use of accents impossible so we learnt to type
without them or put up with garbage in our mail.

I must agree with Tom here. There is a 256 caracter alphabet which is
standard in many languages. For North America, Spanish and French spring
to mind. How are you going to build a common market if these two
languages plus Brasilian Portugese are not supported in business
software?

Multibyte is supported for other alphabets. This is already a wonderfull
achievement for those concerned.

The standard backend should in my opinion support the LATIN alphabet. US
ASCII is a subset of that alphabet, it is not _the_ alphabet.

The JDBC and Java itself should also support the whole alphabet. All
this should be transparent for the programmer and the end user. Another
battle to be fought...

Cheers

Tony Grant

--
RedHat Linux on Sony Vaio C1XD/S
http://www.animaproductions.com/linux2.html
Macromedia UltraDev with PostgreSQL
http://www.animaproductions.com/ultra.html

Building JDBC in 7.1

From

Tony Grant

Date:

07 May 2001, 10:34:50

Hello,

For those having problems compiling the 7.1 JDBC driver the following
snippet from the Apache site may be a clue. I got the 7.1 .jar file to
build by setting JAVA_HOME and ANT_HOME before doing make


> You have to set the enviroment variable JAVA_HOME. It must
> point to your local JDK root directory. This is true, even if you use
> JDK 1.2 or above, which normally don't need this setting. It is
> used by Ant, the compilation software.

Cheers

Tony Grant

--
RedHat Linux on Sony Vaio C1XD/S
http://www.animaproductions.com/linux2.html
Macromedia UltraDev with PostgreSQL
http://www.animaproductions.com/ultra.html

Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)

From

Tatsuo Ishii

Date:

07 May 2001, 10:38:18

> > Thus I would be happy if getdatabaseencoding() returned 'UNKNOWN' or
> > something similar when in fact it doesn't know what the encoding is
> > (i.e. when not compiled with multibyte).

Is that ok for Java? I thought Java needs to know the encoding
beforehand so that it could convert to/from Unicode.

> I have a philosophical difference with this: basically, I think that
> since SQL_ASCII is the default value, you probably ought to assume that
> it's not too trustworthy.  The software can *never* be said to KNOW what
> the data encoding is; at most it knows what it's been told, and in the
> case of a default it probably hasn't been told anything.  I'd argue that
> SQL_ASCII should be interpreted in the way you are saying "UNKNOWN"
> ought to be: ie, it's an unspecified 8-bit encoding (and from there
> it's not much of a jump to deciding to treat it as LATIN1, if you're
> forced to do conversion to Unicode or whatever).  Certainly, seeing
> SQL_ASCII from the server is not license to throw away data, which is
> what JDBC is doing now.
>
> > PS.  Note that if multibyte is enabled, the functionality that is being
> > complained about here in the jdbc client is apparently ok for the server
> > to do.  If you insert a value into a text column on a SQL_ASCII database
> > with multibyte enabled and that value contains 8bit characters, those
> > 8bit characters will be quietly replaced with a dummy character since
> > they are invalid for the SQL_ASCII 7bit character set.
>
> I have not tried it, but if the backend does that then I'd argue that
> that's a bug too.

I suspect the JDBC driver is responsible for the problem Burry has
reported (I have tried to reproduce the problem using psql without
success).

From interfaces/jdbc/org/postgresql/Connection.java:

>        if (dbEncoding.equals("SQL_ASCII")) {
>          dbEncoding = "ASCII";

BTW, even if the backend behaves like that, I don't think it's a
bug. Since SQL_ASCII is nothing more than an ascii encoding.

> To my mind, a MULTIBYTE backend operating in
> SQL_ASCII encoding ought to behave the same as a non-MULTIBYTE backend:
> transparent pass-through of characters with the high bit set.  But I'm
> not a multibyte guru.  Comments anyone?

If you expect that behavior, I think the encoding name 'UNKNOWN' or
something like that seems more appropreate. (SQL_)ASCII is just an
ascii IMHO.
--
Tatsuo Ishii

Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)

From

Barry Lind

Date:

07 May 2001, 21:42:13

Tatsuo Ishii wrote:

>>> Thus I would be happy if getdatabaseencoding() returned 'UNKNOWN' or
>>> something similar when in fact it doesn't know what the encoding is
>>> (i.e. when not compiled with multibyte).
>>
>
> Is that ok for Java? I thought Java needs to know the encoding
> beforehand so that it could convert to/from Unicode.

That is actually the original issue that started this thread.  If you
want the full thread see the jdbc mail archive list.  A user was
complaining that when running on a database without multibyte enabled,
that through psql he could insert and retrieve 8bit characters, but in
jdbc the 8bit characters were converted to ?'s.

I then explained why this was happening (db returns SQL_ASCII as the db
character set when not compiled with multibyte) so that character set is
used to convert to unicode.

Tom suggested that it would make more sense for jdbc to use LATIN1 when
the database reported SQL_ASCII so that most users will see 'correct'
behavior in a non multibyte database.  Because currently you need to
enable multibyte support in order to use 8bit characters with jdbc.
Jdbc could easily be changed to treat SQL_ASCII as LATIN1, but I don't
feel that is an appropriate solution for the reasons outlined in this
thread (thus the suggestions for UNKNOWN, or the ability for the client
to determine if multibyte is enabled or not).

>
>> I have a philosophical difference with this: basically, I think that
>> since SQL_ASCII is the default value, you probably ought to assume that
>> it's not too trustworthy.  The software can *never* be said to KNOW what
>> the data encoding is; at most it knows what it's been told, and in the
>> case of a default it probably hasn't been told anything.  I'd argue that
>> SQL_ASCII should be interpreted in the way you are saying "UNKNOWN"
>> ought to be: ie, it's an unspecified 8-bit encoding (and from there
>> it's not much of a jump to deciding to treat it as LATIN1, if you're
>> forced to do conversion to Unicode or whatever).  Certainly, seeing
>> SQL_ASCII from the server is not license to throw away data, which is
>> what JDBC is doing now.
>>
>>> PS.  Note that if multibyte is enabled, the functionality that is being
>>> complained about here in the jdbc client is apparently ok for the server
>>> to do.  If you insert a value into a text column on a SQL_ASCII database
>>> with multibyte enabled and that value contains 8bit characters, those
>>> 8bit characters will be quietly replaced with a dummy character since
>>> they are invalid for the SQL_ASCII 7bit character set.
>>
>> I have not tried it, but if the backend does that then I'd argue that
>> that's a bug too.
>
>
> I suspect the JDBC driver is responsible for the problem Burry has
> reported (I have tried to reproduce the problem using psql without
> success).
>
> >From interfaces/jdbc/org/postgresql/Connection.java:
>
>>        if (dbEncoding.equals("SQL_ASCII")) {
>>          dbEncoding = "ASCII";
>
>
> BTW, even if the backend behaves like that, I don't think it's a
> bug. Since SQL_ASCII is nothing more than an ascii encoding.

I believe Tom's point is that if multibyte is not enabled this isn't
true, since SQL_ASCII then means whatever character set the client wants
to use against the server as the server really doesn't care what single
byte data is being inserted/selected from the database.

>
>> To my mind, a MULTIBYTE backend operating in
>> SQL_ASCII encoding ought to behave the same as a non-MULTIBYTE backend:
>> transparent pass-through of characters with the high bit set.  But I'm
>> not a multibyte guru.  Comments anyone?
>
>
> If you expect that behavior, I think the encoding name 'UNKNOWN' or
> something like that seems more appropreate. (SQL_)ASCII is just an
> ascii IMHO.

I agree.

>
> --
> Tatsuo Ishii
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>
--Barry

Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)

From

Barry Lind

Date:

08 May 2001, 01:17:16

Tatsuo Ishii wrote:

>>>>> Thus I would be happy if getdatabaseencoding() returned 'UNKNOWN' or
>>>>> something similar when in fact it doesn't know what the encoding is
>>>>> (i.e. when not compiled with multibyte).
>>>>
>>> Is that ok for Java? I thought Java needs to know the encoding
>>> beforehand so that it could convert to/from Unicode.
>>
>> That is actually the original issue that started this thread.  If you
>> want the full thread see the jdbc mail archive list.  A user was
>> complaining that when running on a database without multibyte enabled,
>> that through psql he could insert and retrieve 8bit characters, but in
>> jdbc the 8bit characters were converted to ?'s.
>
>
> Still I don't see what you are wanting in the JDBC driver if
> PostgreSQL would return "UNKNOWN" indicating that the backend is not
> compiled with MULTIBYTE. Do you want exact the same behavior as prior
> 7.1 driver? i.e. reading data from the PostgreSQL backend, assume its
> encoding default to the Java client (that is set by locale or
> something else) and convert it to UTF-8. If so, that would make sense
> to me...

My suggestion would be that if the jdbc client was able to determine if
the server character set was UNKNOWN (i.e. no multibyte) that it would
then use some appropriate default character set to perform conversions
to UCS2 (LATIN1 would probably make the most sence as a default).  The
jdbc driver would perform its existing behavior if the character set was
SQL_ASCII and multibyte was enabled (i.e. only support 7bit characters
just like the backend does).

Note that the user is always able to override the character set used for
conversion by setting the charSet property.

Tom also mentioned that it might be possible for the server to support
setting the character set for a database even when multibyte wasn't
enabled.  That would then allow clients like jdbc to get a value from
non-multibyte enabled servers that would be more meaningful than the
current SQL_ASCII.  If this where done, then the 'UNKNOWN' hack would
not be necessary.

thanks,
--Barry

>
> --
> Tatsuo Ishii
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>

Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)

From

Tom Lane

Date:

08 May 2001, 22:40:56

Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> Tom also mentioned that it might be possible for the server to support
>> setting the character set for a database even when multibyte wasn't
>> enabled.  That would then allow clients like jdbc to get a value from
>> non-multibyte enabled servers that would be more meaningful than the
>> current SQL_ASCII.  If this where done, then the 'UNKNOWN' hack would
>> not be necessary.

> Tom's suggestion does not sound reasonable to me. If PostgreSQL is not
> built with MULTIBYTE, then it means there would be no such idea
> "encoding" in PostgreSQL becuase there is no program to handle
> encodings. Thus it would be meaningless to assign an "encoding" to a
> database if MULTIBYTE is not enabled.

Why?  Without the MULTIBYTE code, the backend cannot perform character
set translations --- but it's perfectly possible that someone might not
need translations.  A lot of European sites are probably very happy
as long as the server gives them back the same 8-bit characters they
stored.  But what they would like, if they have to deal with tools like
JDBC, is to *identify* what character set they are storing data in, so
that their data will be correctly translated to Unicode or whatever.
The obvious way to do that is to allow them to set the value that
getdatabaseencoding() will return.

Essentially, my point is that identifying the character set is useful
to support outside-the-database character set conversions, whether or
not we have compiled the code for inside-the-database conversions.
Moreover, the stored data certainly has some encoding, whether or not
the database contains code that knows enough to do anything useful about
the encoding.  So it's not "meaningless" to be able to store and report
an encoding value.

I am not sure how much of the MULTIBYTE code would have to be activated
to allow this, but surely it's only a small fraction of the complete
feature.

            regards, tom lane

Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)

From

Tatsuo Ishii

Date:

11 May 2001, 09:20:14

> > Still I don't see what you are wanting in the JDBC driver if
> > PostgreSQL would return "UNKNOWN" indicating that the backend is not
> > compiled with MULTIBYTE. Do you want exact the same behavior as prior
> > 7.1 driver? i.e. reading data from the PostgreSQL backend, assume its
> > encoding default to the Java client (that is set by locale or
> > something else) and convert it to UTF-8. If so, that would make sense
> > to me...
>
> My suggestion would be that if the jdbc client was able to determine if
> the server character set was UNKNOWN (i.e. no multibyte) that it would
> then use some appropriate default character set to perform conversions
> to UCS2 (LATIN1 would probably make the most sence as a default).  The
> jdbc driver would perform its existing behavior if the character set was
> SQL_ASCII and multibyte was enabled (i.e. only support 7bit characters
> just like the backend does).
>
> Note that the user is always able to override the character set used for
> conversion by setting the charSet property.

I see.  However I would say we could not change the current behavior
of the backend until 7.2 is out. It is our policy the we would not
add/change existing functionalities while we are in the minor release
cycle.

What about doing like this:

1. call pg_encoding_to_char(1)    (actually any number except 0 is ok)

2. if it returns "SQL_ASCII", then you could assume that MULTIBYTE is
not enbaled.

This is pretty ugly, but should work.

> Tom also mentioned that it might be possible for the server to support
> setting the character set for a database even when multibyte wasn't
> enabled.  That would then allow clients like jdbc to get a value from
> non-multibyte enabled servers that would be more meaningful than the
> current SQL_ASCII.  If this where done, then the 'UNKNOWN' hack would
> not be necessary.

Tom's suggestion does not sound reasonable to me. If PostgreSQL is not
built with MULTIBYTE, then it means there would be no such idea
"encoding" in PostgreSQL becuase there is no program to handle
encodings. Thus it would be meaningless to assign an "encoding" to a
database if MULTIBYTE is not enabled.
--
Tatsuo Ishii

Re: [HACKERS] MULTIBYTE and SQL_ASCII (was Re: Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?)

From

Tatsuo Ishii

Date:

11 May 2001, 09:28:25

> >>> Thus I would be happy if getdatabaseencoding() returned 'UNKNOWN' or
> >>> something similar when in fact it doesn't know what the encoding is
> >>> (i.e. when not compiled with multibyte).
> >>
> >
> > Is that ok for Java? I thought Java needs to know the encoding
> > beforehand so that it could convert to/from Unicode.
>
> That is actually the original issue that started this thread.  If you
> want the full thread see the jdbc mail archive list.  A user was
> complaining that when running on a database without multibyte enabled,
> that through psql he could insert and retrieve 8bit characters, but in
> jdbc the 8bit characters were converted to ?'s.

Still I don't see what you are wanting in the JDBC driver if
PostgreSQL would return "UNKNOWN" indicating that the backend is not
compiled with MULTIBYTE. Do you want exact the same behavior as prior
7.1 driver? i.e. reading data from the PostgreSQL backend, assume its
encoding default to the Java client (that is set by locale or
something else) and convert it to UTF-8. If so, that would make sense
to me...
--
Tatsuo Ishii

Re: A bug fix for JDBC's getTables() in Postgresql 7.1

From

Bruce Momjian

Date:

16 May 2001, 12:42:23

Thanks.  Patch applied to jdbc1 and jdbc2, and attached.  I had already
patched the ORDER BY as:

    sql.append("' order by relkind, relname");


> Hi !!
>
> I was trying to get a very nice FREE graphical db tool called DbVisualizer
> (http://www.ideit.com/products/dbvis/) to work with Postgresql and I found
> out the following bug: if database has views then getTables() gets the null
> pointer exception ('order by relname' makes the listing tree in
> DbVisualizer a lot useful !!)
>
> This patch should propably be applied to the the jdbc1's
> DatabaseMetaData.java, too.
>
> [/tmp/postgresql-7.1/src/interfaces/jdbc/org/postgresql/jdbc2]$
> <ql/jdbc2]$ diff -u DatabaseMetaData.java.org DatabaseMetaData.java
>
> --- DatabaseMetaData.java.org    Wed May 02 22:52:25 2001
> +++ DatabaseMetaData.java    Wed May 02 23:07:19 2001
> @@ -1666,7 +1666,7 @@
>       // Now take the pattern into account
>       sql.append(") and relname like '");
>       sql.append(tableNamePattern.toLowerCase());
> -    sql.append("'");
> +    sql.append("' order by relname");
>
>       // Now run the query
>       r = connection.ExecSQL(sql.toString());
> @@ -1697,6 +1697,9 @@
>       case 'S':
>           relKind = "SEQUENCE";
>           break;
> +    case 'v':
> +        relKind = "VIEW";
> +        break;
>       default:
>           relKind = null;
>       }
> @@ -1704,7 +1707,7 @@
>       tuple[0] = null;        // Catalog name
>       tuple[1] = null;        // Schema name
>       tuple[2] = r.getBytes(1);    // Table name
> -    tuple[3] = relKind.getBytes();    // Table type
> +    tuple[3] = (relKind==null) ? null : relKind.getBytes();    // Table type
>       tuple[4] = remarks;        // Remarks
>       v.addElement(tuple);
>         }
>
>
> -----
> http://www.ideit.com/products/dbvis/
>
> ...
>
> DbVisualizer
> Version: 2.0
> Released: 2001-04-20
>
>
> The #1 requested feature to ease editing table data is now supported!
> The #2 requested feature to print graphs is now supported!
> Read the complete change log for all new features and enhancements!
>
>
> DbVisualizer is a cross platform database visualization and edit tool
> relying 100% on the JDBC, Java Database Connectivity API's. DbVisualizer
> enables simultaneous connections to many different databases through JDBC
> drivers available from a variety of vendors. Just point and click to browse
> the structure of the database, characteristics of tables, etc. No matter if
> it's an enterprise database from Oracle or an open source product like
> InstantDB!
>
> And best of all -> it's FREE!
> -----
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Index: src/interfaces/jdbc/org/postgresql/jdbc1/DatabaseMetaData.java
===================================================================
RCS file: /home/projects/pgsql/cvsroot/pgsql/src/interfaces/jdbc/org/postgresql/jdbc1/DatabaseMetaData.java,v
retrieving revision 1.14
diff -c -r1.14 DatabaseMetaData.java
*** src/interfaces/jdbc/org/postgresql/jdbc1/DatabaseMetaData.java    2001/05/16 04:08:49    1.14
--- src/interfaces/jdbc/org/postgresql/jdbc1/DatabaseMetaData.java    2001/05/16 16:36:21
***************
*** 1697,1702 ****
--- 1697,1705 ----
      case 'S':
          relKind = "SEQUENCE";
          break;
+     case 'v':
+         relKind = "VIEW";
+         break;
      default:
          relKind = null;
      }
***************
*** 1704,1710 ****
      tuple[0] = null;        // Catalog name
      tuple[1] = null;        // Schema name
      tuple[2] = r.getBytes(1);    // Table name
!     tuple[3] = relKind.getBytes();    // Table type
      tuple[4] = remarks;        // Remarks
      v.addElement(tuple);
        }
--- 1707,1713 ----
      tuple[0] = null;        // Catalog name
      tuple[1] = null;        // Schema name
      tuple[2] = r.getBytes(1);    // Table name
!     tuple[3] = (relKind==null) ? null : relKind.getBytes();    // Table type
      tuple[4] = remarks;        // Remarks
      v.addElement(tuple);
        }
Index: src/interfaces/jdbc/org/postgresql/jdbc2/DatabaseMetaData.java
===================================================================
RCS file: /home/projects/pgsql/cvsroot/pgsql/src/interfaces/jdbc/org/postgresql/jdbc2/DatabaseMetaData.java,v
retrieving revision 1.18
diff -c -r1.18 DatabaseMetaData.java
*** src/interfaces/jdbc/org/postgresql/jdbc2/DatabaseMetaData.java    2001/05/16 04:08:50    1.18
--- src/interfaces/jdbc/org/postgresql/jdbc2/DatabaseMetaData.java    2001/05/16 16:36:25
***************
*** 1697,1702 ****
--- 1697,1705 ----
      case 'S':
          relKind = "SEQUENCE";
          break;
+     case 'v':
+         relKind = "VIEW";
+         break;
      default:
          relKind = null;
      }
***************
*** 1704,1710 ****
      tuple[0] = null;        // Catalog name
      tuple[1] = null;        // Schema name
      tuple[2] = r.getBytes(1);    // Table name
!     tuple[3] = relKind.getBytes();    // Table type
      tuple[4] = remarks;        // Remarks
      v.addElement(tuple);
        }
--- 1707,1713 ----
      tuple[0] = null;        // Catalog name
      tuple[1] = null;        // Schema name
      tuple[2] = r.getBytes(1);    // Table name
!     tuple[3] = (relKind==null) ? null : relKind.getBytes();    // Table type
      tuple[4] = remarks;        // Remarks
      v.addElement(tuple);
        }

Latest binaries

From

"Dave Cramer"

Date:

17 May 2001, 07:53:13

For anyone looking for latest jar files; I have built the jars from the
latest code snapshot and they are available for download at

http://jdbc.fastcrypt.com

Dave