Re: Accents bug ? - Mailing list pgsql-jdbc

From Knut Forkalsrud
Subject Re: Accents bug ?
Date
Msg-id lzofnngfo0.fsf@darkstar.cj.com
In response to Re: Accents bug ?  (Knut Forkalsrud <kforkalsrud@cj.com>)
Responses Re: Accents bug ?  (Denis Bucher <dbucher@niftycom.com>)
List pgsql-jdbc
This is a follow-up to my post a few days ago about the JDBC driver
chopping off strings at non-ASCII characters.  First a brief summary of
the problem:

1. I create a database specifying the encoding as UNICODE.

2. I invoke psql in a standard xterm (ISO-8859-1) and insert a few
   values.

3. I continue using psql and issue a SELECT to read the strings back.
   All appears well.

4. I try to do the same SELECT through JDBC and the strings are
   truncated at the first non-ASCII character.

The problem does not seem to be in the JDBC driver itself, but in
another interface or in the back end.  I downloaded the JDBC source
from anonymous cvs and compiled some debug code into the
ResultSet.getString() method to display the actual byte values
returned from the server.  Here is the method with my added code
prefixed by + characters at the beginning of the line:

  public String getString(int columnIndex) throws SQLException
  {
    if (columnIndex < 1 || columnIndex > fields.length)
      throw new PSQLException("postgresql.res.colrange");

    wasNullFlag = (this_row[columnIndex - 1] == null);
    if(wasNullFlag)
      return null;

+    final char[] hexDigits = { '0', '1', '2', '3', '4', '5', '6', '7',
+                               '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
+    int i;
+    byte[] rawText = this_row[columnIndex - 1];
+    System.out.print("-- Raw:");
+    for (i = 0; i < rawText.length; ++i) {
+        int ch = rawText[i] >= 0 ? rawText[i] : rawText[i] + 256;
+        System.out.print(" " + hexDigits[ch/16] + hexDigits[ch%16]);
+    }
+    System.out.println();

    Encoding encoding = connection.getEncoding();
    return encoding.decode(this_row[columnIndex - 1]);
  }
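
For anyone who wants to reproduce the dump without patching the
driver, here is a minimal standalone sketch of the same hex formatting.
The class name HexDump and the method toHex() are made up for this
example and are not part of the driver; the sample bytes are the ones
from the "abæøå" row in the table further down.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of the JDBC driver. */
final class HexDump {
    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    /** Formats each byte as two lowercase hex digits, separated by spaces. */
    static String toHex(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length * 3);
        for (int i = 0; i < raw.length; i++) {
            int ch = raw[i] & 0xff;          // unsigned value 0..255
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(HEX_DIGITS[ch >>> 4]).append(HEX_DIGITS[ch & 0x0f]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "abæøå" written with escapes so the source file encoding does not matter
        byte[] latin1 = "ab\u00e6\u00f8\u00e5".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("-- Raw: " + toHex(latin1));   // -- Raw: 61 62 e6 f8 e5
    }
}
```

Masking with 0xff does the same job as the "add 256 if negative" test
in the patch above, since Java bytes are signed.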


Before my actual query I saw two calls to getString():

        50 6f 73 74 67 72 65 53 51 4c 20 37 2e 31 2e 33 20 6f 6e 20 69
        36 38 36 2d 70 63 2d 6c 69 6e 75 78 2d 67 6e 75 2c 20 63 6f 6d
        70 69 6c 65 64 20 62 79 20 47 43 43 20 32 2e 39 36

which roughly translates to:

        PostgreSQL 7.1.3 on i686-pc-linux-gnu, compiled by GCC 2.96

and:

        55 4e 49 43 4f 44 45

which roughly translates to:

        UNICODE
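
The translation from hex back to text can be checked mechanically.
This is only a sketch: the class HexToText and its decode() method are
invented for illustration, and it assumes the dump contains nothing
but ASCII, which holds for the two dumps above.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for reading the hex dumps in this post. */
final class HexToText {
    /** Parses space-separated hex byte values and decodes them as US-ASCII. */
    static String decode(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // parseInt yields 0..255; the cast folds it into a signed byte
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(decode("55 4e 49 43 4f 44 45"));   // UNICODE
    }
}
```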

I guess these come from initialization queries the driver runs at
startup.  Now to my query.  I summarized the results in a table to
make it easier to follow.

Inserted     The JDBC driver byte buffer    JDBC string
-------------------------------------------------------
abcdefgh     61 62 63 64 65 66 67 68        abcdefgh
abæøå        61 62 e6 f8 e5                 ab
ab²          61 62 b2                       ab
ß÷¥£         df f7 a5 a3                    (empty)

The byte buffer seems to be in the ISO-8859-1 character set and not
UTF-8 as the UNICODE database encoding expects.  The error was
probably introduced during the INSERT: presumably psql passed the
ISO-8859-1 bytes from the xterm to the server unconverted.  I guess I
should submit this as a bug to the maintainers of the psql program.
But the "Bug report tool" listed on the home page
(http://www.ca.postgresql.org/bugs/index.php) gives me a 404 Not
found.  Well, I'll just have to try some other servers.
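
The claim that the stored bytes are not UTF-8 can be verified with a
strict decoder.  The following is a sketch using the standard
java.nio.charset API; the class name Utf8Check and its two methods are
invented for this example and have nothing to do with the driver's own
Encoding class.  It also shows what the bytes would look like had they
been converted from ISO-8859-1 to UTF-8 on the way in.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of the JDBC driver. */
final class Utf8Check {
    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] raw) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;   // malformed input, e.g. a bare ISO-8859-1 byte
        }
    }

    /** Re-encodes bytes that are assumed to be ISO-8859-1 as UTF-8. */
    static byte[] latin1ToUtf8(byte[] raw) {
        String text = new String(raw, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bytes the server returned for the "abæøå" row
        byte[] stored = {0x61, 0x62, (byte) 0xe6, (byte) 0xf8, (byte) 0xe5};
        System.out.println(isValidUtf8(stored));                 // false
        System.out.println(isValidUtf8(latin1ToUtf8(stored)));   // true
    }
}
```

A correct UTF-8 encoding of "abæøå" would be 61 62 c3 a6 c3 b8 c3 a5,
that is, two bytes for each of the three accented characters.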

-Knut

--
The early worm gets the bird.
