Thread: getTables() doesn't handle umlauts correctly

getTables() doesn't handle umlauts correctly

From
Thomas Kellerer
Date:
Hi,

I noticed that the driver will return incorrect values for table names that contain umlauts. Consider the following
code:

Class.forName("org.postgresql.Driver");
con = DriverManager.getConnection("jdbc:postgresql://localhost/postgres", "postgres", "password");
stmt = con.createStatement();
stmt.executeUpdate("create table public.umlaut_test_ö (id integer)");
rs = con.getMetaData().getTables(null, "public", "umlaut_test%", null);
while (rs.next())
{
    System.out.println(rs.getString("TABLE_NAME"));
}

It will not display the table name correctly.

Now on the console this might be an encoding problem of the client, but the name is also not displayed correctly, when
e.g.using a Swing JLabel component which is fully UTF-8 compatible. 

Is there a connection parameter for the driver to return that correctly or is this a driver bug?

I'm using the 9.0-801 driver with a 9.0.1 database running on Windows XP

Regards
Thomas

Re: getTables() doesn't handle umlauts correctly

From
Lew
Date:
Thomas Kellerer wrote:
> I noticed that the driver will return incorrect values for table names
> that contain umlauts. Consider the following code:
>
> Class.forName("org.postgresql.Driver");

You only need to load the class once.

> con =
> DriverManager.getConnection("jdbc:postgresql://localhost/postgres",
> "postgres", "password");
> stmt = con.createStatement();
> stmt.executeUpdate("create table public.umlaut_test_ö (id integer)");
> rs = con.getMetaData().getTables(null, "public", "umlaut_test%", null);
> while (rs.next())
> {
>    System.out.println(rs.getString("TABLE_NAME"));
> }
>
> It will not display the table name correctly.
>
> Now on the console this might be an encoding problem of the client, but
> the name is also not displayed correctly, when e.g. using a Swing JLabel
> component which is fully UTF-8 compatible.
>
> Is there a connection parameter for the driver to return that correctly
> or is this a driver bug?
>
> I'm using the 9.0-801 driver with a 9.0.1 database running on Windows XP

Sounds like your database character encoding doesn't match up.

--
Lew

Re: getTables() doesn't handle umlauts correctly

From
Thomas Kellerer
Date:
Lew, 22.11.2010 14:14:
>> stmt = con.createStatement();
>> stmt.executeUpdate("create table public.umlaut_test_ö (id integer)");
>> rs = con.getMetaData().getTables(null, "public", "umlaut_test%", null);
>> while (rs.next())
>> {
>> System.out.println(rs.getString("TABLE_NAME"));
>> }
>>
>> It will not display the table name correctly.
>>
>> Now on the console this might be an encoding problem of the client, but
>> the name is also not displayed correctly, when e.g. using a Swing JLabel
>> component which is fully UTF-8 compatible.
>>
>> Is there a connection parameter for the driver to return that correctly
>> or is this a driver bug?
>>
>> I'm using the 9.0-801 driver with a 9.0.1 database running on Windows XP
>
> Sounds like your database character encoding doesn't match up.

select pg_encoding_to_char(encoding) from pg_database returns UTF8 (for all databases)

I verified that my Java code was compiled using UTF-8 as well, to be sure nothing was lost there.
The problem also shows up when using a JDBC based query tool.

Btw: the same happens with pgAdmin as well.
The above created table will be displayed with an "empty" Name in the pgAdmin tree


Regards
Thomas


Re: getTables() doesn't handle umlauts correctly

From
Maciek Sakrejda
Date:
Tried to replicate this here and it shows up correctly with the
umlaut. I don't think this is a jdbc problem (especially since you're
seeing this with pgAdmin too).
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: getTables() doesn't handle umlauts correctly

From
Thomas Kellerer
Date:
Maciek Sakrejda wrote on 22.11.2010 18:52:
> Tried to replicate this here and it shows up correctly with the
> umlaut. I don't think this is a jdbc problem (especially since you're
> seeing this with pgAdmin too).

Hmm, then this seems to be a Windows problem...

Which OS are you using?

Regards
Thomas

Re: getTables() doesn't handle umlauts correctly

From
Maciek Sakrejda
Date:
> Hmm, then this seems to be a Windows problem...
>
> Which OS are you using?

Ubuntu 10.10. I don't have a Windows box handy to try it there.
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com

Re: getTables() doesn't handle umlauts correctly

From
dmp
Date:
I also tried the table creation in the MyJSQLView app. which
uses the jdbc, this morning and I just don't see it. The correct
table name is showing up in the public schema and also in the
information_schema.tables as shown by the app.

I also tested the code directly and again the correct table
name is showing up. The only thing different is not on XP, but
with the same server and jdbc versions.

Now you may wish to try just % or null for the tableNamePattern.
Perhaps something is going on with parsing of that parameter.
The API indicates that "a table name pattern; must match the
table name as it is stored in the database"

danap

> Hi,
>
> I noticed that the driver will return incorrect values for table names that contain umlauts. Consider the following
code:
>
> Class.forName("org.postgresql.Driver");
> con = DriverManager.getConnection("jdbc:postgresql://localhost/postgres", "postgres", "password");
> stmt = con.createStatement();
> stmt.executeUpdate("create table public.umlaut_test_ö (id integer)");
> rs = con.getMetaData().getTables(null, "public", "umlaut_test%", null);
> while (rs.next())
> {
>    System.out.println(rs.getString("TABLE_NAME"));
> }
>
> It will not display the table name correctly.
>
> Now on the console this might be an encoding problem of the client, but the name is also not displayed correctly,
whene.g. using a Swing JLabel component which is fully UTF-8 compatible. 
>
> Is there a connection parameter for the driver to return that correctly or is this a driver bug?
>
> I'm using the 9.0-801 driver with a 9.0.1 database running on Windows XP
>
> Regards
> Thomas


Re: getTables() doesn't handle umlauts correctly

From
Thomas Kellerer
Date:
dmp wrote on 22.11.2010 19:54:
> I also tried the table creation in the MyJSQLView app. which
> uses the jdbc, this morning and I just don't see it. The correct
> table name is showing up in the public schema and also in the information_schema.tables as shown by the app.
>
> I also tested the code directly and again the correct table
> name is showing up. The only thing different is not on XP, but
> with the same server and jdbc versions.

So this seems indeed to be a Windows issue.

> Now you may wish to try just % or null for the tableNamePattern.
> Perhaps something is going on with parsing of that parameter.
> The API indicates that "a table name pattern; must match the
> table name as it is stored in the database

getTables() does return one table, but the umlaut is garbled.
Sorry for not being clear on this.

Regards
Thomas





Re: getTables() doesn't handle umlauts correctly

From
Samuel Gendler
Date:


On Mon, Nov 22, 2010 at 10:59 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
dmp wrote on 22.11.2010 19:54:

I also tried the table creation in the MyJSQLView app. which
uses the jdbc, this morning and I just don't see it. The correct
table name is showing up in the public schema and also in the information_schema.tables as shown by the app.

I also tested the code directly and again the correct table
name is showing up. The only thing different is not on XP, but
with the same server and jdbc versions.

So this seems indeed to be a Windows issue.

 
Now you may wish to try just % or null for the tableNamePattern.
Perhaps something is going on with parsing of that parameter.
The API indicates that "a table name pattern; must match the
table name as it is stored in the database

getTables() does return one table, but the umlaut is garbled.
Sorry for not being clear on this.


What is the output of Charset.defaultCharset() in those java processes? Are you forcing things to UTF-8? I've had problems in the past with JVMs coming up with surprising default charsets, forcing me to override the OS or specify the defult charset with a system property to get things to work correctly.

 

Re: getTables() doesn't handle umlauts correctly

From
Thomas Kellerer
Date:
Samuel Gendler wrote on 22.11.2010 22:34:
> getTables() does return one table, but the umlaut is garbled. Sorry
> for not being clear on this.
>
>
> What is the output of Charset.defaultCharset() in those java
> processes?

UTF-8

> Are you forcing things to UTF-8? I've had problems in the
> past with JVMs coming up with surprising default charsets, forcing me
> to override the OS or specify the defult charset with a system
> property to get things to work correctly.

I don't have any problems with UTF-8 characters in my own tables, so I don't think that would be the problem.

Regards
Thomas




Re: getTables() doesn't handle umlauts correctly

From
Thomas Kellerer
Date:
> getTables() does return one table, but the umlaut is garbled. Sorry
> for not being clear on this.
>
>
> What is the output of Charset.defaultCharset() in those java
> processes? Are you forcing things to UTF-8? I've had problems in the
> past with JVMs coming up with surprising default charsets, forcing me
> to override the OS or specify the defult charset with a system
> property to get things to work correctly.

What I find irritating is that I can run SELECT or INSERT statements against that table without problems.

So the transmission of SQL Statements (through the Statement object) seems to be working with a different encoding...

Regards
Thomas






Re: getTables() doesn't handle umlauts correctly

From
Kris Jurka
Date:

On Mon, 22 Nov 2010, Thomas Kellerer wrote:

> What I find irritating is that I can run SELECT or INSERT statements against
> that table without problems.
>
> So the transmission of SQL Statements (through the Statement object) seems to
> be working with a different encoding...

As the discussion has shown, trying to determine who is at fault here is
not trivial.  The best way to show that postgresql (driver or server if
you're seeing it in pgadmin too) is at fault is to create a test case
creating the table and then querying the metadata. It would be helpful to
use either a Java or PG escape code for the special character so it
doesn't get mangled by either mail clients or build environments. Then use
String.codePointAt to print out the actual data for both the table name
used for construction and returned by the metadata.  That would
conclusively show that PG is at fault somewhere.

Something like this untested code:

void printString(String str) {
     for (int i=0; i<str.length(); ) {
         int cp = str.codePointAt(i);
         System.out.print(" " + cp);
         i += Character.charCount(cp);
     }
     System.out.println();
}

Kris Jurka

Re: getTables() doesn't handle umlauts correctly

From
Thomas Kellerer
Date:
Kris Jurka, 23.11.2010 09:13:
> As the discussion has shown, trying to determine who is at fault here
> is not trivial. The best way to show that postgresql (driver or
> server if you're seeing it in pgadmin too) is at fault is to create a
> test case creating the table and then querying the metadata. It would
> be helpful to use either a Java or PG escape code for the special
> character so it doesn't get mangled by either mail clients or build
> environments. Then use String.codePointAt to print out the actual
> data for both the table name used for construction and returned by
> the metadata. That would conclusively show that PG is at fault
> somewhere.

OK, this is my test program:

Connection con = DriverManager.getConnection("jdbc:postgresql://localhost:5432/postgres", "postgres", "postgres");
Statement stmt = con.createStatement();

stmt.executeUpdate("create table umlaut_ö (some_data varchar(10))");
stmt.executeUpdate("insert into umlaut_ö (some_data) values ('öäü')");

ResultSet rs = con.getMetaData().getTables(null, "public", "umlaut%", null);
if (rs.next()) {
   String name = rs.getString("TABLE_NAME");
   System.out.println("table name: " + name);
   System.out.print("  codepoints:");
   for (int i = 0; i < name.length();)
   {
     int cp = name.codePointAt(i);
     System.out.print(" " + cp);
     i += Character.charCount(cp);
   }
   System.out.println("");
}
rs.close();

rs = stmt.executeQuery("select count(*) from umlaut_ö where some_data = 'öäü'");
if (rs.next()) {
   int count = rs.getInt(1);
   System.out.println("number of rows: " + count);
}
rs.close();

rs = stmt.executeQuery("select some_data from umlaut_ö");
if (rs.next()) {
   String data = rs.getString(1);
   System.out.println("data: " + data);
   System.out.print("  codepoints:");
   for (int i = 0; i < data.length();)
   {
     int cp = data.codePointAt(i);
     System.out.print(" " + cp);
     i += Character.charCount(cp);
   }
   System.out.println("");
}
rs.close();

stmt.executeUpdate("drop table umlaut_ö");

stmt.close();
con.close();


The output on my computer is:

table name: umlaut_test_�
   codepoints: 117 109 108 97 117 116 95 116 101 115 116 95 65533
number of rows: 1
data: öäü
   codepoints: 246 228 252

So it seems that the umlauts in the table name are returned with a different encoding than the data itself.

Nevertheless the umlauts when being *sent* to the server are always treated correctly (as part of a table name as well
ascolumn values) 

This is with 9.0.1 on Windows XP using postgresql-9.0-801.jdbc4.jar

Regards
Thomas

Re: getTables() doesn't handle umlauts correctly

From
Kris Jurka
Date:
On 11/23/2010 6:22 AM, Thomas Kellerer wrote:
> table name: umlaut_test_�
>    codepoints: 117 109 108 97 117 116 95 116 101 115 116 95 65533
> number of rows: 1
> data: öäü
>    codepoints: 246 228 252
>

I can reproduce this, but only with databases that have a mismatched
encoding and ctype/collation.  That's supposed to be tough to mess up
now, so I was only able to do it on older server versions.  What do you
have for encoding, lc_collation, and lc_ctype?  \l output from psql
would be useful.

Kris Jurka

Re: getTables() doesn't handle umlauts correctly

From
Thomas Kellerer
Date:
Kris Jurka wrote on 24.11.2010 08:43:
> On 11/23/2010 6:22 AM, Thomas Kellerer wrote:
>> table name: umlaut_test_� codepoints: 117 109 108 97 117 116 95 116
>> 101 115 116 95 65533 number of rows: 1 data: öäü codepoints: 246
>> 228 252
>>
>
> I can reproduce this, but only with databases that have a mismatched
> encoding and ctype/collation. That's supposed to be tough to mess up
> now, so I was only able to do it on older server versions. What do
> you have for encoding, lc_collation, and lc_ctype?

encoding: UTF-8
lc_collation: German_Germany.1252
lc_ctype: German_Germany.1252


>\l output from psql would be useful.

                                            List of databases
     Name     |  Owner   | Encoding |      Collation      |        Ctype        |   Access privileges
-------------+----------+----------+---------------------+---------------------+-----------------------
  thomas      | thomas   | UTF8     | German_Germany.1252 | German_Germany.1252 |


For completeness:

                            version
-------------------------------------------------------------
  PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 32-bit
(1 row)

JDBC driver: PostgreSQL 9.0 JDBC4 (build 801)


Regards
Thomas