Thread: getTables() doesn't handle umlauts correctly
Hi, I noticed that the driver will return incorrect values for table names that contain umlauts. Consider the following code: Class.forName("org.postgresql.Driver"); con = DriverManager.getConnection("jdbc:postgresql://localhost/postgres", "postgres", "password"); stmt = con.createStatement(); stmt.executeUpdate("create table public.umlaut_test_ö (id integer)"); rs = con.getMetaData().getTables(null, "public", "umlaut_test%", null); while (rs.next()) { System.out.println(rs.getString("TABLE_NAME")); } It will not display the table name correctly. Now on the console this might be an encoding problem of the client, but the name is also not displayed correctly, when e.g.using a Swing JLabel component which is fully UTF-8 compatible. Is there a connection parameter for the driver to return that correctly or is this a driver bug? I'm using the 9.0-801 driver with a 9.0.1 database running on Windows XP Regards Thomas
Thomas Kellerer wrote: > I noticed that the driver will return incorrect values for table names > that contain umlauts. Consider the following code: > > Class.forName("org.postgresql.Driver"); You only need to load the class once. > con = > DriverManager.getConnection("jdbc:postgresql://localhost/postgres", > "postgres", "password"); > stmt = con.createStatement(); > stmt.executeUpdate("create table public.umlaut_test_ö (id integer)"); > rs = con.getMetaData().getTables(null, "public", "umlaut_test%", null); > while (rs.next()) > { > System.out.println(rs.getString("TABLE_NAME")); > } > > It will not display the table name correctly. > > Now on the console this might be an encoding problem of the client, but > the name is also not displayed correctly, when e.g. using a Swing JLabel > component which is fully UTF-8 compatible. > > Is there a connection parameter for the driver to return that correctly > or is this a driver bug? > > I'm using the 9.0-801 driver with a 9.0.1 database running on Windows XP Sounds like your database character encoding doesn't match up. -- Lew
Lew, 22.11.2010 14:14: >> stmt = con.createStatement(); >> stmt.executeUpdate("create table public.umlaut_test_ö (id integer)"); >> rs = con.getMetaData().getTables(null, "public", "umlaut_test%", null); >> while (rs.next()) >> { >> System.out.println(rs.getString("TABLE_NAME")); >> } >> >> It will not display the table name correctly. >> >> Now on the console this might be an encoding problem of the client, but >> the name is also not displayed correctly, when e.g. using a Swing JLabel >> component which is fully UTF-8 compatible. >> >> Is there a connection parameter for the driver to return that correctly >> or is this a driver bug? >> >> I'm using the 9.0-801 driver with a 9.0.1 database running on Windows XP > > Sounds like your database character encoding doesn't match up. select pg_encoding_to_char(encoding) from pg_database returns UTF8 (for all databases) I verified that my Java code was compiled using UTF-8 as well, to be sure nothing was lost there. The problem also shows up when using a JDBC based query tool. Btw: the same happens with pgAdmin as well. The above created table will be displayed with an "empty" Name in the pgAdmin tree Regards Thomas
Tried to replicate this here and it shows up correctly with the umlaut. I don't think this is a jdbc problem (especially since you're seeing this with pgAdmin too). --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com
Maciek Sakrejda wrote on 22.11.2010 18:52: > Tried to replicate this here and it shows up correctly with the > umlaut. I don't think this is a jdbc problem (especially since you're > seeing this with pgAdmin too). Hmm, then this seems to be a Windows problem... Which OS are you using? Regards Thomas
> Hmm, then this seems to be a Windows problem... > > Which OS are you using? Ubuntu 10.10. I don't have a Windows box handy to try it there. --- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com
I also tried the table creation in the MyJSQLView app. which uses the jdbc, this morning and I just don't see it. The correct table name is showing up in the public schema and also in the information_schema.tables as shown by the app. I also tested the code directly and again the correct table name is showing up. The only thing different is not on XP, but with the same server and jdbc versions. Now you may wish to try just % or null for the tableNamePattern. Perhaps something is going on with parsing of that parameter. The API indicates that "a table name pattern; must match the table name as it is stored in the database" danap > Hi, > > I noticed that the driver will return incorrect values for table names that contain umlauts. Consider the following code: > > Class.forName("org.postgresql.Driver"); > con = DriverManager.getConnection("jdbc:postgresql://localhost/postgres", "postgres", "password"); > stmt = con.createStatement(); > stmt.executeUpdate("create table public.umlaut_test_ö (id integer)"); > rs = con.getMetaData().getTables(null, "public", "umlaut_test%", null); > while (rs.next()) > { > System.out.println(rs.getString("TABLE_NAME")); > } > > It will not display the table name correctly. > > Now on the console this might be an encoding problem of the client, but the name is also not displayed correctly, whene.g. using a Swing JLabel component which is fully UTF-8 compatible. > > Is there a connection parameter for the driver to return that correctly or is this a driver bug? > > I'm using the 9.0-801 driver with a 9.0.1 database running on Windows XP > > Regards > Thomas
dmp wrote on 22.11.2010 19:54: > I also tried the table creation in the MyJSQLView app. which > uses the jdbc, this morning and I just don't see it. The correct > table name is showing up in the public schema and also in the information_schema.tables as shown by the app. > > I also tested the code directly and again the correct table > name is showing up. The only thing different is not on XP, but > with the same server and jdbc versions. So this seems indeed to be a Windows issue. > Now you may wish to try just % or null for the tableNamePattern. > Perhaps something is going on with parsing of that parameter. > The API indicates that "a table name pattern; must match the > table name as it is stored in the database getTables() does return one table, but the umlaut is garbled. Sorry for not being clear on this. Regards Thomas
On Mon, Nov 22, 2010 at 10:59 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
dmp wrote on 22.11.2010 19:54:So this seems indeed to be a Windows issue.I also tried the table creation in the MyJSQLView app. which
uses the jdbc, this morning and I just don't see it. The correct
table name is showing up in the public schema and also in the information_schema.tables as shown by the app.
I also tested the code directly and again the correct table
name is showing up. The only thing different is not on XP, but
with the same server and jdbc versions.getTables() does return one table, but the umlaut is garbled.
Now you may wish to try just % or null for the tableNamePattern.
Perhaps something is going on with parsing of that parameter.
The API indicates that "a table name pattern; must match the
table name as it is stored in the database
Sorry for not being clear on this.
What is the output of Charset.defaultCharset() in those java processes? Are you forcing things to UTF-8? I've had problems in the past with JVMs coming up with surprising default charsets, forcing me to override the OS or specify the defult charset with a system property to get things to work correctly.
Samuel Gendler wrote on 22.11.2010 22:34: > getTables() does return one table, but the umlaut is garbled. Sorry > for not being clear on this. > > > What is the output of Charset.defaultCharset() in those java > processes? UTF-8 > Are you forcing things to UTF-8? I've had problems in the > past with JVMs coming up with surprising default charsets, forcing me > to override the OS or specify the defult charset with a system > property to get things to work correctly. I don't have any problems with UTF-8 characters in my own tables, so I don't think that would be the problem. Regards Thomas
> getTables() does return one table, but the umlaut is garbled. Sorry > for not being clear on this. > > > What is the output of Charset.defaultCharset() in those java > processes? Are you forcing things to UTF-8? I've had problems in the > past with JVMs coming up with surprising default charsets, forcing me > to override the OS or specify the defult charset with a system > property to get things to work correctly. What I find irritating is that I can run SELECT or INSERT statements against that table without problems. So the transmission of SQL Statements (through the Statement object) seems to be working with a different encoding... Regards Thomas
On Mon, 22 Nov 2010, Thomas Kellerer wrote: > What I find irritating is that I can run SELECT or INSERT statements against > that table without problems. > > So the transmission of SQL Statements (through the Statement object) seems to > be working with a different encoding... As the discussion has shown, trying to determine who is at fault here is not trivial. The best way to show that postgresql (driver or server if you're seeing it in pgadmin too) is at fault is to create a test case creating the table and then querying the metadata. It would be helpful to use either a Java or PG escape code for the special character so it doesn't get mangled by either mail clients or build environments. Then use String.codePointAt to print out the actual data for both the table name used for construction and returned by the metadata. That would conclusively show that PG is at fault somewhere. Something like this untested code: void printString(String str) { for (int i=0; i<str.length(); ) { int cp = str.codePointAt(i); System.out.print(" " + cp); i += Character.charCount(cp); } System.out.println(); } Kris Jurka
Kris Jurka, 23.11.2010 09:13: > As the discussion has shown, trying to determine who is at fault here > is not trivial. The best way to show that postgresql (driver or > server if you're seeing it in pgadmin too) is at fault is to create a > test case creating the table and then querying the metadata. It would > be helpful to use either a Java or PG escape code for the special > character so it doesn't get mangled by either mail clients or build > environments. Then use String.codePointAt to print out the actual > data for both the table name used for construction and returned by > the metadata. That would conclusively show that PG is at fault > somewhere. OK, this is my test program: Connection con = DriverManager.getConnection("jdbc:postgresql://localhost:5432/postgres", "postgres", "postgres"); Statement stmt = con.createStatement(); stmt.executeUpdate("create table umlaut_ö (some_data varchar(10))"); stmt.executeUpdate("insert into umlaut_ö (some_data) values ('öäü')"); ResultSet rs = con.getMetaData().getTables(null, "public", "umlaut%", null); if (rs.next()) { String name = rs.getString("TABLE_NAME"); System.out.println("table name: " + name); System.out.print(" codepoints:"); for (int i = 0; i < name.length();) { int cp = name.codePointAt(i); System.out.print(" " + cp); i += Character.charCount(cp); } System.out.println(""); } rs.close(); rs = stmt.executeQuery("select count(*) from umlaut_ö where some_data = 'öäü'"); if (rs.next()) { int count = rs.getInt(1); System.out.println("number of rows: " + count); } rs.close(); rs = stmt.executeQuery("select some_data from umlaut_ö"); if (rs.next()) { String data = rs.getString(1); System.out.println("data: " + data); System.out.print(" codepoints:"); for (int i = 0; i < data.length();) { int cp = data.codePointAt(i); System.out.print(" " + cp); i += Character.charCount(cp); } System.out.println(""); } rs.close(); stmt.executeUpdate("drop table umlaut_ö"); stmt.close(); con.close(); The output on my computer is: table name: umlaut_test_� codepoints: 117 109 108 97 117 116 95 116 101 115 116 95 65533 number of rows: 1 data: öäü codepoints: 246 228 252 So it seems that the umlauts in the table name are returned with a different encoding than the data itself. Nevertheless the umlauts when being *sent* to the server are always treated correctly (as part of a table name as well ascolumn values) This is with 9.0.1 on Windows XP using postgresql-9.0-801.jdbc4.jar Regards Thomas
On 11/23/2010 6:22 AM, Thomas Kellerer wrote: > table name: umlaut_test_� > codepoints: 117 109 108 97 117 116 95 116 101 115 116 95 65533 > number of rows: 1 > data: öäü > codepoints: 246 228 252 > I can reproduce this, but only with databases that have a mismatched encoding and ctype/collation. That's supposed to be tough to mess up now, so I was only able to do it on older server versions. What do you have for encoding, lc_collation, and lc_ctype? \l output from psql would be useful. Kris Jurka
Kris Jurka wrote on 24.11.2010 08:43: > On 11/23/2010 6:22 AM, Thomas Kellerer wrote: >> table name: umlaut_test_� codepoints: 117 109 108 97 117 116 95 116 >> 101 115 116 95 65533 number of rows: 1 data: öäü codepoints: 246 >> 228 252 >> > > I can reproduce this, but only with databases that have a mismatched > encoding and ctype/collation. That's supposed to be tough to mess up > now, so I was only able to do it on older server versions. What do > you have for encoding, lc_collation, and lc_ctype? encoding: UTF-8 lc_collation: German_Germany.1252 lc_ctype: German_Germany.1252 >\l output from psql would be useful. List of databases Name | Owner | Encoding | Collation | Ctype | Access privileges -------------+----------+----------+---------------------+---------------------+----------------------- thomas | thomas | UTF8 | German_Germany.1252 | German_Germany.1252 | For completeness: version ------------------------------------------------------------- PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 32-bit (1 row) JDBC driver: PostgreSQL 9.0 JDBC4 (build 801) Regards Thomas