Thread: DatabaseMetaData - getImportedKeys
Hello,

I have the following problem working with DatabaseMetaData. There is a database with table and attribute names in Russian. The database cluster was initialized with the appropriate ru_RU.KOI8-R locale, and all the databases were created with KOI8-R encoding. No problems were encountered in accessing table data with JDBC.

The database has foreign key constraints that I try to get with DatabaseMetaData methods. Both getTables and getPrimaryKeys work fine; all the results have correct encoding and values.

The following fragment of code causes an exception:

    rs = meta.getImportedKeys(null, null, tableName);
    while (rs.next()) {
        String pkTable = rs.getString("PKTABLE_NAME");
        String pkColumn = rs.getString("PKCOLUMN_NAME"); /* here */

        String fkTable = rs.getString("FKTABLE_NAME");
        String fkColumn = rs.getString("FKCOLUMN_NAME"); /* and here */
    }

The PKTABLE_NAME and FKTABLE_NAME fields are fetched correctly. Both of the marked lines produce an exception with this stack trace:

    at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:270)
    at org.postgresql.core.Encoding.decode(Encoding.java:165)
    at org.postgresql.core.Encoding.decode(Encoding.java:181)
    at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)
    at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:337)

The error message is: "Invalid character data was found. This is most likely caused by stored data containing characters that are invalid for the character set the database was created in. The most common example of this is storing 8bit data in a SQL_ASCII database."

However, the database is not SQL_ASCII (it is KOI8-R), and all the characters in the column names are taken from this codepage. Other DatabaseMetaData methods work fine with these characters. I tested the same methods on the same database with tables that have Latin names, and everything worked, but renaming all the columns would cause a huge amount of extra work on the database and the applications.

I use PostgreSQL-7.3.4 compiled from source and the JDBC driver from http://jdbc.postgresql.org/download/pg73jdbc3.jar on Linux, with J2SDK 1.4.1_02.

I will appreciate any help with this. Thank you.

Sincerely yours,
Aleksey.

On Mon, 3 Nov 2003, Aleksey wrote:

> Hello,
>
> I have the following problem working with DatabaseMetaData. There is a
> database with table and attribute names in Russian. Database cluster was
> initialized with appropriate ru_RU.KOI8-R locale. All the databases were
> created with KOI8-R encoding. No problems were encountered in accessing
> database table data with JDBC.
>
> Database has foreign key constraints that I try to get with
> DatabaseMetaData methods. Both getTables and getPrimaryKeys work fine,
> all the results have correct encoding and values.
>
> The following fragment of code causes exception:
>
> rs = meta.getImportedKeys(null,null,tableName);
> while(rs.next()) {
>
>     String pkTable = rs.getString("PKTABLE_NAME");
>     String pkColumn = rs.getString("PKCOLUMN_NAME"); /* here */
>
>     String fkTable = rs.getString("FKTABLE_NAME");
>     String fkColumn = rs.getString("FKCOLUMN_NAME"); /* and here */
>
> }
>
> PKTABLE_NAME and FKTABLE_NAME fields are fetched correctly. Both the
> marked lines produce exception with this stack trace:
>
> at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:270)
> at org.postgresql.core.Encoding.decode(Encoding.java:165)
> at org.postgresql.core.Encoding.decode(Encoding.java:181)
> at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)
> at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:337)
>
> Error message is: "Invalid character data was found. This is most
> likely caused by stored data containing characters that are invalid for
> the character set the database was created in. The most common example
> of this is storing 8bit data in a SQL_ASCII database.",
>
> but database is not SQL_ASCII (actually KOI8-R) and all the characters
> in column names are taken from this codepage. Other DatabaseMetaData
> methods work with these characters fine.

This is particularly odd because the DatabaseMetaData function has already parsed the data as valid Unicode and then set up a "fake" in-memory result set to work with, and it is that result set which is failing here. Could you send me a pg_dump file of something that will make this fail?

Kris Jurka

On Mon, 3 Nov 2003, Aleksey wrote:

> I have the following problem working with DatabaseMetaData.
>
> [ retrieving foreign key column names with KOI8-R characters fails
> when trying to decodeUTF ]

The way many DatabaseMetaData methods work is that they run a query to retrieve the necessary data, then iterate over the results, reformat them, and store them in an in-memory ResultSet which is returned to the user. The in-memory ResultSet is implemented with byte arrays, so all String data has .getBytes() called on it to turn it into a byte array. This encodes it with the JVM's default charset, which may not be the UTF-8 we need. This is why the subsequent decoding from UTF-8 fails: the bytes are not actually UTF-8 data.

The attached patch encodes the data into the format that the subsequent decoder expects. Aleksey, could you try out this patch, or the pre-built jar file that includes it at http://www.ejurka.com/pgsql/, and confirm that this fixes your problem?

Kris Jurka

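The mismatch described above can be reproduced outside the driver. The following is a minimal sketch, not the driver's actual code or the attached patch: the class name EncodingMismatchDemo, its helper methods, and the sample column name are hypothetical, and it uses java.nio.charset.StandardCharsets, which is newer than the J2SDK 1.4 mentioned in this thread. It passes KOI8-R explicitly to getBytes() to stand in for a JVM whose default charset is KOI8-R, and assumes the runtime provides the KOI8-R charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Strict UTF-8 decode: reports malformed input instead of silently
    // substituting a replacement character.
    static String decodeUtf8Strict(byte[] data) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
    }

    // Encode with the given charset, then decode as UTF-8, and check
    // whether the original string survives the round trip.
    static boolean roundTrips(String s, Charset encodeWith) {
        try {
            return s.equals(decodeUtf8Strict(s.getBytes(encodeWith)));
        } catch (CharacterCodingException e) {
            return false; // bytes were not valid UTF-8
        }
    }

    public static void main(String[] args) {
        // Hypothetical Cyrillic column name, written with escapes so the
        // source file encoding does not matter.
        String columnName = "\u0438\u043c\u044f";

        // Stand-in for a JVM whose default charset is KOI8-R.
        Charset koi8r = Charset.forName("KOI8-R");

        System.out.println(roundTrips(columnName, koi8r));                  // prints false
        System.out.println(roundTrips(columnName, StandardCharsets.UTF_8)); // prints true
    }
}
```

Encoding with an explicit charset that matches what the decoder expects, rather than relying on the platform default, is the general shape of the fix described above.
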
On Tue, 4 Nov 2003, Kris Jurka wrote:

> On Mon, 3 Nov 2003, Aleksey wrote:
>
> > I have the following problem working with DatabaseMetaData.
> >
> > [ retrieving foreign key column names with KOI8-R characters fails
> > when trying to decodeUTF ]
>
> The way many DatabaseMetaData methods work is that they run a query to
> retrieve the necessary data, then iterate over the results, reformat
> them, and store them in an in-memory ResultSet which is returned to the
> user. The in-memory ResultSet is implemented with byte arrays, so all
> String data has .getBytes() called on it to turn it into a byte array.
> This encodes it with the JVM's default charset, which may not be the
> UTF-8 we need. This is why the subsequent decoding from UTF-8 fails:
> the bytes are not actually UTF-8 data.
>
> The attached patch encodes the data into the format that the subsequent
> decoder expects. Aleksey, could you try out this patch, or the pre-built
> jar file that includes it at http://www.ejurka.com/pgsql/, and confirm
> that this fixes your problem?
>
> Kris Jurka

Attached is a corrected patch. The original failed to compile after doing a clean, but somehow I was able to build it earlier. Ant's dependency tracking could apparently use some work.

Kris Jurka