Thread: [BUG] - Invalid UNICODE character sequence found (0xc000)
Hi: First, here is the postgreSQL version used: PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1) I am aware similar problem was already sent to this list (I read some of them), but I want to contribute with more interesting stuff: In the tested web application, we use postgreSQL JDBC driver. We have a 1 field form where we allow the user to writte a search pattern a table. The generated SQL use LIKE to find for similars. Example: If the user write: "ant" then the answer will be: antonio antoine etc. This works fine, even if we left empty the form field, to show all the records. The interesting stuff I found is: If we write just "z", "Z" or any string with that include the chars "z" or "Z" at any point of the string in the field, then I got the below error. How is this posible? I not an UTF-8, ISO-8859-1 or SQL_ASCII expert, but for me "z" or "Z" is part of the ASCII that means a 1 byte code in UTF-8. That means the driver has problems with an normal "z" or "Z"? Note: The same apply for the drivers: pg73jdbc.jar pg74jdbc.jar pg74.1jdbc.jar Please explain. Best Regards, Antonio Gallardo Caused by: java.sql.SQLException: ERROR: Invalid UNICODE character sequence found (0xc000) at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:131) at org.postgresql.jdbc1.AbstractJdbc1Connection.ExecSQL(AbstractJdbc1Connection.java:505) at org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:320) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:48) at org.postgresql.jdbc1.AbstractJdbc1Statement.executeQuery(AbstractJdbc1Statement.java:153)
On Thu, 8 Jan 2004, Antonio Gallardo wrote: > Hi: > > In the tested web application, we use postgreSQL JDBC driver. We have a 1 > field form where we allow the user to writte a search pattern a table. The > generated SQL use LIKE to find for similars. Example: > > This works fine, even if we left empty the form field, to show all the > records. > > The interesting stuff I found is: > > If we write just "z", "Z" or any string with that include the chars "z" or > "Z" at any point of the string in the field, then I got the below error. > How is this posible? I not an UTF-8, ISO-8859-1 or SQL_ASCII expert, but > for me "z" or "Z" is part of the ASCII that means a 1 byte code in UTF-8. > > That means the driver has problems with an normal "z" or "Z"? > > Note: The same apply for the drivers: > Could you possibly write a standalone Java program that demonstrates this error? What encoding is your database? What encoding is you web application running in? What encoding is used in the browser? Kris Jurka
Kris Jurka dijo: > > > On Thu, 8 Jan 2004, Antonio Gallardo wrote: > >> Hi: >> >> In the tested web application, we use postgreSQL JDBC driver. We have a >> 1 >> field form where we allow the user to writte a search pattern a table. >> The >> generated SQL use LIKE to find for similars. Example: >> >> This works fine, even if we left empty the form field, to show all the >> records. >> >> The interesting stuff I found is: >> >> If we write just "z", "Z" or any string with that include the chars "z" >> or >> "Z" at any point of the string in the field, then I got the below error. >> How is this posible? I not an UTF-8, ISO-8859-1 or SQL_ASCII expert, but >> for me "z" or "Z" is part of the ASCII that means a 1 byte code in >> UTF-8. >> >> That means the driver has problems with an normal "z" or "Z"? >> >> Note: The same apply for the drivers: >> > > Could you possibly write a standalone Java program that demonstrates this > error? What encoding is your database? builded using: createdb -E UNICODE mydbname AFAIK this means it is UTF-8 > What encoding is you web application running in? All using UTF-8 it is using Cocoon - http://cocoon.apache.org/ > What encoding is used in the browser? Browser: Mozilla Encoding: ISO-8859-1 I know you can point to the browser as the source of the problem, but I can insert any char to the database. I can also retrieve all the rows that include any char from the database if I don't query using the "z" char. It does not matter if the retrieved values include "z". The problem is clearly related to the SELECT statement using "z". Please note I am able to INSERT and UPDATE using the web app interface. I can also search using other chars as: á é ý ú etc. Any other char is OK. Why the "z" (that is part of the ASCII => equal code in UTF-8) is offending? This is why I decided to write to the postgreSQL list. Best Regards, Antonio Gallardo
On Fri, 9 Jan 2004, Antonio Gallardo wrote: > The problem is clearly related to the SELECT statement using "z". > You haven't really described what you are doing, but I refuse to believe that the letter z is special in any relevent way. The attached sample program shows that the letter z works fine. If this fails for you I am interested otherwise you'll have to provide a similar program that fails for me. Kris Jurka
Attachment
Antonio, As Kris Jurka said in his posts, there's nothing special with the "z" characters, so your browser/os must be doing something wrong with the input. But the error you reported is a clear indication that the backend gets a byte sequence which is not properly encoded as UTF-8. If you pass only strings to the driver, then this is a driver error (means the driver encodes the string improperly). To facilitate reproduction, I would suggest you to print out the unicode characters of your query string with something like: ... embed this in your program: for (int i = 0; i < queryString.length(); i++) { System.out.print("\\u"); System.out.print(toHexString(queryString.charAt(i))); } System.out.println(); ... private static final char[] hexChars = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' }; public static String toHexString(int n) { char[] buffer = new char[4]; for (int i=0; i<4; i++) { buffer[3-i] = hexChars[n & 0x0F]; n >>= 4; } return new String(buffer); } Then you can use the resulting string in the example program. This will make sure that the other person on the other end of the email will have exactly the same string as you - otherwise the you can bet that subtle encoding differences get lost as you type. Cheers, Csaba. On Fri, 2004-01-09 at 06:18, Antonio Gallardo wrote: > Hi: > > First, here is the postgreSQL version used: > PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC > i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1) > > I am aware similar problem was already sent to this list (I read some of > them), but I want to contribute with more interesting stuff: > > In the tested web application, we use postgreSQL JDBC driver. We have a 1 > field form where we allow the user to writte a search pattern a table. The > generated SQL use LIKE to find for similars. Example: > > If the user write: "ant" then the answer will be: > > antonio > antoine > etc. > > This works fine, even if we left empty the form field, to show all the > records. > > The interesting stuff I found is: > > If we write just "z", "Z" or any string with that include the chars "z" or > "Z" at any point of the string in the field, then I got the below error. > How is this posible? I not an UTF-8, ISO-8859-1 or SQL_ASCII expert, but > for me "z" or "Z" is part of the ASCII that means a 1 byte code in UTF-8. > > That means the driver has problems with an normal "z" or "Z"? > > Note: The same apply for the drivers: > > pg73jdbc.jar > pg74jdbc.jar > pg74.1jdbc.jar > > Please explain. > > Best Regards, > > Antonio Gallardo > > Caused by: java.sql.SQLException: ERROR: Invalid UNICODE character > sequence found (0xc000) > > at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:131) > at > org.postgresql.jdbc1.AbstractJdbc1Connection.ExecSQL(AbstractJdbc1Connection.java:505) > at > org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:320) > at > org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:48) > at > org.postgresql.jdbc1.AbstractJdbc1Statement.executeQuery(AbstractJdbc1Statement.java:153) > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html