Thread: Re: [BUG] - Invalid UNICODE character sequence found(0xc000)
This is the ofending SELECT: SELECT A0.ROL_NAME,A0.ROL_ENABLE,A0.ROL_ID FROM AUTH_ROLE A0 WHERE (A0.ROL_NAME LIKE 'z%') AND A0.ROL_NAME <> 'admin' ORDER BY 1 I tried it using squirrelSQL - http://squirrel-sql.sourceforge.net/ and even in psql it returns: ERROR: Invalid UNICODE character sequence found (0xc000) in psql the "SELECT version();" returns: PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1) Lets focus on: (A0.ROL_NAME LIKE 'z%') If I query: (A0.ROL_NAME LIKE 'za%') --- It is OK, but (A0.ROL_NAME LIKE 'az%') --- throw the same exception Looks like PostgreSQL incorrect manage this sequence: "z%" Can you test it in a table while doing a similar cosntruction? Can you check this? Best Regards, Antonio Gallardo. Csaba Nagy dijo: > Antonio, > > As Kris Jurka said in his posts, there's nothing special with the "z" characters, so your browser/os must be doing something wrong with the input. But the error you reported is a clear indication that the backend gets a byte sequence which is not properly encoded as UTF-8. If you pass only strings to the driver, then this is a driver error (means the driver encodes the string improperly). > To facilitate reproduction, I would suggest you to print out the unicode characters of your query string with something like: > > ... embed this in your program: > > for (int i = 0; i < queryString.length(); i++) { > System.out.print("\\u"); > System.out.print(toHexString(queryString.charAt(i))); > } > System.out.println(); > ... > private static final char[] hexChars = > { '0', '1', '2', '3', '4', '5', '6', > '7', '8', '9', 'A', 'B', 'C', 'D', > 'E', 'F' }; > > public static String toHexString(int n) > { > char[] buffer = new char[4]; > for (int i=0; i<4; i++) { > buffer[3-i] = hexChars[n & 0x0F]; > n >>= 4; > } > return new String(buffer); > } > > Then you can use the resulting string in the example program. This will make sure that the other person on the other end of the email will have exactly the same string as you - otherwise the you can bet that subtle encoding differences get lost as you type. > > Cheers, > Csaba. > > > On Fri, 2004-01-09 at 06:18, Antonio Gallardo wrote: >> Hi: >> >> First, here is the postgreSQL version used: >> PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC >> i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1) >> >> I am aware similar problem was already sent to this list (I read some of them), but I want to contribute with more interesting stuff: >> >> In the tested web application, we use postgreSQL JDBC driver. We have a 1 >> field form where we allow the user to writte a search pattern a table. The >> generated SQL use LIKE to find for similars. Example: >> >> If the user write: "ant" then the answer will be: >> >> antonio >> antoine >> etc. >> >> This works fine, even if we left empty the form field, to show all the records. >> >> The interesting stuff I found is: >> >> If we write just "z", "Z" or any string with that include the chars "z" or >> "Z" at any point of the string in the field, then I got the below error. How is this posible? I not an UTF-8, ISO-8859-1 or SQL_ASCII expert, but for me "z" or "Z" is part of the ASCII that means a 1 byte code in UTF-8. >> >> That means the driver has problems with an normal "z" or "Z"? >> >> Note: The same apply for the drivers: >> >> pg73jdbc.jar >> pg74jdbc.jar >> pg74.1jdbc.jar >> >> Please explain. >> >> Best Regards, >> >> Antonio Gallardo >> >> Caused by: java.sql.SQLException: ERROR: Invalid UNICODE character sequence found (0xc000) >> >> at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:131) at >> org.postgresql.jdbc1.AbstractJdbc1Connection.ExecSQL(AbstractJdbc1Connection.java:505) at >> org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:320) at >> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:48) at >> org.postgresql.jdbc1.AbstractJdbc1Statement.executeQuery(AbstractJdbc1Statement.java:153) >> >> ---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? >> >> http://www.postgresql.org/docs/faqs/FAQ.html >
OK. Confirmed: Steps to reproduce: 1. createdb -E UNICODE mydbname. 2. create a table with some varchar inside, we will query on this field. Example: CREATE TABLE auth_role ( rol_id int4 not null default nextval('auth_rol_rol_id_seq'), rol_name varchar(50) unique not null, rol_enable boolean default true, primary key(rol_id) ); INSERT INTO auth_role(rol_name,rol_enable) VALUES ('admin',true); INSERT INTO auth_role(rol_name,rol_enable) VALUES ('zorro',true); 3. run psql and write a select like this: SELECT * FROM AUTH_ROLE WHERE ROL_NAME LIKE 'z%'; After all you was right, this is not a JDBC related bug! :-D Since we can also reproduce it using psql. I hoped it was my fault, but looks like a postgresql bug. :-( Please confirm the bug. Best Regards, Antonio Gallardo > in psql the "SELECT version();" returns: > > PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC > i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1) >
As I previously suggested, post it to the general list. And add (again) the info about: - software you use; - OS, terminal encoding: might be relevant. Cheers, Csaba. On Fri, 2004-01-09 at 15:18, Antonio Gallardo wrote: > OK. Confirmed: > > Steps to reproduce: > > 1. createdb -E UNICODE mydbname. > 2. create a table with some varchar inside, we will query on this field. > Example: > > CREATE TABLE auth_role > ( > rol_id int4 not null default > nextval('auth_rol_rol_id_seq'), > rol_name varchar(50) unique not null, > rol_enable boolean default true, > > primary key(rol_id) > ); > > INSERT INTO auth_role(rol_name,rol_enable) VALUES ('admin',true); > INSERT INTO auth_role(rol_name,rol_enable) VALUES ('zorro',true); > > 3. run psql and write a select like this: > > SELECT * FROM AUTH_ROLE WHERE ROL_NAME LIKE 'z%'; > > After all you was right, this is not a JDBC related bug! :-D > > Since we can also reproduce it using psql. I hoped it was my fault, but > looks like a postgresql bug. :-( > > Please confirm the bug. > > Best Regards, > > Antonio Gallardo > > > in psql the "SELECT version();" returns: > > > > PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC > > i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1) > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly
Works fine on my sistem too, but I'm on a linux with UTF-8 encoding. Anyway, if it gives you the same error from psql, then it's not a JDBC bug, but rather a backend problem. I suggest you post the problem on the pgsql-general list, giving all the details of your installation, including the OS, the encoding of the terminal used, encoding of the DB, version of the software... I suspect this is only reproducable under your special circumstances, which I don't have here. On the general list there are more chances somebody can reproduce it. Cheers, Csaba. On Fri, 2004-01-09 at 15:05, Antonio Gallardo wrote: > This is the ofending SELECT: > > SELECT A0.ROL_NAME,A0.ROL_ENABLE,A0.ROL_ID FROM AUTH_ROLE A0 WHERE > (A0.ROL_NAME LIKE 'z%') AND A0.ROL_NAME <> 'admin' ORDER BY 1 > > I tried it using squirrelSQL - http://squirrel-sql.sourceforge.net/ and > even in psql it returns: > > ERROR: Invalid UNICODE character sequence found (0xc000) > > in psql the "SELECT version();" returns: > > PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC > i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1) > > Lets focus on: > > (A0.ROL_NAME LIKE 'z%') > > If I query: > > (A0.ROL_NAME LIKE 'za%') --- It is OK, but > > (A0.ROL_NAME LIKE 'az%') --- throw the same exception > > Looks like PostgreSQL incorrect manage this sequence: "z%" > > Can you test it in a table while doing a similar cosntruction? > > Can you check this? > > Best Regards, > > Antonio Gallardo. >
Tom Lane dijo: > "Antonio Gallardo" <antonio@apache.org> writes: >> After all you was right, this is not a JDBC related bug! :-D > > Your test case works fine for me. Perhaps you could clue us in on what > client-side encoding you're using? I am using Linux Fedora Core 1. uname -r returns: 2.4.22-1.2140.nptl In psql \encoding reports me: UNICODE and SELECT version() returns: PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1) If you need more info, I will be glad to contribute. Best Regards, Antonio Gallardo.
"Antonio Gallardo" <antonio@apache.org> writes: > After all you was right, this is not a JDBC related bug! :-D Your test case works fine for me. Perhaps you could clue us in on what client-side encoding you're using? regards, tom lane
On 09/01/2004 14:18 Antonio Gallardo wrote: > [snip] > After all you was right, this is not a JDBC related bug! :-D > > Since we can also reproduce it using psql. I hoped it was my fault, but > looks like a postgresql bug. :-( > > Please confirm the bug. I can reproduce this on my Fedora machine. On an older system with RH7.2 and PG 7.3.4 compiled from source, I don't get the problem. Re-compiling from source on the Fedora machine still gives the error. HTH -- Paul Thomas +------------------------------+---------------------------------------------+ | Thomas Micro Systems Limited | Software Solutions for the Smaller Business | | Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk | +------------------------------+---------------------------------------------+
Hi all! A year ago I tried using postgresql for storing binary data, images, in a bytea column. I then got an error when trying to retrieve the binary data with JDBC. The length of the returned byte array was too long, and it was transformed, possibly a charset encoding. After some digging last year it seemed that the problem only occured for databases with ISO-8859-1 encoding. The original thread is at http://archives.postgresql.org/pgsql-jdbc/2003-03/msg00008.php Do you know if this error has been fixed? I seem to be getting the exact same error on psql 7.4. I could find a reference at http://developer.postgresql.org/todo.php stating "Prevent mismatch of frontend/backend encodings from converting bytea data from being interpreted as encoded strings". Can anyone confirm that the problem is known and is scheduled to be fixed? donV
Paul Thomas dijo: > On 09/01/2004 14:18 Antonio Gallardo wrote: >> [snip] >> After all you was right, this is not a JDBC related bug! :-D >> >> Since we can also reproduce it using psql. I hoped it was my fault, but >> looks like a postgresql bug. :-( >> >> Please confirm the bug. > > > I can reproduce this on my Fedora machine. On an older system with RH7.2 > and PG 7.3.4 compiled from source, I don't get the problem. Re-compiling > from source on the Fedora machine still gives the error. HTH Hi Paul: Thanks for the answer. Then it seems to be a problem in Fedora. If this is true, I guess the problem is inside the gcc libraries or a bad compile. I already filled a bug in Fedora: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=113231 Best Regards, Antonio Gallardo.
"Antonio Gallardo" <antonio@apache.org> writes: > Paul Thomas dijo: >> I can reproduce this on my Fedora machine. On an older system with RH7.2 >> and PG 7.3.4 compiled from source, I don't get the problem. Re-compiling >> from source on the Fedora machine still gives the error. HTH > I already filled a bug in Fedora: > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=113231 I was finally able to reproduce this with Paul's help. The bug turns out not to be Fedora-specific at all. I believe it will happen on any platform if you are using both a multibyte database encoding (such as Unicode) *and* a non-C locale. PG 7.4 has a more restricted form of the bug --- it's not locale specific but does still require a multibyte encoding. Here is a Unicode example: test=# select * from test where txtfield like '\277%'; -- ERROR: invalid byte sequence for encoding "UNICODE": 0xc0 I've committed fixes in both CVS branches. regards, tom lane
On 02/02/2004 03:26 Tom Lane wrote: > The bug turns out not to be Fedora-specific at all. I believe it will > happen on any platform if you are using both a multibyte database > encoding (such as Unicode) *and* a non-C locale. > > PG 7.4 has a more restricted form of the bug --- it's not locale > specific but does still require a multibyte encoding. Here is a Unicode > example: > > test=# select * from test where txtfield like '\277%'; -- > ERROR: invalid byte sequence for encoding "UNICODE": 0xc0 > > I've committed fixes in both CVS branches. Thanks Tom. I've recompiled on my Fedora box using 7.3 CVS tip and it's now a happy bunny :) -- Paul Thomas +------------------------------+---------------------------------------------+ | Thomas Micro Systems Limited | Software Solutions for the Smaller Business | | Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk | +------------------------------+---------------------------------------------+