Thread: [BUG] - Invalid UNICODE character sequence found (0xc000)

[BUG] - Invalid UNICODE character sequence found (0xc000)

From
"Antonio Gallardo"
Date:
Hi:

First, here is the postgreSQL version used:
PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC
i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)

I am aware similar problem was already sent to this list (I read some of
them), but I want to contribute with more interesting stuff:

In the tested web application, we use postgreSQL JDBC driver. We have a 1
field form where we allow the user to writte a search pattern a table. The
generated SQL use LIKE to find for similars. Example:

If the user write: "ant" then the answer will be:

antonio
antoine
etc.

This works fine, even if we left empty the form field, to show all the
records.

The interesting stuff I found is:

If we write just "z", "Z" or any string with that include the chars "z" or
"Z" at any point of the string in the field, then I got the below error.
How is this posible? I not an UTF-8, ISO-8859-1 or SQL_ASCII expert, but
for me "z" or "Z" is part of the ASCII that means a 1 byte code in UTF-8.

That means the driver has problems with an normal "z" or "Z"?

Note: The same apply for the drivers:

pg73jdbc.jar
pg74jdbc.jar
pg74.1jdbc.jar

Please explain.

Best Regards,

Antonio Gallardo

Caused by: java.sql.SQLException: ERROR:  Invalid UNICODE character
sequence found (0xc000)

at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:131)
at
org.postgresql.jdbc1.AbstractJdbc1Connection.ExecSQL(AbstractJdbc1Connection.java:505)
at
org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:320)
at
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:48)
at
org.postgresql.jdbc1.AbstractJdbc1Statement.executeQuery(AbstractJdbc1Statement.java:153)

Re: [BUG] - Invalid UNICODE character sequence found (0xc000)

From
Kris Jurka
Date:

On Thu, 8 Jan 2004, Antonio Gallardo wrote:

> Hi:
>
> In the tested web application, we use postgreSQL JDBC driver. We have a 1
> field form where we allow the user to writte a search pattern a table. The
> generated SQL use LIKE to find for similars. Example:
>
> This works fine, even if we left empty the form field, to show all the
> records.
>
> The interesting stuff I found is:
>
> If we write just "z", "Z" or any string with that include the chars "z" or
> "Z" at any point of the string in the field, then I got the below error.
> How is this posible? I not an UTF-8, ISO-8859-1 or SQL_ASCII expert, but
> for me "z" or "Z" is part of the ASCII that means a 1 byte code in UTF-8.
>
> That means the driver has problems with an normal "z" or "Z"?
>
> Note: The same apply for the drivers:
>

Could you possibly write a standalone Java program that demonstrates this
error?  What encoding is your database?  What encoding is you web
application running in?  What encoding is used in the browser?

Kris Jurka


Re: [BUG] - Invalid UNICODE character sequence found

From
"Antonio Gallardo"
Date:
Kris Jurka dijo:
>
>
> On Thu, 8 Jan 2004, Antonio Gallardo wrote:
>
>> Hi:
>>
>> In the tested web application, we use postgreSQL JDBC driver. We have a
>> 1
>> field form where we allow the user to writte a search pattern a table.
>> The
>> generated SQL use LIKE to find for similars. Example:
>>
>> This works fine, even if we left empty the form field, to show all the
>> records.
>>
>> The interesting stuff I found is:
>>
>> If we write just "z", "Z" or any string with that include the chars "z"
>> or
>> "Z" at any point of the string in the field, then I got the below error.
>> How is this posible? I not an UTF-8, ISO-8859-1 or SQL_ASCII expert, but
>> for me "z" or "Z" is part of the ASCII that means a 1 byte code in
>> UTF-8.
>>
>> That means the driver has problems with an normal "z" or "Z"?
>>
>> Note: The same apply for the drivers:
>>
>
> Could you possibly write a standalone Java program that demonstrates this
> error?

What encoding is your database?

builded using:

createdb -E UNICODE mydbname
AFAIK this means it is UTF-8

> What encoding is you web application running in?

All using UTF-8 it is using Cocoon - http://cocoon.apache.org/

> What encoding is used in the browser?

Browser: Mozilla Encoding: ISO-8859-1

I know you can point to the browser as the source of the problem, but I
can insert any char to the database. I can also retrieve all the rows that
include any char from the database if I don't query using the "z" char. It
does not matter if the retrieved values include "z".

The problem is clearly related to the SELECT statement using "z".

Please note I am able to INSERT and UPDATE using the web app interface. I
can also search using other chars as: á é ý ú € etc. Any other char is OK.
Why the "z" (that is part of the ASCII => equal code in UTF-8) is
offending?

This is why I decided to write to the postgreSQL list.

Best Regards,

Antonio Gallardo



Re: [BUG] - Invalid UNICODE character sequence found

From
Kris Jurka
Date:

On Fri, 9 Jan 2004, Antonio Gallardo wrote:

> The problem is clearly related to the SELECT statement using "z".
>

You haven't really described what you are doing, but I refuse to believe
that the letter z is special in any relevent way.  The attached sample
program shows that the letter z works fine.   If this fails for you I am
interested otherwise you'll have to provide a similar program that fails
for me.

Kris Jurka

Attachment

Re: [BUG] - Invalid UNICODE character sequence found

From
Csaba Nagy
Date:
Antonio,

As Kris Jurka said in his posts, there's nothing special with the "z"
characters, so your browser/os must be doing something wrong with the
input. But the error you reported is a clear indication that the backend
gets a byte sequence which is not properly encoded as UTF-8. If you pass
only strings to the driver, then this is a driver error (means the
driver encodes the string improperly).
To facilitate reproduction, I would suggest you to print out the unicode
characters of your query string with something like:

... embed this in your program:

    for (int i = 0; i < queryString.length(); i++) {
        System.out.print("\\u");
        System.out.print(toHexString(queryString.charAt(i)));
    }
    System.out.println();
...
                        
private static final char[] hexChars =
    { '0', '1', '2', '3', '4', '5', '6',
      '7', '8', '9', 'A', 'B', 'C', 'D',
      'E', 'F' };

                     
public static String toHexString(int n)
{
    char[] buffer = new char[4];
    for (int i=0; i<4; i++) {
        buffer[3-i] = hexChars[n & 0x0F];
        n >>= 4;
    }
    return new String(buffer);
}

                     
Then you can use the resulting string in the example program. This will
make sure that the other person on the other end of the email will have
exactly the same string as you - otherwise the you can bet that subtle
encoding differences get lost as you type.

Cheers,
Csaba.


On Fri, 2004-01-09 at 06:18, Antonio Gallardo wrote:
> Hi:
>
> First, here is the postgreSQL version used:
> PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC
> i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)
>
> I am aware similar problem was already sent to this list (I read some of
> them), but I want to contribute with more interesting stuff:
>
> In the tested web application, we use postgreSQL JDBC driver. We have a 1
> field form where we allow the user to writte a search pattern a table. The
> generated SQL use LIKE to find for similars. Example:
>
> If the user write: "ant" then the answer will be:
>
> antonio
> antoine
> etc.
>
> This works fine, even if we left empty the form field, to show all the
> records.
>
> The interesting stuff I found is:
>
> If we write just "z", "Z" or any string with that include the chars "z" or
> "Z" at any point of the string in the field, then I got the below error.
> How is this posible? I not an UTF-8, ISO-8859-1 or SQL_ASCII expert, but
> for me "z" or "Z" is part of the ASCII that means a 1 byte code in UTF-8.
>
> That means the driver has problems with an normal "z" or "Z"?
>
> Note: The same apply for the drivers:
>
> pg73jdbc.jar
> pg74jdbc.jar
> pg74.1jdbc.jar
>
> Please explain.
>
> Best Regards,
>
> Antonio Gallardo
>
> Caused by: java.sql.SQLException: ERROR:  Invalid UNICODE character
> sequence found (0xc000)
>
> at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:131)
> at
> org.postgresql.jdbc1.AbstractJdbc1Connection.ExecSQL(AbstractJdbc1Connection.java:505)
> at
> org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:320)
> at
> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:48)
> at
> org.postgresql.jdbc1.AbstractJdbc1Statement.executeQuery(AbstractJdbc1Statement.java:153)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html