Thread: Re: [BUG] - Invalid UNICODE character sequence found(0xc000)

Re: [BUG] - Invalid UNICODE character sequence found(0xc000)

From

"Antonio Gallardo"

Date:

09 January 2004, 10:08:45

This is the ofending SELECT:

SELECT A0.ROL_NAME,A0.ROL_ENABLE,A0.ROL_ID FROM AUTH_ROLE A0 WHERE
(A0.ROL_NAME LIKE  'z%') AND A0.ROL_NAME <>  'admin'  ORDER BY 1

I tried it using squirrelSQL - http://squirrel-sql.sourceforge.net/ and
even in psql it returns:

ERROR:  Invalid UNICODE character sequence found (0xc000)

in psql the "SELECT version();" returns:

PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC
i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)

Lets focus on:

(A0.ROL_NAME LIKE  'z%')

If I query:

(A0.ROL_NAME LIKE  'za%')   --- It is OK, but

(A0.ROL_NAME LIKE  'az%')   --- throw the same exception

Looks like PostgreSQL incorrect manage this sequence: "z%"

Can you test it in a table while doing a similar cosntruction?

Can you check this?

Best Regards,

Antonio Gallardo.


Csaba Nagy dijo:
> Antonio,
>
> As Kris Jurka said in his posts, there's nothing special with the "z"
characters, so your browser/os must be doing something wrong with the
input. But the error you reported is a clear indication that the backend
gets a byte sequence which is not properly encoded as UTF-8. If you pass
only strings to the driver, then this is a driver error (means the
driver encodes the string improperly).
> To facilitate reproduction, I would suggest you to print out the unicode
characters of your query string with something like:
>
> ... embed this in your program:
>
>     for (int i = 0; i < queryString.length(); i++) {
>         System.out.print("\\u");
>         System.out.print(toHexString(queryString.charAt(i)));
>     }
>     System.out.println();
> ...
> private static final char[] hexChars =
>     { '0', '1', '2', '3', '4', '5', '6',
>       '7', '8', '9', 'A', 'B', 'C', 'D',
>       'E', 'F' };
>
> public static String toHexString(int n)
> {
>     char[] buffer = new char[4];
>     for (int i=0; i<4; i++) {
>         buffer[3-i] = hexChars[n & 0x0F];
>         n >>= 4;
>     }
>     return new String(buffer);
> }
>
> Then you can use the resulting string in the example program. This will
make sure that the other person on the other end of the email will have
exactly the same string as you - otherwise the you can bet that subtle
encoding differences get lost as you type.
>
> Cheers,
> Csaba.
>
>
> On Fri, 2004-01-09 at 06:18, Antonio Gallardo wrote:
>> Hi:
>>
>> First, here is the postgreSQL version used:
>> PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC
>> i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)
>>
>> I am aware similar problem was already sent to this list (I read some
of them), but I want to contribute with more interesting stuff:
>>
>> In the tested web application, we use postgreSQL JDBC driver. We have a 1
>> field form where we allow the user to writte a search pattern a table. The
>> generated SQL use LIKE to find for similars. Example:
>>
>> If the user write: "ant" then the answer will be:
>>
>> antonio
>> antoine
>> etc.
>>
>> This works fine, even if we left empty the form field, to show all the
records.
>>
>> The interesting stuff I found is:
>>
>> If we write just "z", "Z" or any string with that include the chars "z" or
>> "Z" at any point of the string in the field, then I got the below
error. How is this posible? I not an UTF-8, ISO-8859-1 or SQL_ASCII
expert, but for me "z" or "Z" is part of the ASCII that means a 1 byte
code in UTF-8.
>>
>> That means the driver has problems with an normal "z" or "Z"?
>>
>> Note: The same apply for the drivers:
>>
>> pg73jdbc.jar
>> pg74jdbc.jar
>> pg74.1jdbc.jar
>>
>> Please explain.
>>
>> Best Regards,
>>
>> Antonio Gallardo
>>
>> Caused by: java.sql.SQLException: ERROR:  Invalid UNICODE character
sequence found (0xc000)
>>
>> at org.postgresql.core.QueryExecutor.execute(QueryExecutor.java:131) at
>> org.postgresql.jdbc1.AbstractJdbc1Connection.ExecSQL(AbstractJdbc1Connection.java:505)
at
>> org.postgresql.jdbc1.AbstractJdbc1Statement.execute(AbstractJdbc1Statement.java:320)
at
>> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:48)
at
>> org.postgresql.jdbc1.AbstractJdbc1Statement.executeQuery(AbstractJdbc1Statement.java:153)
>>
>> ---------------------------(end of
broadcast)--------------------------- TIP 5: Have you checked our
extensive FAQ?
>>
>>                http://www.postgresql.org/docs/faqs/FAQ.html
>

Re: [BUG] - Invalid UNICODE character sequence found(0xc000)

From

"Antonio Gallardo"

Date:

09 January 2004, 10:18:44

OK. Confirmed:

Steps to reproduce:

1. createdb -E UNICODE mydbname.
2. create a table with some varchar inside, we will query on this field.
Example:

CREATE TABLE auth_role
  (
    rol_id      int4          not null default
nextval('auth_rol_rol_id_seq'),
    rol_name    varchar(50)   unique not null,
    rol_enable  boolean       default true,

    primary key(rol_id)
  );

INSERT INTO auth_role(rol_name,rol_enable) VALUES ('admin',true);
INSERT INTO auth_role(rol_name,rol_enable) VALUES ('zorro',true);

3. run psql and write a select like this:

SELECT * FROM AUTH_ROLE WHERE ROL_NAME LIKE 'z%';

After all you was right, this is not a JDBC related bug! :-D

Since we can also reproduce it using psql. I hoped it was my fault, but
looks like a postgresql bug. :-(

Please confirm the bug.

Best Regards,

Antonio Gallardo

> in psql the "SELECT version();" returns:
>
> PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC
> i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)
>

Re: [BUG] - Invalid UNICODE character sequence found(0xc000)

From

Csaba Nagy

Date:

09 January 2004, 12:08:55

As I previously suggested, post it to the general list.
And add (again) the info about:
 - software you use;
 - OS, terminal encoding: might be relevant.

Cheers,
Csaba.

On Fri, 2004-01-09 at 15:18, Antonio Gallardo wrote:
> OK. Confirmed:
>
> Steps to reproduce:
>
> 1. createdb -E UNICODE mydbname.
> 2. create a table with some varchar inside, we will query on this field.
> Example:
>
> CREATE TABLE auth_role
>   (
>     rol_id      int4          not null default
> nextval('auth_rol_rol_id_seq'),
>     rol_name    varchar(50)   unique not null,
>     rol_enable  boolean       default true,
>
>     primary key(rol_id)
>   );
>
> INSERT INTO auth_role(rol_name,rol_enable) VALUES ('admin',true);
> INSERT INTO auth_role(rol_name,rol_enable) VALUES ('zorro',true);
>
> 3. run psql and write a select like this:
>
> SELECT * FROM AUTH_ROLE WHERE ROL_NAME LIKE 'z%';
>
> After all you was right, this is not a JDBC related bug! :-D
>
> Since we can also reproduce it using psql. I hoped it was my fault, but
> looks like a postgresql bug. :-(
>
> Please confirm the bug.
>
> Best Regards,
>
> Antonio Gallardo
>
> > in psql the "SELECT version();" returns:
> >
> > PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC
> > i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly

Re: [BUG] - Invalid UNICODE character sequence found(0xc000)

From

Csaba Nagy

Date:

09 January 2004, 12:10:39

Works fine on my sistem too, but I'm on a linux with UTF-8 encoding.
Anyway, if it gives you the same error from psql, then it's not a JDBC
bug, but rather a backend problem. I suggest you post the problem on the
pgsql-general list, giving all the details of your installation,
including the OS, the encoding of the terminal used, encoding of the DB,
version of the software... I suspect this is only reproducable under
your special circumstances, which I don't have here. On the general list
there are more chances somebody can reproduce it.

Cheers,
Csaba.

On Fri, 2004-01-09 at 15:05, Antonio Gallardo wrote:
> This is the ofending SELECT:
>
> SELECT A0.ROL_NAME,A0.ROL_ENABLE,A0.ROL_ID FROM AUTH_ROLE A0 WHERE
> (A0.ROL_NAME LIKE  'z%') AND A0.ROL_NAME <>  'admin'  ORDER BY 1
>
> I tried it using squirrelSQL - http://squirrel-sql.sourceforge.net/ and
> even in psql it returns:
>
> ERROR:  Invalid UNICODE character sequence found (0xc000)
>
> in psql the "SELECT version();" returns:
>
> PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC
> i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)
>
> Lets focus on:
>
> (A0.ROL_NAME LIKE  'z%')
>
> If I query:
>
> (A0.ROL_NAME LIKE  'za%')   --- It is OK, but
>
> (A0.ROL_NAME LIKE  'az%')   --- throw the same exception
>
> Looks like PostgreSQL incorrect manage this sequence: "z%"
>
> Can you test it in a table while doing a similar cosntruction?
>
> Can you check this?
>
> Best Regards,
>
> Antonio Gallardo.
>

Re: [BUG] - Invalid UNICODE character sequence found(0xc000)

From

"Antonio Gallardo"

Date:

09 January 2004, 12:58:59

Tom Lane dijo:
> "Antonio Gallardo" <antonio@apache.org> writes:
>> After all you was right, this is not a JDBC related bug! :-D
>
> Your test case works fine for me.  Perhaps you could clue us in on what
> client-side encoding you're using?

I am using Linux Fedora Core 1.

uname -r returns:
2.4.22-1.2140.nptl

In psql \encoding reports me:
UNICODE

and SELECT version() returns:

PostgreSQL 7.3.4-RH on i386-redhat-linux-gnu, compiled by GCC
i386-redhat-linux-gcc (GCC) 3.3.2 20031022 (Red Hat Linux 3.3.2-1)

If you need more info, I will be glad to contribute.

Best Regards,

Antonio Gallardo.

Re: [BUG] - Invalid UNICODE character sequence found(0xc000)

From

Tom Lane

Date:

09 January 2004, 12:59:42

"Antonio Gallardo" <antonio@apache.org> writes:
> After all you was right, this is not a JDBC related bug! :-D

Your test case works fine for me.  Perhaps you could clue us in on what
client-side encoding you're using?

            regards, tom lane

Re: [BUG] - Invalid UNICODE character sequence found(0xc000)

From

Paul Thomas

Date:

09 January 2004, 14:50:05

On 09/01/2004 14:18 Antonio Gallardo wrote:
> [snip]
> After all you was right, this is not a JDBC related bug! :-D
>
> Since we can also reproduce it using psql. I hoped it was my fault, but
> looks like a postgresql bug. :-(
>
> Please confirm the bug.


I can reproduce this on my Fedora machine. On an older system with RH7.2
and PG 7.3.4 compiled from source, I don't get the problem. Re-compiling
from source on the Fedora machine still gives the error. HTH

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller
Business |
| Computer Consultants         |
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

bytea and ISO-8859-1 encoding

From

"Uwe Kubosch"

Date:

09 January 2004, 21:29:35

Hi all!

A year ago I tried using postgresql for storing binary data, images, in a
bytea column.  I then got an error when trying to retrieve the binary data
with JDBC.  The length of the returned byte array was too long, and it was
transformed, possibly a charset encoding.

After some digging last year it seemed that the problem only occured for
databases with ISO-8859-1 encoding.

The original thread is at

http://archives.postgresql.org/pgsql-jdbc/2003-03/msg00008.php

Do you know if this error has been fixed?  I seem to be getting the exact
same error on psql 7.4.  I could find a reference at

http://developer.postgresql.org/todo.php

stating

"Prevent mismatch of frontend/backend encodings from converting bytea data
from being interpreted as encoded strings".

Can anyone confirm that the problem is known and is scheduled to be fixed?


donV

Re: [BUG] - Invalid UNICODE character sequence found(0xc000)

From

"Antonio Gallardo"

Date:

09 January 2004, 22:05:11

Paul Thomas dijo:
> On 09/01/2004 14:18 Antonio Gallardo wrote:
>> [snip]
>> After all you was right, this is not a JDBC related bug! :-D
>>
>> Since we can also reproduce it using psql. I hoped it was my fault, but
>> looks like a postgresql bug. :-(
>>
>> Please confirm the bug.
>
>
> I can reproduce this on my Fedora machine. On an older system with RH7.2
> and PG 7.3.4 compiled from source, I don't get the problem. Re-compiling
> from source on the Fedora machine still gives the error. HTH

Hi Paul:

Thanks for the answer. Then it seems to be a problem in Fedora. If this is
true, I guess the problem is inside the gcc libraries or a bad compile.

I already filled a bug in Fedora:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=113231

Best Regards,

Antonio Gallardo.

Re: [BUG] - Invalid UNICODE character sequence found(0xc000)

From

Tom Lane

Date:

01 February 2004, 23:26:18

"Antonio Gallardo" <antonio@apache.org> writes:
> Paul Thomas dijo:
>> I can reproduce this on my Fedora machine. On an older system with RH7.2
>> and PG 7.3.4 compiled from source, I don't get the problem. Re-compiling
>> from source on the Fedora machine still gives the error. HTH

> I already filled a bug in Fedora:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=113231

I was finally able to reproduce this with Paul's help.

The bug turns out not to be Fedora-specific at all.  I believe it will
happen on any platform if you are using both a multibyte database
encoding (such as Unicode) *and* a non-C locale.

PG 7.4 has a more restricted form of the bug --- it's not locale
specific but does still require a multibyte encoding.  Here is a Unicode
example:

test=# select * from test where txtfield like '\277%'; --
ERROR:  invalid byte sequence for encoding "UNICODE": 0xc0

I've committed fixes in both CVS branches.

            regards, tom lane

Re: [BUG] - Invalid UNICODE character sequence found(0xc000)

From

Paul Thomas

Date:

02 February 2004, 06:53:42

On 02/02/2004 03:26 Tom Lane wrote:
> The bug turns out not to be Fedora-specific at all.  I believe it will
> happen on any platform if you are using both a multibyte database
> encoding (such as Unicode) *and* a non-C locale.
>
> PG 7.4 has a more restricted form of the bug --- it's not locale
> specific but does still require a multibyte encoding.  Here is a Unicode
> example:
>
> test=# select * from test where txtfield like '\277%'; --
> ERROR:  invalid byte sequence for encoding "UNICODE": 0xc0
>
> I've committed fixes in both CVS branches.


Thanks Tom. I've recompiled on my Fedora box using 7.3 CVS tip and it's
now a happy bunny :)


--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller
Business |
| Computer Consultants         |
http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+