Thread: 22021: invalid byte sequence for encoding "UNICODE": 0xe16d61


From

"Lucas Sultanum"

Date:

31 October 2004, 05:48:24

Hello,

I am not sure if this is a bug or I am doing something wrong. When I execute the following command (insert into a_cadclias values ('6542','65465','amaro','ámaro')) on pgAdmin III Query it works pretty well, but when I try to do the same through a C# App connecting to the database through an ODBC driver I get the following error:

"ERROR: 22021: invalid byte sequence for encoding \"UNICODE\": 0xe16d61"

I know that it has something to do with the word ámaro because when I take the letter á off and replace it with the letter a it works fine.

Below is the table structure:

CREATE TABLE a_cadclias
(
  dba_clias_cliente "numeric"(8) NOT NULL,
  dba_clias_associado "numeric"(8) NOT NULL,
  dba_keyclias_sq "varchar"(8) NOT NULL,
  teste "varchar"(10),
  CONSTRAINT dba_keyclias_sq PRIMARY KEY (dba_keyclias_sq)
)
WITH OIDS;

Note: I have also tried the Npgsql DLL and got the same error.

Versions tested:

"PostgreSQL 8.0.0beta2 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.2.3 (mingw special 20030504-1)"

AND

"PostgreSQL 8.0.0beta4 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.3.1 (mingw special 20030804-1)"

Regards

Lucas Sultanum

Re: 22021: invalid byte sequence for encoding "UNICODE":

From

Benjamin Riefenstahl

Date:

31 October 2004, 14:18:37

Hi Lucas,

"Lucas Sultanum" writes:

> When I execute the following command (insert into a_cadclias values
> ('6542','65465','amaro','ámaro')) on pgAdmin III Query it works
> pretty well, but when I try to do the same through a C# App
> connecting to the database through an ODBC driver I get the
> following error:
>
> "ERROR: 22021: invalid byte sequence for encoding \"UNICODE\": 0xe16d61"
>
> I know that it has something to do with the word ámaro because when
> I take the letter á off and replace it with the letter a it works
> fine.

A user-level application like pgAdmin does (or should do) automatic
handling of encodings.  OTOH lots of programming languages don't
handle encodings automatically; the programmer is responsible there.

I don't know C#, but in C, C++ and Java you cannot use non-ASCII
characters like 'á' literally without either some amount of luck or
doing a conversion before saving your file and before compiling the
code.
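
The bytes in the error message make the mismatch visible.  As a rough
illustration (Python is used here only for demonstration and is not
part of the original setup; the variable names are made up), the
sketch below shows that 0xe16d61 is valid Latin-1 but not valid UTF-8:

```python
# Bytes reported in the error message: 0xe16d61
reported = bytes.fromhex("e16d61")

# Read as Latin-1 (ISO 8859-1), they are the first three letters of
# the word from the INSERT: a-acute, 'm', 'a'.
as_latin1 = reported.decode("latin-1")
print(ascii(as_latin1))  # '\xe1ma'

# Read as UTF-8 they are invalid: 0xE1 announces a three-byte
# sequence, but 0x6D and 0x61 are not continuation bytes.
try:
    reported.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```

This suggests, without proving it, that somewhere between the source
file and the server the text was encoded as Latin-1 (or a similar
single-byte encoding) while the server expected UTF-8.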

Your choices are

  a) avoid non-ASCII in source code (the preferred choice)

  b) make sure that all the encodings match (in the long run this only
     works with lots of luck)

  c) hardcode byte values (this only removes one of several failure
     points in comparison to b))

  d) use c) and additionally make sure that the encoding used in
     those byte values matches the encodings used in the target
     system (the database in this case) and in all transition systems
     (the database drivers in this case)

Option c) works out to using '\xC3\xA1' instead of a literal 'á' (in
the UTF-8 encoding).  Option d) with PostgreSQL means that in addition
you issue the command

  SET client_encoding TO UNICODE;

once for each database connection to tell the server that the client
sends and expects UTF-8.
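
To check where the byte values in option c) come from, here is a small
sketch (again in Python purely for illustration; the names are
hypothetical).  It keeps the source file pure ASCII by writing the
character as an escape, in the spirit of option a):

```python
# U+00E1 (a-acute), written as an escape so the source stays ASCII.
a_acute = "\u00e1"

# The same character yields different bytes in different encodings.
utf8_bytes = a_acute.encode("utf-8")
latin1_bytes = a_acute.encode("latin-1")
print(utf8_bytes)    # b'\xc3\xa1'
print(latin1_bytes)  # b'\xe1'

# Option c): hardcoded byte values for the whole word, in UTF-8.
word_utf8 = b"\xc3\xa1maro"
decoded = word_utf8.decode("utf-8")
print(decoded == "\u00e1maro")  # True
```

The hardcoded bytes are only correct if the receiving side really
interprets them as UTF-8, which is what the SET command in option d)
is meant to ensure.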

You can of course use some other encoding such as Latin-1, if you
prefer that.  See the PostgreSQL documentation for your choices.

benny