Thread: invalid byte sequence for encoding "UTF8": 0xab

invalid byte sequence for encoding "UTF8": 0xab

From

"Grand, Mark D."

Date:

05 June 2009, 08:59:34

I am having a vexing problem with a script I am writing to populate reference tables in a new database.

I am running PostgreSQL 8.3 with psql 8.3.7.

Psql reads this SQL statement:

    INSERT INTO META_AUTH.DOMAIN_META_ASSERTION (TITLE, DESCRIPTION, META_ASSERTION)
        VALUES ('Super-User Authorization',
                'This allows a super-user to administer all meta-data.',
                'UserID «Administer» ()');

and I get this message:

ERROR: invalid byte sequence for encoding "UTF8": 0xab

HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

It is complaining about the ‘«’ character. I do not understand why. The database is created with the commands

CREATE DATABASE mayyou
    WITH OWNER=meta_auth ENCODING='UTF8';

ALTER DATABASE mayyou SET client_encoding = 'UTF8';

When I give psql the \encoding command, it replies

UTF8

Why is it complaining about this valid character code?

This e-mail message (including any attachments) is for the sole use of
the intended recipient(s) and may contain confidential and privileged
information. If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination, distribution
or copying of this message (including any attachments) is strictly
prohibited.

If you have received this message in error, please contact
the sender by reply e-mail message and destroy all copies of the
original message (including attachments).

Re: invalid byte sequence for encoding "UTF8": 0xab

From

Tom Lane

Date:

05 June 2009, 10:57:56

"Grand, Mark D." <mgrand@emory.edu> writes:
> ... I get this message:
> ERROR:  invalid byte sequence for encoding "UTF8": 0xab
> HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

> It is complaining about the '<' character.  I do not understand why.

The ASCII code for '<' is 0x3c, not 0xab.  I am not sure what you are
actually typing; although it's suggestive that the LATIN1 code 0xab
corresponds to a symbol that looks approximately like '<<'.  The most
likely bet is that you are typing the wrong thing and using a terminal
emulator that is not set to generate UTF8-encoded characters.  You
should try to make sure that client_encoding is set to match what your
keyboard actually generates.

            regards, tom lane
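
The point about LATIN1 code 0xab can be checked directly. The following is a minimal Python sketch, not part of the original thread, and the variable names are illustrative only. It shows that the character U+00AB is the single byte 0xab in LATIN1 but the two bytes 0xc2 0xab in UTF-8, and that a lone 0xab is rejected by a strict UTF-8 decoder, which appears to be the same condition the server reports.

```python
# The character from the failing INSERT: U+00AB,
# LEFT-POINTING DOUBLE ANGLE QUOTATION MARK.
ch = "\u00ab"

latin1_bytes = ch.encode("latin-1")  # one byte: 0xab
utf8_bytes = ch.encode("utf-8")      # two bytes: 0xc2 0xab

print("LATIN1:", latin1_bytes.hex())
print("UTF-8: ", utf8_bytes.hex())

# In UTF-8, 0xab can only appear as a continuation byte. On its own it
# has no lead byte, so strict decoding fails.
try:
    latin1_bytes.decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False

print("lone 0xab valid as UTF-8:", lone_byte_is_valid_utf8)
```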

Re: invalid byte sequence for encoding "UTF8": 0xab

From

Vick Khera

Date:

05 June 2009, 12:10:30

On Fri, Jun 5, 2009 at 9:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The ASCII code for '<' is 0x3c, not 0xab.  I am not sure what you are
> actually typing; although it's suggestive that the LATIN1 code 0xab
> corresponds to a symbol that looks approximately like '<<'.  The most
> likely bet is that you are typing the wrong thing and using a terminal

Must be something with your mail program, because in the version I am
reading postgres is complaining about the "approximately like '<<'"
symbol.

Re: invalid byte sequence for encoding "UTF8": 0xab

From

"Albe Laurenz"

Date:

08 June 2009, 06:59:15

Mark D. Grand wrote:
> I am having a vexing problem with a script I am writing to
> populate reference tables in a new database.
>
> I am running postgreSQL 8.3 with psql 8.3.7.
>
> Psql reads this SQL statement:
>
>     INSERT INTO META_AUTH.DOMAIN_META_ASSERTION (TITLE, DESCRIPTION, META_ASSERTION)
>         VALUES ('Super-User Authorization',
>                 'This allows a super-user to administer all meta-data.',
>                 'UserID «Administer» ()');
>
> and I get this message:
>
> ERROR:  invalid byte sequence for encoding "UTF8": 0xab
>
> HINT:  This error can also happen if the byte sequence does
> not match the encoding expected by the server, which is
> controlled by "client_encoding".
>
> It is complaining about the '«' character.  I do not
> understand why.  The database is created the commands
>
> CREATE DATABASE mayyou
>                 WITH OWNER=meta_auth ENCODING='UTF8';
>
> ALTER DATABASE mayyou SET client_encoding = 'UTF8';
>
> When I give psql the \encoding command, it replies
>                 UTF8
>
> Why is it complaining about this valid character code?

The database stores characters in UTF-8, and the client
expects UTF-8 characters, but presumably the characters you
feed into psql are not UTF-8.

If this is some kind of UNIX, it might be instructive to
type 'echo "«" | od -t x1' on the command line.

Also knowing the current locale might help to determine the problem.

Yours,
Laurenz Albe
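
The same kind of byte-level check that `od -t x1` gives can also be run over a whole script. Below is a rough Python sketch, assuming the SQL script has been read as raw bytes; the helper name `find_invalid_utf8` and the sample data are hypothetical and do not come from the thread. It lists the offset and value of every byte that a strict UTF-8 decoder refuses.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, byte value) pairs for bytes that are not valid UTF-8."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start is relative to the slice that was decoded
            bad = pos + exc.start
            problems.append((bad, data[bad]))
            pos = bad + 1  # resume just after the offending byte
    return problems


# The string literal from the failing INSERT, written with escapes so the
# example does not depend on how this file itself is encoded.
literal = "'UserID \u00abAdminister\u00bb ()'"

as_latin1 = literal.encode("latin-1")  # what a LATIN1 editor would save
as_utf8 = literal.encode("utf-8")      # what the server expects

for offset, value in find_invalid_utf8(as_latin1):
    print(f"offset {offset}: 0x{value:02x}")

print("problems in UTF-8 copy:", find_invalid_utf8(as_utf8))
```

For a real script, the bytes could be obtained with `open(path, "rb").read()`. The loop re-decodes the remainder after each error, which is fine for small scripts but slow for large files with many bad bytes.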

Re: invalid byte sequence for encoding "UTF8": 0xab

From

"Grand, Mark D."

Date:

08 June 2009, 08:28:37

It turns out that my problem was that the editor I was using (emacs) does not properly support utf8 encoding.


Re: invalid byte sequence for encoding "UTF8": 0xab

From

Dimitri Fontaine

Date:

08 June 2009, 08:56:00

"Grand, Mark D." <mgrand@emory.edu> writes:

> It turns out that my problem was that the editor I was using (emacs)
> does not properly support utf8 encoding.

Emacs does support utf8 properly.
  http://www.emacswiki.org/emacs/ChangingEncodings

It could be I'm biased because I use emacs from CVS, which is going to
be emacs23, and is as stable as emacs has always been for me.
  http://emacs.orebokech.com/
  http://atomized.org/wp-content/cocoa-emacs-nightly/

From within emacs, to get a ton of information about char under point,
try C-x = (one line version) or M-x describe-char (full version): <
 Char: < (60, #o74, #x3c) point=1312 of 4162 (31%) <301-4163> column=66

        character: < (60, #o74, #x3c)
preferred charset: ascii (ASCII (ISO646 IRV))
       code point: 0x3C
           syntax: .     which means: punctuation
         category: .:Base, a:ASCII, l:Latin, r:Roman
      buffer code: #x3C
        file code: #x3C (encoded by coding system utf-8-emacs)
          display: by this font (glyph code)
    xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-16-*-*-*-m-0-iso10646-1 (#x1F)


But I guess we're off topic now.

HTH, regards,
--
dim