Re: invalid byte sequence for encoding "UTF8" - Mailing list pgsql-general

From Albe Laurenz
Subject Re: invalid byte sequence for encoding "UTF8"
Msg-id D960CB61B694CF459DCFB4B0128514C2A0B0EC@exadv11.host.magwien.gv.at
In response to invalid byte sequence for encoding "UTF8"  (Glyn Astill <glynastill@yahoo.co.uk>)
List pgsql-general
Glyn Astill wrote:
> I've set up a postgres 8.2 server and have a database set up with UTF8
> encoding. I intend to read some of our legacy data into the table;
> this legacy data is in ASCII format, and as far as I know is 8-bit
> ASCII.
>
> We have a migration tool from mertechdata.com to convert these files,
> which are in a DataFlex format, into our postgres tables.

In which format are the data? Text files? SQL statements?
Something binary?

> Some files convert over okay, and some come up with the error message
> 'invalid byte sequence for encoding "UTF8"'. The tables for the files
> that fail are created correctly, and so are their indexes, but as
> soon as we come to insert the data we get this error.

Well, so you claim, but can you prove it?
Do you use a PostgreSQL utility to import the data?
If yes, which tool? What is the exact command line?

> Does anyone know why we're getting this error message? And is there
> a way to suppress it, or can we get around it using another format?

By "format" I believe that you mean "encoding".
It does not matter what encoding you use as long as the data can
be represented in it, you tell PostgreSQL what the encoding is, and
the data are correct.

There is no advantage of one encoding over the other in this respect.
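
To make this concrete, here is a minimal Python sketch (the string "café" is just an illustrative example, and Latin-1 stands in for whatever 8-bit encoding the legacy data really uses). The same text has different bytes in different encodings; each decodes correctly only when it is labelled with the right encoding, and 8-bit bytes labelled as UTF-8 are rejected, much like the error PostgreSQL reports.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

text = "café"
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'

# Correctly labelled, both byte strings decode back to the same text.
assert latin1_bytes.decode("latin-1") == utf8_bytes.decode("utf-8") == text

# The single 8-bit byte 0xE9 is not a valid UTF-8 sequence on its own.
print(decodes_as(latin1_bytes, "utf-8"))  # False
print(decodes_as(utf8_bytes, "utf-8"))    # True
```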

> Our migration utility does ask us to select the correct encoding for
> our database, and we select UTF8 but we still get the error. What do
> you guys think? Possibly the migration tool's fault?

If PostgreSQL says that the data is not UTF-8, we tend to believe it.

To say more, one would need more information.
Can you identify the string about which PostgreSQL complains?
What does it look like?
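
If the data pass through an intermediate text file, a minimal Python sketch like the following can locate the offending bytes. It assumes you have such a file to scan (the script and file names in the usage comment are hypothetical); it reports the line number, byte offset, and raw bytes of every sequence that is not valid UTF-8.

```python
import sys

def find_invalid_utf8(data: bytes) -> list[tuple[int, int, bytes]]:
    """Return (line_number, byte_offset, bad_bytes) for each invalid UTF-8 sequence."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")  # strict decoding raises on the first bad sequence
            break
        except UnicodeDecodeError as exc:
            start, end = pos + exc.start, pos + exc.end
            line = data.count(b"\n", 0, start) + 1  # 1-based line number
            problems.append((line, start, data[start:end]))
            pos = end  # resume scanning just after the offending bytes
    return problems

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python find_bad_utf8.py legacy_export.txt
    with open(sys.argv[1], "rb") as f:
        for line, offset, bad in find_invalid_utf8(f.read()):
            print(f"line {line}, byte {offset}: {bad!r}")
```

Looking at the reported bytes (for example a lone 0xE9 where an accented letter should be) usually tells you which 8-bit encoding the source really uses.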

> I thought we may be able to get around it using SQL_ASCII encoding,
> but it's only 7-bit, so would we lose some data? Also our conversion
> utility doesn't have the option to use SQL_ASCII.

If you use SQL_ASCII you may succeed in getting the incorrect data into
the database, but that will not make you happy because the data will
not stop being incorrect just because they are in the database.
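
The cleaner route is to convert the data to UTF-8 correctly before loading it (alternatively, PostgreSQL can do the conversion itself if the client connection declares the real source encoding via client_encoding). A minimal Python sketch of the first approach, assuming, purely as a guess, that the legacy "8-bit ASCII" is Windows-1252; substitute the actual encoding once you have identified it:

```python
def transcode_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode legacy bytes as UTF-8, given the (assumed) real source encoding."""
    # Strict decoding: a wrong guess that hits an undefined byte fails loudly
    # instead of silently storing garbage.
    text = data.decode(source_encoding)
    return text.encode("utf-8")

print(transcode_to_utf8(b"caf\xe9"))  # b'caf\xc3\xa9'
```

Note that Latin-1 accepts every byte value, so decoding with it never fails; a wrong guess there produces wrong characters rather than an error, which is exactly the "incorrect data" problem described above.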

Yours,
Laurenz Albe
