COPY-ing ASCII file into UTF-8 database - Mailing list pgsql-admin

From Toomas Vendelin
Subject COPY-ing ASCII file into UTF-8 database
Date
Msg-id 2FAE86BD-E366-4F25-9CE0-3C089F761960@vendelin.com
Whole thread Raw
Responses Re: COPY-ing ASCII file into UTF-8 database
Re: COPY-ing ASCII file into UTF-8 database
List pgsql-admin
THE ISSUE:
I need to import a text file (ASCII) dumped from MySQL 3.23 database
into Postgre table (UTF-8). The file contains "specifically Estonian
characters" Õ, Ä, Ö, Ü. My attempt to COPY this file directly
failed. Postgre said:

ERROR:  invalid byte sequence for encoding "UTF8": 0xf5706509
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

OK... The file isn't big, and there are just 4 characters causing
problems, so I substitued each of them with a distinctive pattern of
ASCII characters and successfully COPY-ied the file into Postgre
table. With PgAdmin I've changed 1 record back to Estonian – worked
fine. I then exported this table into a file, opened the file in
UltraEdit (the status bar showed it's a UTF-8 indeed) and changed a
few temporary substitutions with "specifically Estonian characters".
Then I cleared the table and tried to COPY data from my now UTF-8 file
again. I've got an error:

ERROR:  invalid input syntax for integer: "2"
CONTEXT:  COPY school, line 1, column sch_id: "2"

PgAdmin actually displayed it with something like chopped hyphen,
which I've failed to copy into this message and which wasn't visible
in the text editor even with "Show hidden characters".

MY QUESTIONS:

1. What is the proper solution for importing data from plain ASCII
files, that contain the type of characters mentioned above?

2. I'd like to fill the gaps (huge, obviously) in my knowlege about
handling different charsets. Can you point me to some sane
resource(s)? I'm sure, there's a lot of them. Sorry for slipping off
the topic here.

Regards,

Toomas




pgsql-admin by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Making field non case sensitive
Next
From: Tommy Gildseth
Date:
Subject: Re: COPY-ing ASCII file into UTF-8 database