Thread: Going from a DB using SQL_ASCII to UNICODE

Going from a DB using SQL_ASCII to UNICODE

From
"Dion Almaer"
Date:
Hi -

We are running PostgreSQL 7.1.4, and are finally upgrading... to 7.4.2.

While we do this upgrade, we also want to make create our DB as UNICODE (we
are having problems with SQL_ASCII where we get a bunch of ? marks).

When we try to import the data from a SQL_ASCII dumped db, into the new
UNICODE db, everything freaks out.
We get errors restoring from dumps that used COPY and with INSERT.

We also tried to use the 7.4 pg_dump to make sure that it wasn't a problem
with the old pg_dump.

The errors are:

- ERROR:  missing data for column "noiselevel"
- CONTEXT:  COPY messages, line 1: "4393  -1      1441    14147   0
2000-10-12 18:30:58-05  EJB Design Questions    \N      Hi,"
- FATAL:  invalid frontend message type 60
- ERROR:  invalid byte sequence for encoding "UNICODE": 0xe9616c

Anyway, if anyone has any recommendations on how to migrate from SQL_ASCII
to UNICODE, with the knowledge that we are going from 7.1 to 7.4 at the same
time.... You will be life-savers!

Cheers,

Dion


Re: Going from a DB using SQL_ASCII to UNICODE

From
Tom Lane
Date:
"Dion Almaer" <dion@almaer.com> writes:
> When we try to import the data from a SQL_ASCII dumped db, into the new
> UNICODE db, everything freaks out.

I'm no expert on this stuff, but I think what you need to do is add

    set client_encoding = sql_ascii;

to the top of the dump file.  (As of fairly recently, pg_dump will
automatically add such a SET, but I'm pretty sure 7.4.2 won't.)

If that doesn't help, then what you have is actually not valid UTF-8
data, and what you'll have to do is figure out what encoding it's in and
specify that instead.  If it's in a mishmash of different encodings,
you're in for some pain :-(

            regards, tom lane

Re: Going from a DB using SQL_ASCII to UNICODE

From
"Dion Almaer"
Date:
Tom,

I will try to add "set client_encoding = sql_ascii;" and pray :)

Cheers,

Dion

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Monday, April 19, 2004 9:21 PM
> To: Dion Almaer
> Cc: pgsql-admin@postgresql.org; 'Lloyd Benson'; Eric Preston
> Subject: Re: [ADMIN] Going from a DB using SQL_ASCII to UNICODE
>
> "Dion Almaer" <dion@almaer.com> writes:
> > When we try to import the data from a SQL_ASCII dumped db, into the
> > new UNICODE db, everything freaks out.
>
> I'm no expert on this stuff, but I think what you need to do is add
>
>     set client_encoding = sql_ascii;
>
> to the top of the dump file.  (As of fairly recently, pg_dump
> will automatically add such a SET, but I'm pretty sure 7.4.2 won't.)
>
> If that doesn't help, then what you have is actually not
> valid UTF-8 data, and what you'll have to do is figure out
> what encoding it's in and specify that instead.  If it's in a
> mishmash of different encodings, you're in for some pain :-(
>
>             regards, tom lane
>
>


Re: Going from a DB using SQL_ASCII to UNICODE

From
Frank Finner
Date:
Hi,

I answered to a similiar problem on the [GENERAL] list a few days ago. In
short: You can use "recode" in a pipe between dump and restore like
"pg_dump database|recode ascii..utf8|psql newdatabase" to get rid of unicode
problems while transferring from a different encoding. Please look at my
statements on the[GENERAL] list for the gory details.

Regards, Frank.


On Mon, 19 Apr 2004 15:44:33 -0400
pgsql-admin-owner@postgresql.org wrote:

>
> Hi -
>
> We are running PostgreSQL 7.1.4, and are finally upgrading... to 7.4.2.
>
> While we do this upgrade, we also want to make create our DB as UNICODE (we
> are having problems with SQL_ASCII where we get a bunch of ? marks).
>
> When we try to import the data from a SQL_ASCII dumped db, into the new
> UNICODE db, everything freaks out.
> We get errors restoring from dumps that used COPY and with INSERT.
>
> We also tried to use the 7.4 pg_dump to make sure that it wasn't a problem
> with the old pg_dump.
>
> The errors are:
>
> - ERROR:  missing data for column "noiselevel"
> - CONTEXT:  COPY messages, line 1: "4393  -1      1441    14147   0
> 2000-10-12 18:30:58-05  EJB Design Questions    \N      Hi,"
> - FATAL:  invalid frontend message type 60
> - ERROR:  invalid byte sequence for encoding "UNICODE": 0xe9616c
>
> Anyway, if anyone has any recommendations on how to migrate from SQL_ASCII
> to UNICODE, with the knowledge that we are going from 7.1 to 7.4 at the same
> time.... You will be life-savers!
>
> Cheers,
>
> Dion

Re: Going from a DB using SQL_ASCII to UNICODE

From
CoL
Date:
hi,

Dion Almaer wrote, On 4/19/2004 21:44:
> - ERROR:  missing data for column "noiselevel"
> - CONTEXT:  COPY messages, line 1: "4393  -1      1441    14147   0
> 2000-10-12 18:30:58-05  EJB Design Questions    \N      Hi,"

for this one: better to dump with: -d

C.

Re: Going from a DB using SQL_ASCII to UNICODE

From
CoL
Date:
hi,

Dion Almaer wrote, On 4/19/2004 21:44:
> - ERROR:  invalid byte sequence for encoding "UNICODE": 0xe9616c

and for this: convert your dump with iconv to unicode, then load it.

C.

Re: Going from a DB using SQL_ASCII to UNICODE

From
Peter Eisentraut
Date:
Am Dienstag, 20. April 2004 07:05 schrieb Frank Finner:
> I answered to a similiar problem on the [GENERAL] list a few days ago. In
> short: You can use "recode" in a pipe between dump and restore like
> "pg_dump database|recode ascii..utf8|psql newdatabase" to get rid of
> unicode problems while transferring from a different encoding. Please look
> at my statements on the[GENERAL] list for the gory details.

This is not necessary.  You just need to set the client encoding correctly
(not SQL_ASCII, but something like LATIN1).  The encoding conversion happens
automatically, as usual.