Thread: Going from a DB using SQL_ASCII to UNICODE
Hi - We are running PostgreSQL 7.1.4, and are finally upgrading... to 7.4.2. While we do this upgrade, we also want to make create our DB as UNICODE (we are having problems with SQL_ASCII where we get a bunch of ? marks). When we try to import the data from a SQL_ASCII dumped db, into the new UNICODE db, everything freaks out. We get errors restoring from dumps that used COPY and with INSERT. We also tried to use the 7.4 pg_dump to make sure that it wasn't a problem with the old pg_dump. The errors are: - ERROR: missing data for column "noiselevel" - CONTEXT: COPY messages, line 1: "4393 -1 1441 14147 0 2000-10-12 18:30:58-05 EJB Design Questions \N Hi," - FATAL: invalid frontend message type 60 - ERROR: invalid byte sequence for encoding "UNICODE": 0xe9616c Anyway, if anyone has any recommendations on how to migrate from SQL_ASCII to UNICODE, with the knowledge that we are going from 7.1 to 7.4 at the same time.... You will be life-savers! Cheers, Dion
"Dion Almaer" <dion@almaer.com> writes: > When we try to import the data from a SQL_ASCII dumped db, into the new > UNICODE db, everything freaks out. I'm no expert on this stuff, but I think what you need to do is add set client_encoding = sql_ascii; to the top of the dump file. (As of fairly recently, pg_dump will automatically add such a SET, but I'm pretty sure 7.4.2 won't.) If that doesn't help, then what you have is actually not valid UTF-8 data, and what you'll have to do is figure out what encoding it's in and specify that instead. If it's in a mishmash of different encodings, you're in for some pain :-( regards, tom lane
Tom, I will try to add "set client_encoding = sql_ascii;" and pray :) Cheers, Dion > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Monday, April 19, 2004 9:21 PM > To: Dion Almaer > Cc: pgsql-admin@postgresql.org; 'Lloyd Benson'; Eric Preston > Subject: Re: [ADMIN] Going from a DB using SQL_ASCII to UNICODE > > "Dion Almaer" <dion@almaer.com> writes: > > When we try to import the data from a SQL_ASCII dumped db, into the > > new UNICODE db, everything freaks out. > > I'm no expert on this stuff, but I think what you need to do is add > > set client_encoding = sql_ascii; > > to the top of the dump file. (As of fairly recently, pg_dump > will automatically add such a SET, but I'm pretty sure 7.4.2 won't.) > > If that doesn't help, then what you have is actually not > valid UTF-8 data, and what you'll have to do is figure out > what encoding it's in and specify that instead. If it's in a > mishmash of different encodings, you're in for some pain :-( > > regards, tom lane > >
Hi, I answered to a similiar problem on the [GENERAL] list a few days ago. In short: You can use "recode" in a pipe between dump and restore like "pg_dump database|recode ascii..utf8|psql newdatabase" to get rid of unicode problems while transferring from a different encoding. Please look at my statements on the[GENERAL] list for the gory details. Regards, Frank. On Mon, 19 Apr 2004 15:44:33 -0400 pgsql-admin-owner@postgresql.org wrote: > > Hi - > > We are running PostgreSQL 7.1.4, and are finally upgrading... to 7.4.2. > > While we do this upgrade, we also want to make create our DB as UNICODE (we > are having problems with SQL_ASCII where we get a bunch of ? marks). > > When we try to import the data from a SQL_ASCII dumped db, into the new > UNICODE db, everything freaks out. > We get errors restoring from dumps that used COPY and with INSERT. > > We also tried to use the 7.4 pg_dump to make sure that it wasn't a problem > with the old pg_dump. > > The errors are: > > - ERROR: missing data for column "noiselevel" > - CONTEXT: COPY messages, line 1: "4393 -1 1441 14147 0 > 2000-10-12 18:30:58-05 EJB Design Questions \N Hi," > - FATAL: invalid frontend message type 60 > - ERROR: invalid byte sequence for encoding "UNICODE": 0xe9616c > > Anyway, if anyone has any recommendations on how to migrate from SQL_ASCII > to UNICODE, with the knowledge that we are going from 7.1 to 7.4 at the same > time.... You will be life-savers! > > Cheers, > > Dion
hi, Dion Almaer wrote, On 4/19/2004 21:44: > - ERROR: missing data for column "noiselevel" > - CONTEXT: COPY messages, line 1: "4393 -1 1441 14147 0 > 2000-10-12 18:30:58-05 EJB Design Questions \N Hi," for this one: better to dump with: -d C.
hi, Dion Almaer wrote, On 4/19/2004 21:44: > - ERROR: invalid byte sequence for encoding "UNICODE": 0xe9616c and for this: convert your dump with iconv to unicode, then load it. C.
Am Dienstag, 20. April 2004 07:05 schrieb Frank Finner: > I answered to a similiar problem on the [GENERAL] list a few days ago. In > short: You can use "recode" in a pipe between dump and restore like > "pg_dump database|recode ascii..utf8|psql newdatabase" to get rid of > unicode problems while transferring from a different encoding. Please look > at my statements on the[GENERAL] list for the gory details. This is not necessary. You just need to set the client encoding correctly (not SQL_ASCII, but something like LATIN1). The encoding conversion happens automatically, as usual.