Thread: moving data from windows to linux
Hi, I am having problem importing data from a windows xp2 postgresql server to a linux one. On the windows server I did: pg_dumpall -U postgres > dumpall.sql On the linux machine I did: tr -d '\r' < /data/dumpall.sql | psql -U pgsql postgres ERROR: invalid byte sequence for encoding "UTF8": 0x92 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". As adviced in http://pgfoundry.org/tracker/?func=detail&atid=538&aid=1000836&group_id=1000125 <http://pgfoundry.org/tracker/?func=detail&atid=538&aid=1000836&group_id=1000125>, I changed the client_encoding in postgresql.conf and restarted the server. I also changed it using \encoding in psql. But I get the same error. What am I missing ? Also, would a brutal copy of the base directory do the trick ? Cheers, Francois
Attachment
On Wednesday 25 April 2007 06:29, Francois Deliege <fdeliege@cs.aau.dk> wrote: > ERROR: invalid byte sequence for encoding "UTF8": 0x92 > HINT: This error can also happen if the byte sequence does not match > the encoding expected by the server, which is controlled by > "client_encoding". Newer versions of PostgreSQL do much stricter checks of UTF-8 data than older versions did. It is very possible that if you are upgrading versions that your original data is invalid UTF-8 even if it's in a UTF-8 database. And of course, if the source database is SQL-ASCII, then any sort of invalid data error is possible when importing to a UTF-8 database. iconv is usually suggested as the tool to correct encoding errors in the dump file prior to importing to a new database. There are several threads in the archives that address this issue. -- "It is a besetting vice of democracies to substitute public opinion for law." - James Fenimore Cooper
Alan Hodgson wrote: > On Wednesday 25 April 2007 06:29, Francois Deliege <fdeliege@cs.aau.dk> > wrote: > >> ERROR: invalid byte sequence for encoding "UTF8": 0x92 >> HINT: This error can also happen if the byte sequence does not match >> the encoding expected by the server, which is controlled by >> "client_encoding". >> > Newer versions of PostgreSQL do much stricter checks of UTF-8 data than > older versions did. It is very possible that if you are upgrading > versions that your original data is invalid UTF-8 even if it's in a > UTF-8 database. > I am moving from 8.2.4 under Windows to 8.2.4 under FreeBSD 6.2. The databases are all encoded in UTF8 on the windows server. The threads I have seen in the archives are all about version or database encoding issues. The only trick here, is that I am using tr to solve the endline problem. Cheers, Francois
Attachment
On Wednesday 25 April 2007 08:36, Francois Deliege <fdeliege@cs.aau.dk> wrote: > I am moving from 8.2.4 under Windows to 8.2.4 under FreeBSD 6.2. > The databases are all encoded in UTF8 on the windows server. > The threads I have seen in the archives are all about version or > database encoding issues. > The only trick here, is that I am using tr to solve the endline > problem. Is that actually necessary? I'm not seeing anything in the docs about cr/lf issues in dump files. Or maybe just use pg_dump on the FreeBSD box to create the dump file. -- "The true danger is when liberty is nibbled away, for expedients, and by parts ... the only thing necessary for evil to triumph is for good men to do nothing." -- Edmund Burke
Alan Hodgson wrote: > On Wednesday 25 April 2007 08:36, Francois Deliege <fdeliege@cs.aau.dk> > wrote: > >> I am moving from 8.2.4 under Windows to 8.2.4 under FreeBSD 6.2. >> The databases are all encoded in UTF8 on the windows server. >> The threads I have seen in the archives are all about version or >> database encoding issues. >> The only trick here, is that I am using tr to solve the endline >> problem. >> > > Is that actually necessary? I'm not seeing anything in the docs about > cr/lf issues in dump files. Or maybe just use pg_dump on the FreeBSD > box to create the dump file. > It seems so. I get this error when using directly the dump file from the windows server. ERROR: literal carriage return found in data HINT: Use "\r" to represent carriage return. I have been looking at sed, tr and dos2unix so far, but without success. Ken was right, the 'invalid byte sequence for encoding "UTF8": 0x92' error seems to have been caused by tr removing to many chars. The FreeBSD and Windows are the same machine, different HD, it makes it difficult to generate the dump of the Windows XP box from the FreeBSD box. I have been trying to "look" at the line of the dump file using tail, but the file seems to large to be handled.
Attachment
Alan Hodgson wrote: > On Wednesday 25 April 2007 08:36, Francois Deliege <fdeliege@cs.aau.dk> > wrote: > >> I am moving from 8.2.4 under Windows to 8.2.4 under FreeBSD 6.2. >> The databases are all encoded in UTF8 on the windows server. >> The threads I have seen in the archives are all about version or >> database encoding issues. I think I identified the "moving from Windows to FreeBSD" problem. In short: FreeBSD 6.2 cannot handle files larger than 1.7GB on NTFS partition mounted. --> Would be nice to enable splits to be handled by pg_dump / pg_restore. :-) Here are some pieces of evidence: Looking at the data that causes the problem in the music-data dump file: dimension1# awk '/16862\t8[7-9]/' /mnt/windows/music-data 16862 87 0.99999997673498697 2.7562404582937301e-010 5.3289215505938802e-011 1.63906667677656e-014 3.6428502118254301e-013 1.3056815099222999e-009 9.8685777446191209e-016 1.2298273123735399e-017 1.2144251138520999e-011 4.8726130130392397e-014 1.9123218467089399e-012 4.0612663109915703e-012 2.1516222137270299e-010 4.4099191584683199e-012 2.1392297825760101e-008 16862 88 0.99983678662586195 5.0406110505424003e-005 1.5563266660749601e-006 1.6705707371203099e-007 1.63349359798036e-011 3.48994254 Not a lot of data for 16862 88 compared to 16862 87 !!! That's what was causing the problem... dimension1# sed '/16862\t88/,//d' < /mnt/windows/music-data > /usr/home/fdeliege/musicdata-ok dimension1# ls -l musicdata-ok -rw-r--r-- 1 root fdeliege 1836729357 Apr 26 13:57 musicdata-ok The size of the file is 1.7GB A short python script to visualize the data mounted on the windows partition from my FreeBSD box... dimension1# python pythonscript.py /mnt/windows/music-data 1836729340 1000 e-011 3.48994254 The same running script running from windows gives me what I expected, a lot of numbers. :-) Now, I am splitting the 150 GB dump file into 1G chunks... time for a beer. Cheers, Francois