Thread: Problem with 7.0.3 dump -> 7.1b4 restore
We have a Unicode (UTF-8) database that we are trying to upgrade to 7.1b4. We did a pg_dumpall (yes, using the old version) and then tried a restore. We hit the following 3 problems:

1. Some of the text is large, about 20k characters, and is multiline. For almost all of the lines this was fine (postgres put a \ at the end of the previous line) but for some it was not. The lines I looked at all had non-English characters (Japanese and/or Korean) at the end of the line. When the restore encountered these lines it failed and, since the dump uses COPY, the entire table was left blank.

2. Some two-byte dash/hyphen characters DID get correctly imported into the database but could not be read out again via JDBC; that is, when read, the record was truncated at the character. This _might_ be related to a long-standing Java core bug regarding improper conversions between certain languages and the internal Unicode representation for hyphens.

3. One other character, a two-byte apostrophe, was not restorable, similarly to the hyphen problem.

After fighting the above, I decided to try doing the dump with the -dn flags. This fixed problem #1 but not 2 or 3. If needed I can try to get details about the problem characters.

Finally, not a bug, but we have written a small Perl script that wraps every 500 INSERT lines of a PG dump in a transaction. This speeds up large restores by about 100 times! Really! I think this might be a good thing for the dump command to do automatically.

Best regards,
--Rainer
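The Perl script mentioned above was not posted to the list, so the following is only a minimal Python sketch of the same idea, not Rainer's script. It assumes each INSERT statement occupies exactly one line of the dump (which will not hold for values containing embedded newlines), and the function name batch_inserts and its batch_size parameter are made up for illustration.

```python
from typing import Iterable, Iterator


def batch_inserts(lines: Iterable[str], batch_size: int = 500) -> Iterator[str]:
    """Yield dump lines with BEGIN/COMMIT wrapped around runs of INSERTs.

    Assumption: every INSERT statement fits on a single line.
    """
    pending = 0  # INSERTs emitted inside the currently open transaction
    for line in lines:
        if line.lstrip().upper().startswith("INSERT "):
            if pending == 0:
                yield "BEGIN;\n"       # open a new batch
            yield line
            pending += 1
            if pending == batch_size:
                yield "COMMIT;\n"      # batch is full, close it
                pending = 0
        else:
            if pending:
                yield "COMMIT;\n"      # close the batch before other statements
                pending = 0
            yield line
    if pending:
        yield "COMMIT;\n"              # close a trailing partial batch


if __name__ == "__main__":
    sample = ["CREATE TABLE t (i int);\n"]
    sample += [f"INSERT INTO t VALUES ({i});\n" for i in range(5)]
    for out_line in batch_inserts(sample, batch_size=2):
        print(out_line, end="")
```

To process a real dump, the same generator could be fed an open file object and its output written to a second file. The speedup presumably comes from committing once per batch instead of once per row, though the thread does not measure where the time goes.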
> We have a Unicode (UTF-8) database that we are trying to upgrade to 7.1b4.
> We did a pg_dumpall (yes, using the old version) and then tried a restore.
> We hit the following 3 problems:
>
> 1. Some of the text is large, about 20k characters, and is multiline. For
> almost all of the lines this was fine (postgres put a \ at the end of the
> previous line) but for some it was not. The lines I looked at all had
> non-English characters (Japanese and/or Korean) at the end of the line. When
> the restore encountered these lines it failed and, since the dump uses COPY,
> the entire table was left blank.
>
> 2. Some two-byte dash/hyphen characters DID get correctly imported into the
> database but could not be read out again via JDBC, that is, when read the
> record was truncated at the character. This _might_ be related to a long
> standing Java core bug regarding improper conversions between certain
> languages and the internal Unicode representation for hyphens.
>
> 3. One other character, a two-byte apostrophe, was not restorable,
> similarly to the hyphen problem.
>
> After fighting the above, I decided to try doing the dump with the -dn
> flags. This fixed problem #1 but not 2 or 3. If needed I can try to get
> details about the problem characters.

This might be related to a known bug with 7.0.x. Can you grab a patch from ftp://ftp.sra.co.jp/pub/cmd/postgres/7.0.3/patches/copy.patch.gz and try again?

Or even better, can you give me a minimum set of data that reproduces your problem?
--
Tatsuo Ishii
Well, I tried the patch and the newly produced dump was identical to the bad dump from before, so the patch had no effect. I will try to trim it down to a reasonably small file and email it to you.

--Rainer

> -----Original Message-----
> From: pgsql-bugs-owner@postgresql.org
> [mailto:pgsql-bugs-owner@postgresql.org]On Behalf Of Tatsuo Ishii
> Sent: Friday, February 23, 2001 10:32 AM
> To: rmager@vgkk.com
> Cc: pgsql-bugs@postgresql.org; pgsql-hackers@postgresql.org
> Subject: Re: [BUGS] Problem with 7.0.3 dump -> 7.1b4 restore
>
> This might be related to a known bug with 7.0.x. Can you grab a patch
> from ftp://ftp.sra.co.jp/pub/cmd/postgres/7.0.3/patches/copy.patch.gz
> and try again?
>
> Or even better, can you give me a minimum set of data that reproduces
> your problem?
> --
> Tatsuo Ishii
Attached is a single INSERT that shows the problem. The character after the word "Fiber" truncates the text when using JDBC. NOTE: the text IS in the database; that is, the dump/restore seems OK, and the problem occurs when trying to read the text later. The database is UTF8 and I just tested with beta 5.

Oh, BTW, if I set (INSERT) this same character via JDBC and then retrieve it again, everything is fine.

--Rainer

> -----Original Message-----
> From: pgsql-bugs-owner@postgresql.org
> [mailto:pgsql-bugs-owner@postgresql.org]On Behalf Of Tatsuo Ishii
> Sent: Friday, February 23, 2001 10:32 AM
>
> Or even better, can you give me a minimum set of data that reproduces
> your problem?
> --
> Tatsuo Ishii
Attachment
> Attached is a single INSERT that shows the problem. The character after the
> word "Fiber" truncates the text when using JDBC. NOTE, the text IS in the
> database, that is, the dump/restore seems ok, the problem is when trying to
> read the text later. The database is UTF8 and I just tested with beta 5.
>
> Oh, BTW, if I try to set (INSERT) this same character via JDBC and then
> retrieve it again then everything is fine.

Thanks. I'll dig into it.
--
Tatsuo Ishii
> Attached is a single INSERT that shows the problem. The character after the
> word "Fiber" truncates the text when using JDBC. NOTE, the text IS in the
> database, that is, the dump/restore seems ok, the problem is when trying to
> read the text later. The database is UTF8 and I just tested with beta 5.
>
> Oh, BTW, if I try to set (INSERT) this same character via JDBC and then
> retrieve it again then everything is fine.

I have tested your data using psql:

unicode=# create table pr_prop_info(i1 int, i2 int, i3 int, t text);
CREATE
unicode=# \encoding LATIN1
unicode=# \i example.sql
INSERT 2378114 1
unicode=# select * from pr_prop_info;

The character after the word "Fiber" looks like "Optic Cable". So as long as the server/client encoding is set correctly, it looks OK. I guess we have some problems with the JDBC driver. Unfortunately I am not a Java guru at all. Can anyone look into our JDBC driver regarding this problem?
--
Tatsuo Ishii
Hi all,

I haven't been following the current thread on failed tests but I just had some, so I thought I'd mention it. If this is a repeat then I apologize.

I configured with:

./configure --enable-multibyte --enable-syslog --with-java --with-maxbackends=70

And the tests give me this error:

Running with noclean mode on. Mistakes will not be cleaned up.
/opt/home/rmager/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/pg_encoding: error while loading shared libraries: /opt/home/rmager/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/pg_encoding: undefined symbol: pg_char_to_encoding
initdb: pg_encoding failed
Perhaps you did not configure PostgreSQL for multibyte support or the program was not successfully installed.

--Rainer
I tried to submit the results of my regression tests and got this: Warning: PostgreSQL query failed: ERROR: parser: parse error at or near "t" in /home/projects/pgsql/developers/vev/public_html/regress/regress.php on line 359 Database write failed. --Rainer
I'm trying to run the latest CVS code's regression tests and have a problem. They fail at initdb with this:

Running with noclean mode on. Mistakes will not be cleaned up.
/opt/home/rmager/devel/External/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/pg_encoding: error while loading shared libraries: /opt/home/rmager/devel/External/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/pg_encoding: undefined symbol: pg_char_to_encoding
initdb: pg_encoding failed
Perhaps you did not configure PostgreSQL for multibyte support or the program was not successfully installed.

I ran configure with this:

./configure --enable-multibyte --enable-syslog --with-java

Any ideas?

--Rainer
I just tested a bug I originally found in 7.1b4 with the new 7.1RC3 and it still exists. I would consider this a major bug because I know of no workaround.

Basically what happens is that a dump of an existing Unicode database (from 7.0.3) has a double-byte hyphen character that becomes \255 in the dump. When the data is imported into the new 7.1 database it seems to appear correctly (verified via psql) BUT when reading this record via JDBC the data is truncated at this character.

I communicated briefly with Ishii-san regarding this a while back but I never followed up. Considering RC3 is now out I thought I should revisit the issue.

It should be easy to test by editing any postgres Unicode database dump and putting \255 somewhere in a string. I'm not sure if it matters but the dump was done with "-dn" flags.

Thanks,
--Rainer

> -----Original Message-----
> From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
> Sent: Wednesday, February 28, 2001 11:02 AM
> To: rmager@vgkk.com
> Cc: pgsql-bugs@postgresql.org; pgsql-hackers@postgresql.org
> Subject: RE: [BUGS] Problem with 7.0.3 dump -> 7.1b4 restore
>
> > Attached is a single INSERT that shows the problem. The character after the
> > word "Fiber" truncates the text when using JDBC. NOTE, the text IS in the
> > database, that is, the dump/restore seems ok, the problem is when trying to
> > read the text later. The database is UTF8 and I just tested with beta 5.
> >
> > Oh, BTW, if I try to set (INSERT) this same character via JDBC and then
> > retrieve it again then everything is fine.
>
> I have tested your data using psql:
>
> unicode=# create table pr_prop_info(i1 int, i2 int, i3 int, t text);
> CREATE
> unicode=# \encoding LATIN1
> unicode=# \i example.sql
> INSERT 2378114 1
> unicode=# select * from pr_prop_info;
>
> The character after the word "Fiber" looks like "Optic Cable". So as
> long as the server/client encoding set correctly, it looks ok. I guess
> we have some problems with JDBC driver. Unfortunately I am not a Java
> guru at all. Can anyone look into our JDBC driver regarding this
> problem?
> --
> Tatsuo Ishii
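For readers trying to reproduce this, here is a small Python sketch of what the \255 in the dump may correspond to at the byte level. It assumes \255 is an octal byte escape, i.e. the single byte 0xAD; that is an interpretation of the report, not something confirmed in this thread. Under that assumption the byte is the soft hyphen in LATIN1, its UTF-8 form is two bytes (consistent with the "double-byte hyphen" description), and the lone byte is not valid UTF-8 on its own. The helper name decodes_as_utf8 is hypothetical.

```python
# Assumption: "\255" in the dump is an octal escape for one byte (0xAD).
raw = bytes([0o255])

# In LATIN1 that byte is U+00AD SOFT HYPHEN.
as_latin1 = raw.decode("latin-1")

# The same character encoded as UTF-8 takes two bytes.
utf8_form = as_latin1.encode("utf-8")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(hex(raw[0]))                 # the byte value behind \255
print(utf8_form)                   # two-byte UTF-8 encoding
print(decodes_as_utf8(raw))        # lone 0xAD byte
print(decodes_as_utf8(utf8_form))  # proper two-byte sequence
```

If a client ends up handed the lone byte while expecting UTF-8, a decoding failure at that position would be one possible explanation for the truncation seen via JDBC, but nothing in this thread establishes where the conversion goes wrong.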
I noticed that 7.1 has officially been released. Does anyone know the status of the bug I reported regarding encoding problems when dumping a 7.0 db and restoring on 7.1?

Thanks,
--Rainer