Thread: Problem with 7.0.3 dump -> 7.1b4 restore

Problem with 7.0.3 dump -> 7.1b4 restore

From
"Rainer Mager"
Date:
We have a Unicode (UTF-8) database that we are trying to upgrade to 7.1b4.
We did a pg_dumpall (yes, using the old version) and then tried a restore.
We hit the following 3 problems:

1. Some of the text is large, about 20k characters, and is multiline. For
almost all of the lines this was fine (postgres put a \ at the end of the
previous line) but for some it was not. The lines I looked at all had
non-English characters (Japanese and/or Korean) at the end of the line. When
the restore encountered these lines it failed and, since the dump uses COPY,
the entire table was left blank.

2. Some two-byte dash/hyphen characters DID get correctly imported into the
database but could not be read out again via JDBC, that is, when read the
record was truncated at the character. This _might_ be related to a long
standing Java core bug regarding improper conversions between certain
languages and the internal Unicode representation for hyphens.

3. One other character, a two-byte apostrophe, was not restorable,
similarly to the hyphen problem.


After fighting the above, I decided to try doing the dump with the -dn
flags. This fixed problem #1 but not 2 or 3. If needed I can try to get
details about the problem characters.


Finally, not a bug, but we have written a small Perl script that inserts
transactions around every 500 INSERT lines in a PG dump. This speeds up
large restores by about 100 times! Really! I think this might be a good
thing for the dump command to do automatically.


Best regards,

--Rainer
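
The batching idea described above can be sketched roughly as follows. This is not the Perl script mentioned in the message (which was not posted); it is a minimal Python illustration, and the function name `wrap_inserts` and its `batch_size` parameter are hypothetical. It assumes every statement in the dump occupies exactly one line, so dumps with multi-line text values would need a real SQL-aware splitter instead.

```python
from typing import Iterable, Iterator


def wrap_inserts(lines: Iterable[str], batch_size: int = 500) -> Iterator[str]:
    """Yield dump lines, grouping consecutive INSERT lines into transactions.

    Assumption: each SQL statement sits on a single line.
    """
    pending = 0  # INSERT lines emitted inside the currently open transaction
    for line in lines:
        is_insert = line.lstrip().upper().startswith("INSERT ")
        if is_insert and pending == 0:
            # Open a new transaction in front of the first INSERT of a batch.
            yield "BEGIN;\n"
        elif not is_insert and pending > 0:
            # Close the open transaction before any non-INSERT statement.
            yield "COMMIT;\n"
            pending = 0
        yield line
        if is_insert:
            pending += 1
            if pending == batch_size:
                # Batch is full: commit and start counting again.
                yield "COMMIT;\n"
                pending = 0
    if pending > 0:
        # Commit a trailing, partially filled batch.
        yield "COMMIT;\n"


if __name__ == "__main__":
    import sys

    # Example: python wrap_inserts.py < dump.sql > dump_batched.sql
    sys.stdout.writelines(wrap_inserts(sys.stdin))
```

Committing once per batch rather than once per row is the whole point of the trick: without an explicit transaction, each INSERT is committed individually.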

Re: Problem with 7.0.3 dump -> 7.1b4 restore

From
Tatsuo Ishii
Date:
> We have a Unicode (UTF-8) database that we are trying to upgrade to 7.1b4.
> We did a pg_dumpall (yes, using the old version) and then tried a restore.
> We hit the following 3 problems:
> 
> 1. Some of the text is large, about 20k characters, and is multiline. For
> almost all of the lines this was fine (postgres put a \ at the end of the
> previous line) but for some it was not. The lines I looked at all had
> non-English characters (Japanese and/or Korean) at the end of the line. When
> the restore encountered these lines it failed and, since the dump uses COPY,
> the entire table was left blank.
> 
> 2. Some two-byte dash/hyphen characters DID get correctly imported into the
> database but could not be read out again via JDBC, that is, when read the
> record was truncated at the character. This _might_ be related to a long
> standing Java core bug regarding improper conversions between certain
> languages and the internal Unicode representation for hyphens.
> 
> 3. One other character, a two-byte apostrophe, was not restorable,
> similarly to the hyphen problem.
> 
> 
> After fighting the above, I decided to try doing the dump with the -dn
> flags. This fixed problem #1 but not 2 or 3. If needed I can try to get
> details about the problem characters.

This might be related to a known bug with 7.0.x. Can you grab a patch
from ftp://ftp.sra.co.jp/pub/cmd/postgres/7.0.3/patches/copy.patch.gz
and try again?

Or even better, can you give me a minimum set of data that reproduces
your problem?
--
Tatsuo Ishii


RE: Problem with 7.0.3 dump -> 7.1b4 restore

From
"Rainer Mager"
Date:
Well, I tried the patch and the newly produced dump was identical to the bad
dump from before, so the patch had no effect. I will try to trim it down to
a reasonably small file and email it to you.

--Rainer

> -----Original Message-----
> From: pgsql-bugs-owner@postgresql.org
> [mailto:pgsql-bugs-owner@postgresql.org]On Behalf Of Tatsuo Ishii
> Sent: Friday, February 23, 2001 10:32 AM
> To: rmager@vgkk.com
> Cc: pgsql-bugs@postgresql.org; pgsql-hackers@postgresql.org
> Subject: Re: [BUGS] Problem with 7.0.3 dump -> 7.1b4 restore
> This might be related to a known bug with 7.0.x. Can you grab a patch
> from ftp://ftp.sra.co.jp/pub/cmd/postgres/7.0.3/patches/copy.patch.gz
> and try again?
>
> Or even better, can you give me a minimum set of data that reproduces
> your problem?
> --
> Tatsuo Ishii



RE: Problem with 7.0.3 dump -> 7.1b4 restore

From
"Rainer Mager"
Date:
Attached is a single INSERT that shows the problem. The character after the
word "Fiber" truncates the text when using JDBC. NOTE, the text IS in the
database, that is, the dump/restore seems ok, the problem is when trying to
read the text later. The database is UTF8 and I just tested with beta 5.

Oh, BTW, if I try to set (INSERT) this same character via JDBC and then
retrieve it again then everything is fine.


--Rainer

> -----Original Message-----
> From: pgsql-bugs-owner@postgresql.org
> [mailto:pgsql-bugs-owner@postgresql.org]On Behalf Of Tatsuo Ishii
> Sent: Friday, February 23, 2001 10:32 AM
>
> Or even better, can you give me a minimum set of data that reproduces
> your problem?
> --
> Tatsuo Ishii


RE: Problem with 7.0.3 dump -> 7.1b4 restore

From
Tatsuo Ishii
Date:
> Attached is a single INSERT that shows the problem. The character after the
> word "Fiber" truncates the text when using JDBC. NOTE, the text IS in the
> database, that is, the dump/restore seems ok, the problem is when trying to
> read the text later. The database is UTF8 and I just tested with beta 5.
> 
> Oh, BTW, if I try to set (INSERT) this same character via JDBC and then
> retrieve it again then everything is fine.

Thanks. I'll dig into it.
--
Tatsuo Ishii


RE: Problem with 7.0.3 dump -> 7.1b4 restore

From
Tatsuo Ishii
Date:
> Attached is a single INSERT that shows the problem. The character after the
> word "Fiber" truncates the text when using JDBC. NOTE, the text IS in the
> database, that is, the dump/restore seems ok, the problem is when trying to
> read the text later. The database is UTF8 and I just tested with beta 5.
> 
> Oh, BTW, if I try to set (INSERT) this same character via JDBC and then
> retrieve it again then everything is fine.

I have tested your data using psql:

unicode=# create table pr_prop_info(i1 int, i2 int, i3 int, t text);
CREATE
unicode=# \encoding LATIN1
unicode=# \i example.sql 
INSERT 2378114 1
unicode=# select * from pr_prop_info;

The character after the word "Fiber" looks like "­Optic Cable". So as
long as the server/client encoding is set correctly, it looks ok. I guess
we have some problems with the JDBC driver. Unfortunately I am not a Java
guru at all. Can anyone look into our JDBC driver regarding this
problem?
--
Tatsuo Ishii


Problems with latest tests

From
"Rainer Mager"
Date:
Hi all,

    I haven't been following the current thread on failed tests but I just had
some so I thought I'd mention it. If this is a repeat then I apologize.

    I configured with:

./configure --enable-multibyte --enable-syslog --with-java --with-maxbackends=70



    And the tests give me this error:

Running with noclean mode on. Mistakes will not be cleaned up.
/opt/home/rmager/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/pg_encoding: error while loading shared libraries: /opt/home/rmager/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/pg_encoding: undefined symbol: pg_char_to_encoding
initdb: pg_encoding failed

Perhaps you did not configure PostgreSQL for multibyte support or
the program was not successfully installed.



--Rainer

Problem with test results submission form

From
"Rainer Mager"
Date:
I tried to submit the results of my regression tests and got this:

Warning: PostgreSQL query failed: ERROR: parser: parse error at or near "t" in /home/projects/pgsql/developers/vev/public_html/regress/regress.php on line 359
Database write failed.


--Rainer

Problems with Multibyte in 7.1 beta?

From
"Rainer Mager"
Date:
I'm trying to run the latest CVS code's regression tests and have a problem.
They fail at initdb with this:


Running with noclean mode on. Mistakes will not be cleaned up.
/opt/home/rmager/devel/External/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/pg_encoding: error while loading shared libraries: /opt/home/rmager/devel/External/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/pg_encoding: undefined symbol: pg_char_to_encoding
initdb: pg_encoding failed

Perhaps you did not configure PostgreSQL for multibyte support or
the program was not successfully installed.




I ran configure with this:

./configure --enable-multibyte --enable-syslog --with-java




Any ideas?

--Rainer



RE: Problem with 7.0.3 dump -> 7.1b4 restore

From
"Rainer Mager"
Date:
I just tested a bug I originally found in 7.1b4 with the new 7.1RC3 and it
still exists. I would consider this a major bug because I know of no
workaround.

Basically what happens is that a dump of an existing Unicode database (from
7.0.3) has a double-byte hyphen character that becomes \255 in the dump. When
the data is imported into the new 7.1 database it seems to correctly appear
(verified via psql) BUT when reading this record via JDBC the data is
truncated at this character.

I communicated briefly with Ishii-san regarding this a while back but I
never followed up. Considering RC3 is now out I thought I should revisit the
issue. It should be easy to test by editing any Postgres Unicode database
dump and putting \255 somewhere in a string. I'm not sure if it matters but
the dump was done with "-dn" flags.

Thanks,

--Rainer
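
For readers trying to reproduce this, the following Python sketch shows only what the `\255` escape means at the byte level. It does not identify where the truncation happens in the server or the JDBC driver, and the helper name `inspect_octal_escape` is made up for illustration. Octal 255 is the single byte 0xAD, which reads as U+00AD (SOFT HYPHEN) in LATIN1 but is not a valid UTF-8 sequence on its own; UTF-8 encodes U+00AD as the two bytes 0xC2 0xAD.

```python
def inspect_octal_escape(escape: str) -> dict:
    """Decode a dump-style octal escape such as r'\255' and report how its byte reads."""
    # Strip the leading backslash and parse the digits as base 8.
    byte = bytes([int(escape.lstrip("\\"), 8)])
    try:
        byte.decode("utf-8")
        valid_utf8_alone = True
    except UnicodeDecodeError:
        # A lone continuation byte such as 0xAD cannot start a UTF-8 sequence.
        valid_utf8_alone = False
    as_latin1 = byte.decode("latin-1")  # LATIN1 maps every byte to one code point
    return {
        "byte": byte,
        "valid_utf8_alone": valid_utf8_alone,
        "latin1_char": as_latin1,
        "utf8_of_latin1_char": as_latin1.encode("utf-8"),
    }


info = inspect_octal_escape(r"\255")
for key, value in info.items():
    print(f"{key}: {value!r}")
```

A client that expects UTF-8 and receives a bare 0xAD byte is being handed malformed input, so comparing the raw bytes stored in the column against the two-byte form may be a useful first check.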


> -----Original Message-----
> From: Tatsuo Ishii [mailto:t-ishii@sra.co.jp]
> Sent: Wednesday, February 28, 2001 11:02 AM
> To: rmager@vgkk.com
> Cc: pgsql-bugs@postgresql.org; pgsql-hackers@postgresql.org
> Subject: RE: [BUGS] Problem with 7.0.3 dump -> 7.1b4 restore
>
>
> > Attached is a single INSERT that shows the problem. The character after the
> > word "Fiber" truncates the text when using JDBC. NOTE, the text IS in the
> > database, that is, the dump/restore seems ok, the problem is when trying to
> > read the text later. The database is UTF8 and I just tested with beta 5.
> >
> > Oh, BTW, if I try to set (INSERT) this same character via JDBC and then
> > retrieve it again then everything is fine.
>
> I have tested your data using psql:
>
> unicode=# create table pr_prop_info(i1 int, i2 int, i3 int, t text);
> CREATE
> unicode=# \encoding LATIN1
> unicode=# \i example.sql
> INSERT 2378114 1
> unicode=# select * from pr_prop_info;
>
> The character after the word "Fiber" looks like "­Optic Cable". So as
> long as the server/client encoding is set correctly, it looks ok. I guess
> we have some problems with the JDBC driver. Unfortunately I am not a Java
> guru at all. Can anyone look into our JDBC driver regarding this
> problem?
> --
> Tatsuo Ishii



RE: Problem with 7.0.3 dump -> 7.1b4 restore

From
"Rainer Mager"
Date:
I noticed that 7.1 has officially been released. Does anyone know the status
of the bug I reported regarding encoding problems when dumping a 7.0 db and
restoring on 7.1?


Thanks,

--Rainer