Thread: Postgres Backup and Restore

Postgres Backup and Restore

From
"Mikel Lindsaar"
Date:
Hello List,

I have done some searching and looking at all the documentation I
could find, and I have found a (what i consider) a bug in pg_dump
pg_restore that I wanted to bounce off the list to find their
solutions.

I have a database which gets updated from many locations and it looks
like we have gotten some character encoding mis matches in the system.
 I am going through and fixing these as this is my problem, not
postgres's, however, in finding this bug in our data, I found a
situation with postgres.

If you have bad character encodings in your data, you can not backup
and restore that data using pg_dump and pg_restore, on Windows Server
2003, using compress or plain text modes.  Both modes fail on the bad
encoding when you try to restore.  I found this out while doing a test
run on our backup and restore process, this data had crept in over a
recent period and got into the database somehow - I am finding out how
:)

I see this as a bug because the data got in there and postgres
accepted it on the copy in, but then refuses to accept it on the
restore, but I wanted to find out what the list thought.

The database is 8.2, running on Windows Server 2003.  The database is
UTF8 as reported by \l in psql.

Now I have managed to do a backup and test restore, but I did this by
writing a ruby script that went through the plain text version of the
backup line by line and split out the good from the bad by attempting
to parse each line through Iconv.  It was not a fast process, but it
did result in me finding the 120 lines that contained bad data and get
them out of the file's COPY tables.

I then had to go and individually correct each of these fields to
handle the incorrect encodings.

Now, the problem is, I should be able to restore my backup, regardless
of this sort of problem.  The problem characters were all in text
fields and so should be able to be restored.

How are other admins handling such a situation?  What backup tools do you use?

I know on Oracle I can do a Binary hot backup which just bypasses the
whole problem.

But as it stands right now, I will have to keep this script handy or
constantly double and triple check encodings.

Note, I know the bad stuff shouldn't get in there to begin with, but
that is not the point here.  I should be able to restore to the state
I made the backup in.

Anyway, comments? Ideas?

I have attached some of the encodings below if anyone is interested.

Regards

Mikel
http://lindsaar.net/


"\201L J\303\203\342\200\234ZSEF U."
"\201N\t\\N\tHU\t2084\t20"
"DON\303\203\302\201T U."
"\201ROS\t\\N\tHU\t3580\t"
"\215HE 25\t\\N\t\\N\tHAM"
"\215GER\t\\N\t\\N\t2001-"
"\201SHINDEN 955-2\t\\"
"\215NBUCH\t\\N\tDE\t710"

Re: Postgres Backup and Restore

From
Tom Lane
Date:
"Mikel Lindsaar" <raasdnil@gmail.com> writes:
> If you have bad character encodings in your data, you can not backup
> and restore that data using pg_dump and pg_restore, on Windows Server
> 2003, using compress or plain text modes.  Both modes fail on the bad
> encoding when you try to restore.

Are you sure that the destination database has the same encoding
declaration as the source did?  How was the badly-encoded data
inserted in the first place?

            regards, tom lane

Re: Postgres Backup and Restore

From
"Phillip Smith"
Date:
> Are you sure that the destination database has the same encoding
> declaration as the source did?

That's what I'd be looking at - I had similar problems a few weeks ago. See
this thread:
http://archives.postgresql.org/pgsql-admin/2007-12/msg00235.php



THINK BEFORE YOU PRINT - Save paper if you don't really need to print this

*******************Confidentiality and Privilege Notice*******************

The material contained in this message is privileged and confidential to
the addressee.  If you are not the addressee indicated in this message or
responsible for delivery of the message to such person, you may not copy
or deliver this message to anyone, and you should destroy it and kindly
notify the sender by reply email.

Information in this message that does not relate to the official business
of Weatherbeeta must be treated as neither given nor endorsed by Weatherbeeta.
Weatherbeeta, its employees, contractors or associates shall not be liable
for direct, indirect or consequential loss arising from transmission of this
message or any attachments
e-mail.

Re: Postgres Backup and Restore

From
"Phillip Smith"
Date:
> > Are you sure that the destination database has the same encoding
> > declaration as the source did?
>
> That's what I'd be looking at - I had similar problems a few weeks ago.
> See
> this thread:
> http://archives.postgresql.org/pgsql-admin/2007-12/msg00235.php

I've been thinking about this, and while I don't agree it's a bug, I think
that perhaps PostgreSQL should raise a notice or warning that the
destination database has different encoding than the file being restored...?

Obviously I'm not the only one (seemingly) caught out by this so I'm sure it
would save some heartache for people.


THINK BEFORE YOU PRINT - Save paper if you don't really need to print this

*******************Confidentiality and Privilege Notice*******************

The material contained in this message is privileged and confidential to
the addressee.  If you are not the addressee indicated in this message or
responsible for delivery of the message to such person, you may not copy
or deliver this message to anyone, and you should destroy it and kindly
notify the sender by reply email.

Information in this message that does not relate to the official business
of Weatherbeeta must be treated as neither given nor endorsed by Weatherbeeta.
Weatherbeeta, its employees, contractors or associates shall not be liable
for direct, indirect or consequential loss arising from transmission of this
message or any attachments
e-mail.

Re: Postgres Backup and Restore

From
Tom Lane
Date:
"Phillip Smith" <phillip.smith@weatherbeeta.com.au> writes:
> I've been thinking about this, and while I don't agree it's a bug, I think
> that perhaps PostgreSQL should raise a notice or warning that the
> destination database has different encoding than the file being restored...?

If Postgres actually *knows* that the encodings are different, it can
deal with that.  The cases that are problematic are where the software
has been misinformed for one reason or another.  I doubt that a warning
issued (or not) on the basis of misinformation will be especially helpful.

            regards, tom lane

Re: Postgres Backup and Restore

From
"Phillip Smith"
Date:
> "Phillip Smith" <phillip.smith@weatherbeeta.com.au> writes:
> > I've been thinking about this, and while I don't agree it's a bug, I
> think
> > that perhaps PostgreSQL should raise a notice or warning that the
> > destination database has different encoding than the file being
> restored...?
>
> If Postgres actually *knows* that the encodings are different, it can
> deal with that.  The cases that are problematic are where the software
> has been misinformed for one reason or another.  I doubt that a warning
> issued (or not) on the basis of misinformation will be especially helpful.

How has the software been misinformed and how can we avoid it then? :)

The start of the pgdump sets the client encoding, so shouldn't that tell PG
that things could get strange?


THINK BEFORE YOU PRINT - Save paper if you don't really need to print this

*******************Confidentiality and Privilege Notice*******************

The material contained in this message is privileged and confidential to
the addressee.  If you are not the addressee indicated in this message or
responsible for delivery of the message to such person, you may not copy
or deliver this message to anyone, and you should destroy it and kindly
notify the sender by reply email.

Information in this message that does not relate to the official business
of Weatherbeeta must be treated as neither given nor endorsed by Weatherbeeta.
Weatherbeeta, its employees, contractors or associates shall not be liable
for direct, indirect or consequential loss arising from transmission of this
message or any attachments
e-mail.

Re: Postgres Backup and Restore

From
Tom Lane
Date:
"Phillip Smith" <phillip.smith@weatherbeeta.com.au> writes:
> How has the software been misinformed and how can we avoid it then? :)

Well, if that were entirely clear then we could fix it.

> The start of the pgdump sets the client encoding, so shouldn't that tell PG
> that things could get strange?

The problems seem to arise when the data doesn't actually have the
claimed encoding.  We've been gradually tightening the encoding checks
over the years, so data that older versions let pass without comment
might now be recognized as malformed.  In the OP's case there may be
something else going on (such as an actual encoding-conversion bug).
I've encouraged him to try to trace things forward from his original
input data and see if we can find out exactly where it went off the
rails.

            regards, tom lane