Incremental backups, and backup history - Mailing list pgsql-general

From Antonios Christofides
Subject Incremental backups, and backup history
Date
Msg-id 20030619084228.GA20482@itia.ntua.gr
Whole thread Raw
Responses Re: Incremental backups, and backup history  ("Shridhar Daithankar" <shridhar_daithankar@persistent.co.in>)
Re: Incremental backups, and backup history  (Ron Johnson <ron.l.johnson@cox.net>)
Re: Incremental backups, and backup history  (Bruno Wolff III <bruno@wolff.to>)
Re: Incremental backups, and backup history  (Greg Stark <gsstark@mit.edu>)
Re: Incremental backups, and backup history  (Dennis Gearon <gearond@cvc.net>)
List pgsql-general
Hi,

I have two backup questions, not much related to each other; here they
are.

First: With PostgreSQL, I can't do incremental backups. pg_dump will
dump the entire database. Thus, if I want to keep daily backups on tape,
I'm in trouble because I'll have to do a full backup every day, which
may need several hours and several tapes.

One workaround I'm thinking is to not store BLOBs in the database, but
store them in the filesystem and store the filenames in the database
instead.  This needs some additional work (for example, a garbage
collector to periodically delete unreferenced files), but will move a
large amount of space from the database into the filesystem, which is
capable of incremental backups. Only BLOBs, that is; for some tables
that will have several tens of millions of small rows, I can't think of
any workaround.

Is this filenames-instead-of-BLOBs for easier backup common practice?
Any other ideas or comments?

My second question is a general relational database backup question, not
specifically related to pgsql. Sometimes a user accidentally
deletes/corrupts a file, and discovers it three days later. After they
come panicing to me, I can give them their file as it was three days
ago, because of the backup tape rotation. Now suppose a user deletes
ten employees from the database, and three days later they understand
that this was a bad thing. Now what? I can restore the entire database
and make it as it was three days ago, but I can't restore the particular
deleted records in the current database, as the relations make the
database behave as a single unit.

A colleague suggested, instead of updating or deleting rows, to only
insert rows with a timestamp; for example, instead of updating the
employee row with id=55, you insert a new row with id=55 with the
updated data, and change the primary key to (id, dateinserted). You then
always select the row with max dateinserted. A garbage collector is also
needed to periodically delete obsolete rows older than, say, six months.
Improvements can be made (such as using dateobsoleted instead of
dateinserted or moving the old rows to another table), but even in the
simplest cases I think it will be extremely hard to implement such a
system, again because of the relations.

So, it is a matter of database design? Do I have to design the database
so that it keeps the history of what happened?

Thanks everyone for the answers.

pgsql-general by date:

Previous
From: Richard Huxton
Date:
Subject: Re: Can inherited tables help in this case?
Next
From: "Shridhar Daithankar"
Date:
Subject: Re: Incremental backups, and backup history