Re: Incremental backups, and backup history - Mailing list pgsql-general

From Greg Stark
Subject Re: Incremental backups, and backup history
Date
Msg-id 87smq69n3b.fsf@stark.dyndns.tv
Whole thread Raw
In response to Incremental backups, and backup history  (Antonios Christofides <A.Christofides@itia.ntua.gr>)
List pgsql-general
Antonios Christofides <A.Christofides@itia.ntua.gr> writes:

> Is this filenames-instead-of-BLOBs for easier backup common practice?
> Any other ideas or comments?

This is a major point of contention. Some people think keeping all data in the
database is a better approach, others think data that isn't inherently
relational and doesn't need the protection of transactions doesn't really
belong in the database.

I happen to be firmly in the camp against putting such files in the database.
But it depends a lot on what your needs are. In every case I've faced this it
was simply useful to have the files accessible without piping them through the
database protocol every time. I could point the web server at the directory
and serve them up directly, or rsync them to the machines that could do that.
They could even be served up from a separate lightweight web server without
any database access altogether, which would have been impossible if the only
way to access them was via the database.

If you need to be able to update portions of your blobs, or if you need
transactional safety then you may need the database.

> My second question is a general relational database backup question, not
> specifically related to pgsql. Sometimes a user accidentally
> deletes/corrupts a file, and discovers it three days later. After they
> come panicing to me, I can give them their file as it was three days
> ago, because of the backup tape rotation. Now suppose a user deletes
> ten employees from the database, and three days later they understand
> that this was a bad thing. Now what? I can restore the entire database
> and make it as it was three days ago, but I can't restore the particular
> deleted records in the current database, as the relations make the
> database behave as a single unit.

What's worse is you may have several related changes to multiple tables. And
then you won't know if any other changes to other tables were later done that
depended on that data. There will be no way to do this perfectly in general.

If your data is particularly amenable to this form of normalization then it
can be useful though. For example, instead of storing counters that are
incremented, even when the total is the only interesting statistic, I normally
insert new records for every event. If ever it turns out something was wrong
and the events should be disregarded for a particular time period, or with
particular other properties, I have the complete history and can do that.
Inserts are also more efficient for the database to do than updates.

But for something like an employee table you're probably going to be stuck
with restoring the table to a new name, and having a human look over the old
data and the current data and updating the current data appropriately.

You may want to consider having a "deleted" flag column and not ever deleting
records. So at least undelete can be an easy operation that doesn't even need
restoring backups.

--
greg

pgsql-general by date:

Previous
From: "scott.marlowe"
Date:
Subject: Re: A creepy story about dates. How to prevent it?
Next
From: Tom Lane
Date:
Subject: Re: A creepy story about dates. How to prevent it?