Re: Incremental backups, and backup history - Mailing list pgsql-general
From | Greg Stark |
---|---|
Subject | Re: Incremental backups, and backup history |
Date | |
Msg-id | 87smq69n3b.fsf@stark.dyndns.tv Whole thread Raw |
In response to | Incremental backups, and backup history (Antonios Christofides <A.Christofides@itia.ntua.gr>) |
List | pgsql-general |
Antonios Christofides <A.Christofides@itia.ntua.gr> writes: > Is this filenames-instead-of-BLOBs for easier backup common practice? > Any other ideas or comments? This is a major point of contention. Some people think keeping all data in the database is a better approach, others think data that isn't inherently relational and doesn't need the protection of transactions doesn't really belong in the database. I happen to be firmly in the camp against putting such files in the database. But it depends a lot on what your needs are. In every case I've faced this it was simply useful to have the files accessible without piping them through the database protocol every time. I could point the web server at the directory and serve them up directly, or rsync them to the machines that could do that. They could even be served up from a separate lightweight web server without any database access altogether, which would have been impossible if the only way to access them was via the database. If you need to be able to update portions of your blobs, or if you need transactional safety then you may need the database. > My second question is a general relational database backup question, not > specifically related to pgsql. Sometimes a user accidentally > deletes/corrupts a file, and discovers it three days later. After they > come panicing to me, I can give them their file as it was three days > ago, because of the backup tape rotation. Now suppose a user deletes > ten employees from the database, and three days later they understand > that this was a bad thing. Now what? I can restore the entire database > and make it as it was three days ago, but I can't restore the particular > deleted records in the current database, as the relations make the > database behave as a single unit. What's worse is you may have several related changes to multiple tables. And then you won't know if any other changes to other tables were later done that depended on that data. There will be no way to do this perfectly in general. If your data is particularly amenable to this form of normalization then it can be useful though. For example, instead of storing counters that are incremented, even when the total is the only interesting statistic, I normally insert new records for every event. If ever it turns out something was wrong and the events should be disregarded for a particular time period, or with particular other properties, I have the complete history and can do that. Inserts are also more efficient for the database to do than updates. But for something like an employee table you're probably going to be stuck with restoring the table to a new name, and having a human look over the old data and the current data and updating the current data appropriately. You may want to consider having a "deleted" flag column and not ever deleting records. So at least undelete can be an easy operation that doesn't even need restoring backups. -- greg
pgsql-general by date: