Re: Incremental backups, and backup history - Mailing list pgsql-general

From Dennis Gearon
Subject Re: Incremental backups, and backup history
Date
Msg-id 3EF1D09B.9050203@cvc.net
Whole thread Raw
In response to Incremental backups, and backup history  (Antonios Christofides <A.Christofides@itia.ntua.gr>)
List pgsql-general
On your second question:

Keeping old data helps with data analysis, i.e., data mining. I would do the fired date as transactions. To see if an
employeeis still and employee, look for the latest transation, hired, rehired, contracted with as a temp/consultant,
fired,laid off, etc. 

Antonios Christofides wrote:

> Hi,
>
> I have two backup questions, not much related to each other; here they
> are.
>
> First: With PostgreSQL, I can't do incremental backups. pg_dump will
> dump the entire database. Thus, if I want to keep daily backups on tape,
> I'm in trouble because I'll have to do a full backup every day, which
> may need several hours and several tapes.
>
> One workaround I'm thinking is to not store BLOBs in the database, but
> store them in the filesystem and store the filenames in the database
> instead.  This needs some additional work (for example, a garbage
> collector to periodically delete unreferenced files), but will move a
> large amount of space from the database into the filesystem, which is
> capable of incremental backups. Only BLOBs, that is; for some tables
> that will have several tens of millions of small rows, I can't think of
> any workaround.
>
> Is this filenames-instead-of-BLOBs for easier backup common practice?
> Any other ideas or comments?
>
> My second question is a general relational database backup question, not
> specifically related to pgsql. Sometimes a user accidentally
> deletes/corrupts a file, and discovers it three days later. After they
> come panicing to me, I can give them their file as it was three days
> ago, because of the backup tape rotation. Now suppose a user deletes
> ten employees from the database, and three days later they understand
> that this was a bad thing. Now what? I can restore the entire database
> and make it as it was three days ago, but I can't restore the particular
> deleted records in the current database, as the relations make the
> database behave as a single unit.
>
> A colleague suggested, instead of updating or deleting rows, to only
> insert rows with a timestamp; for example, instead of updating the
> employee row with id=55, you insert a new row with id=55 with the
> updated data, and change the primary key to (id, dateinserted). You then
> always select the row with max dateinserted. A garbage collector is also
> needed to periodically delete obsolete rows older than, say, six months.
> Improvements can be made (such as using dateobsoleted instead of
> dateinserted or moving the old rows to another table), but even in the
> simplest cases I think it will be extremely hard to implement such a
> system, again because of the relations.
>
> So, it is a matter of database design? Do I have to design the database
> so that it keeps the history of what happened?
>
> Thanks everyone for the answers.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>


pgsql-general by date:

Previous
From: Jonathan Bartlett
Date:
Subject: Re: A creepy story about dates. How to prevent it?
Next
From: Dennis Gearon
Date:
Subject: Re: A creepy story about dates. How to prevent it?