Re: Incremental backups, and backup history - Mailing list pgsql-general
From | Dennis Gearon |
---|---|
Subject | Re: Incremental backups, and backup history |
Date | |
Msg-id | 3EF1D09B.9050203@cvc.net Whole thread Raw |
In response to | Incremental backups, and backup history (Antonios Christofides <A.Christofides@itia.ntua.gr>) |
List | pgsql-general |
On your second question: Keeping old data helps with data analysis, i.e., data mining. I would do the fired date as transactions. To see if an employeeis still and employee, look for the latest transation, hired, rehired, contracted with as a temp/consultant, fired,laid off, etc. Antonios Christofides wrote: > Hi, > > I have two backup questions, not much related to each other; here they > are. > > First: With PostgreSQL, I can't do incremental backups. pg_dump will > dump the entire database. Thus, if I want to keep daily backups on tape, > I'm in trouble because I'll have to do a full backup every day, which > may need several hours and several tapes. > > One workaround I'm thinking is to not store BLOBs in the database, but > store them in the filesystem and store the filenames in the database > instead. This needs some additional work (for example, a garbage > collector to periodically delete unreferenced files), but will move a > large amount of space from the database into the filesystem, which is > capable of incremental backups. Only BLOBs, that is; for some tables > that will have several tens of millions of small rows, I can't think of > any workaround. > > Is this filenames-instead-of-BLOBs for easier backup common practice? > Any other ideas or comments? > > My second question is a general relational database backup question, not > specifically related to pgsql. Sometimes a user accidentally > deletes/corrupts a file, and discovers it three days later. After they > come panicing to me, I can give them their file as it was three days > ago, because of the backup tape rotation. Now suppose a user deletes > ten employees from the database, and three days later they understand > that this was a bad thing. Now what? I can restore the entire database > and make it as it was three days ago, but I can't restore the particular > deleted records in the current database, as the relations make the > database behave as a single unit. > > A colleague suggested, instead of updating or deleting rows, to only > insert rows with a timestamp; for example, instead of updating the > employee row with id=55, you insert a new row with id=55 with the > updated data, and change the primary key to (id, dateinserted). You then > always select the row with max dateinserted. A garbage collector is also > needed to periodically delete obsolete rows older than, say, six months. > Improvements can be made (such as using dateobsoleted instead of > dateinserted or moving the old rows to another table), but even in the > simplest cases I think it will be extremely hard to implement such a > system, again because of the relations. > > So, it is a matter of database design? Do I have to design the database > so that it keeps the history of what happened? > > Thanks everyone for the answers. > > ---------------------------(end of broadcast)--------------------------- > TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match >
pgsql-general by date: