Thread: tape backups
Hi everybody,

I'm trying to find a good solution to making backups to tape, where I want to define "good" as:

- easy to use, like pg_dumpall, BUT
- not in a single file, so I don't backup my entire database cluster with every differential backup

As I understand my backup program (Bacula) if a file changes at all between differential backups then it gets backed up again in its entirety. That seems pretty reasonable. So now I'm trying to figure out how to get my postgres dump to end up in files in such a way that little change in data means few file changes. But if there's no native tool to do that (and it seems like there isn't) then setting up something like that sounds like it might be a pain, as would restoring from it.

Am I going about this the wrong way? Would it just be easier to do a full pg_dumpall for my full backups and then build up a list of WAL files with each differential? How do other people do it?
I think you want incremental backups, so the better approach, as you mentioned, is to archive WAL files. For details, see http://www.postgresql.org/docs/current/static/continuous-archiving.html
--------------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
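Editor's sketch, not part of the original posts: the reason WAL archiving fits a file-based differential backup is that each finished WAL segment is archived as its own file and is never modified afterwards, so a tool like Bacula only has to pick up the new files. Below is a minimal sketch of what a copy helper called from archive_command might look like. The function name archive_wal_segment and the archive directory are hypothetical; the refuse-to-overwrite behaviour follows the advice in the continuous-archiving documentation linked above.

```python
import shutil
from pathlib import Path


def archive_wal_segment(segment_path: str, archive_dir: str) -> int:
    """Copy one finished WAL segment into archive_dir.

    Returns 0 on success and 1 if the segment is already archived,
    mirroring the exit status an archive command is expected to give.
    The archive directory is assumed to exist already.
    """
    src = Path(segment_path)
    dest = Path(archive_dir) / src.name
    if dest.exists():
        # Never overwrite an existing archived segment; report failure instead.
        return 1
    # Copy under a temporary name first so that a half-written file
    # never appears under the final segment name.
    tmp = dest.with_name(dest.name + ".tmp")
    shutil.copyfile(src, tmp)
    tmp.replace(dest)
    return 0
```

A wrapper script would call this with the segment path the server supplies (%p in archive_command) and a directory that the tape backup job includes. Any non-zero return tells the server the segment was not archived.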
Thanks for the pointer. This does look like what I want, because in retrospect I don't know how I would tell which WAL files to start replaying after a given pg_dumpall in order to bring myself up to the present after a recovery.
But, this page confuses me when it talks about pg_start_backup and pg_stop_backup. What do these functions do? It seems like they do nothing more than let me know which wal files were in use over the duration of the backup, which is certainly useful. But they do NOT seem to freeze the actual data files, and it seems to me that because the data files won't be archived atomically while they may be changing, that I might end up with corrupted data files that a replay of wal files wouldn't correct. Is my fear groundless?
Ben <bench@silentmedia.com> writes:
> But, this page confuses me when it talks about pg_start_backup and
> pg_stop_backup. What do these functions do? It seems like they do
> nothing more than let me know which wal files were in use over the
> duration of the backup, which is certainly useful. But they do NOT
> seem to freeze the actual data files, and it seems to me that because
> the data files won't be archived atomically while they may be
> changing, that I might end up with corrupted data files that a replay
> of wal files wouldn't correct. Is my fear groundless?

Yes. The reason we don't have to freeze the data files during a backup is that any page that changes within that interval will be rewritten anyway when the WAL log is replayed during recovery. This is why the WAL sequence has to start before the pg_start_backup rather than at some later point --- that overlap is exactly what makes it safe to not freeze the data files.

regards, tom lane
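Editor's sketch, not part of the original posts: the ordering Tom describes (mark the start of the backup, copy the live data files without freezing them, then mark the end) can be written down as a small wrapper. This is only an illustration under stated assumptions: base_backup, run_sql, copy_data_dir and the label are hypothetical names, and pg_start_backup and pg_stop_backup are the function names used in this thread. Newer PostgreSQL releases renamed these functions, so check the documentation for your version.

```python
from typing import Callable


def base_backup(run_sql: Callable[[str], None],
                copy_data_dir: Callable[[], None],
                label: str = "tape_base") -> None:
    """Take a base backup of a running cluster in the order described above.

    run_sql and copy_data_dir are supplied by the caller (hypothetical hooks),
    so the same ordering works with psql, a driver, tar, or a backup agent.
    label is assumed to be a fixed, trusted string.
    """
    # 1. Mark the backup start; WAL from this point on must be kept.
    run_sql(f"SELECT pg_start_backup('{label}');")
    try:
        # 2. Copy the data files while the server keeps running. Pages that
        #    change during the copy are rewritten by WAL replay at recovery.
        copy_data_dir()
    finally:
        # 3. Always mark the backup end, even if the copy failed.
        run_sql("SELECT pg_stop_backup();")
```

In practice run_sql might wrap something like subprocess.run(["psql", "-c", sql], check=True) and copy_data_dir might run tar over the data directory or trigger the Bacula job. Both are assumptions about the local setup, not something stated in the thread.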
Ah, got it. Thanks!

On Dec 23, 2006, at 5:59 PM, Tom Lane wrote:
> Yes. The reason we don't have to freeze the data files during a backup
> is that any page that changes within that interval will be rewritten
> anyway when the WAL log is replayed during recovery. This is why the
> WAL sequence has to start before the pg_start_backup rather than at
> some later point --- that overlap is exactly what makes it safe to not
> freeze the data files.
>
> regards, tom lane