Thread: tape backups

tape backups

From
Ben
Date:
Hi everybody,

I'm trying to find a good solution to making backups to tape, where I
want to define "good" as:

- easy to use, like pg_dumpall, BUT
- not in a single file, so I don't back up my entire database cluster
with every differential backup

As I understand my backup program (Bacula), if a file changes at all
between differential backups then it gets backed up again in its
entirety. That seems pretty reasonable. So now I'm trying to figure
out how to get my postgres dump to end up in files in such a way that
a small change in data means few file changes. But if there's no
native tool to do that (and it seems like there isn't), then setting
up something like that sounds like it might be a pain, as would
restoring from it.

Am I going about this the wrong way? Would it just be easier to do a
full pg_dumpall for my full backups and then build up a list of WAL
files with each differential? How do other people do it?
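
For illustration, the "small change in data means few file changes" idea above could be sketched as follows. This is only a sketch under assumptions: the helper name write_if_changed is hypothetical, the per-table dump text is assumed to be produced elsewhere (for example, one pg_dump -t run per table), and it assumes the dump of an unchanged table is byte-identical between runs, which is not guaranteed. Dumps taken in separate runs are also not a single consistent snapshot.

```python
import hashlib
import os
import tempfile


def write_if_changed(path: str, data: bytes) -> bool:
    """Write data to path only when it differs from what is already there.

    Returns True if the file was (re)written, False if it was left untouched.
    """
    if os.path.exists(path):
        with open(path, "rb") as existing:
            old_digest = hashlib.sha256(existing.read()).hexdigest()
        if old_digest == hashlib.sha256(data).hexdigest():
            return False  # identical content: keep the old file and its mtime
    # Write to a temporary file first, then rename, so an interrupted run
    # never leaves a half-written dump in place.
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_path, path)
    return True
```

Files left untouched keep their old modification time and contents, so a differential backup that keys on either one would presumably skip them.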

Re: tape backups

From
"Shoaib Mir"
Date:
I think you want incremental backups, so a better approach, as you mentioned, is to archive WAL files. For details, see http://www.postgresql.org/docs/current/static/continuous-archiving.html

--------------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
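
As a rough sketch of the WAL archiving step suggested above: the linked documentation asks that an archive command return zero only on success and never overwrite a segment that is already archived. The function below follows those two rules. The name archive_wal and the directory layout are assumptions made for illustration, not anything PostgreSQL ships.

```python
import os
import shutil


def archive_wal(src_path: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir; return a shell-style exit code."""
    dest_path = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(dest_path):
        return 1  # never overwrite an already archived segment
    tmp_path = dest_path + ".tmp"
    try:
        # Copy under a temporary name, then rename, so a partially copied
        # segment is never visible under its final name.
        shutil.copyfile(src_path, tmp_path)
        os.replace(tmp_path, dest_path)
    except OSError:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return 1
    return 0
```

A thin command-line wrapper around this function could then be named in archive_command, with %p supplying the path of the segment to archive. The archive directory is what the tape backup would pick up with each differential.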

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org/

Re: tape backups

From
Ben
Date:
Thanks for the pointer. This does look like what I want because, in retrospect, I don't know how I would tell which WAL files to start replaying after a given pg_dumpall to bring myself up to the present after a recovery.

But this page confuses me when it talks about pg_start_backup and pg_stop_backup. What do these functions do? It seems like they do nothing more than let me know which WAL files were in use over the duration of the backup, which is certainly useful. But they do NOT seem to freeze the actual data files, and it seems to me that, because the data files won't be archived atomically while they may be changing, I might end up with corrupted data files that a replay of WAL files wouldn't correct. Is my fear groundless?
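
On the question of which WAL files to replay: the linked documentation pairs WAL replay with a file-system-level base backup rather than a pg_dumpall, and says the backup history file written by pg_stop_backup is named after the first WAL segment needed. Given that starting segment, the selection could be sketched as below. The function name segments_to_replay is hypothetical, and the sketch assumes a single timeline and the standard 24-character hexadecimal segment names.

```python
from typing import List


def segments_to_replay(archived: List[str], start_segment: str) -> List[str]:
    """Return archived segment names at or after start_segment, in replay order."""

    def is_segment(name: str) -> bool:
        # Plain WAL segments are exactly 24 uppercase hex digits; this skips
        # backup history files and anything else found in the archive.
        return len(name) == 24 and all(c in "0123456789ABCDEF" for c in name)

    timeline = start_segment[:8]  # first 8 hex digits identify the timeline
    # Names are fixed-width hex, so string order matches numeric order.
    return sorted(
        name
        for name in archived
        if is_segment(name) and name[:8] == timeline and name >= start_segment
    )
```

Segments that sort before the starting one are not needed to restore from that particular base backup.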


Re: tape backups

From
Tom Lane
Date:
Ben <bench@silentmedia.com> writes:
> But, this page confuses me when it talks about pg_start_backup and
> pg_stop_backup. What do these functions do? It seems like they do
> nothing more than let me know which wal files were in use over the
> duration of the backup, which is certainly useful. But they do NOT
> seem to freeze the actual data files, and it seems to me that because
> the data files won't be archived atomically while they may be
> changing, that I might end up with corrupted data files that a replay
> of wal files wouldn't correct. Is my fear groundless?

Yes.  The reason we don't have to freeze the data files during a backup
is that any page that changes within that interval will be rewritten
anyway when the WAL log is replayed during recovery.  This is why the
WAL sequence has to start before the pg_start_backup rather than at some
later point --- that overlap is exactly what makes it safe to not freeze
the data files.

            regards, tom lane
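
The argument above can be illustrated with a toy model. It is a sketch under strong simplifying assumptions, not PostgreSQL's actual recovery code: each "page" holds a single value, every WAL record carries the complete new page contents, and the names run_simulation, live, backup, and restored are invented for the example. The pages are copied one at a time while writes continue, and the WAL is then replayed from the position noted when the backup started.

```python
import random


def run_simulation(seed: int = 0, n_pages: int = 8, writes_per_copy: int = 3):
    rng = random.Random(seed)
    live = {page: 0 for page in range(n_pages)}  # the live data files
    wal = []  # ordered (page, new_value) records

    def write(page, value):
        wal.append((page, value))  # log the change first ...
        live[page] = value         # ... then modify the page

    # Some activity before the backup begins.
    for _ in range(10):
        write(rng.randrange(n_pages), rng.randrange(1000))

    start = len(wal)  # WAL position noted when the backup starts
    backup = {}
    for page in range(n_pages):  # non-atomic, page-by-page copy
        backup[page] = live[page]
        for _ in range(writes_per_copy):  # writers keep going meanwhile
            write(rng.randrange(n_pages), rng.randrange(1000))

    # Recovery: restore the copied pages, then replay the WAL from `start`.
    restored = dict(backup)
    for page, value in wal[start:]:
        restored[page] = value
    return backup, restored, live
```

In this model the copied pages may be a mix of old and new values, but any page written after the start position is overwritten by replay, and any page not written since then was copied in its final state, so the restored pages match the live ones.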

Re: tape backups

From
Ben
Date:
Ah, got it. Thanks!
