Thread: Seeking datacenter PITR backup suggestions

Seeking datacenter PITR backup suggestions

From: "Joey K."

Greetings,

We have several web applications with Pg 8.2.x running on isolated servers (~25). The database size on each machine (per du -h pgdata) is ~2 GB. We have been using a nightly filesystem backup (stop pg, tar, copy to ftp, start pg) and it has worked well.

We would like to move to PITR backups, since the databases will keep growing and the downtime of our current backup method will only grow with them.

We have a central ftp backup server (yes, ftp :-) which we would like to use for weekly full and daily incremental PITR backups.

After reading the docs, PITR is still fuzzy to us. Our backup plan is roughly as follows (do not worry about the exact syntax):

** START **

tmpwal = "/localhost/tmp"   # tmp space on server 1 for storing wal files before ftp
Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f"

Day 1: 
% psql pg_start_backup(); tar pgdata.tar --exclude pg_xlog/ pgdata
% psql pg_stop_backup()
% ftp put pgdata.tar ftpserver:/server1/day1/pgdata
% ftp put $tmpwal/* ftpserver:/server1/day1/wal
% rm -f $tmpwal/* pgdata.tar

Day 2:
% ftp put $tmpwal/* ftpserver:/server1/day2/wal
% rm -f $tmpwal/*

Day 3:
...
...

Day 7:
% rm -f $tmpwal/*
Start over

Recovery on server1 (skeleton commands):
% rm -f $tmpwal/*
% mv pgdata pgdata.hosed
% ftp get ftpbackup:/server1/day1/pgdata.tar  .
% tar -xvf pgdata.tar
% ftp get ftpbackup:/server1/day1/wal/*  $tmpwal
% ftp get ftpbackup:/server1/day2/wal/*  $tmpwal
.....
.....
% cp -r pgdata.hosed/pg_xlog pgdata/
% echo "cp $tmpwal/%f %p" > pgdata/recovery.conf
% start pg (recovery begins)

** END **
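
For concreteness, the Day 1 steps and the recovery.conf line above might
look roughly like this with working syntax (a sketch only; the backup
label and the archive directory /var/lib/pgsql/archive are placeholders):

# postgresql.conf:  archive_command = 'cp %p /var/lib/pgsql/archive/%f'
% psql -c "SELECT pg_start_backup('day1')"     # pg_start_backup() takes a label
% tar -cf pgdata.tar --exclude=pg_xlog pgdata
% psql -c "SELECT pg_stop_backup()"
# ...then ftp pgdata.tar and the archived WAL files as above...

# recovery.conf needs a restore_command setting, not a bare copy command:
% echo "restore_command = 'cp /var/lib/pgsql/archive/%f %p'" > pgdata/recovery.conf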

Assumptions:
a. After pg_stop_backup(), Pg immediately recycles log files and hence wal logs can be copied to backup. This is a clean start.
b. New wal files since (a) are incremental backups

We are not sure whether WAL file names are unique; if they are not, newer files might overwrite older WAL files during recovery.

I'm seeking suggestions from others with experience performing PostgreSQL PITR backups from multiple servers to a central backup server.

Thanks in advance,
Joey Krane

Re: Seeking datacenter PITR backup suggestions

From: Decibel!

On Aug 17, 2007, at 5:48 PM, Joey K. wrote:
> We have several web applications with Pg 8.2.x running on isolated
> servers (~25). The database size on each machines (du -h pgdata) is
> ~2 GB. We have been using nightly filesystem backup (stop pg, tar
> backup to ftp, start pg) and it worked well.
>
> We would like to move to PITR backups since the database size will
> increase moving forward and our current backup method might
> increase server downtimes.
>
> We have a central ftp backup server (yes, ftp :-) which we would
> like to use for weekly full and daily incremental PITR backups.
>
> After reading the docs, PITR is still fuzzy. Our ideas for backup
> are (do not worry about the syntax),
>
> ** START **
>
> tmpwal = "/localhost/tmp"   # tmp space on server 1 for storing wal
> files before ftp
> Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f"

Why not just FTP WAL files directly?

> Day 1:
> % psql pg_start_backup(); tar pgdata.tar --exclude pg_xlog/ pgdata
> % psql pg_stop_backup()
> % ftp put pgdata.tar ftpserver:/server1/day1/pgdata
> % ftp put $tmpwal/* ftpserver:/server1/day1/wal
> % rm -f $tmpwal/* pgdata.tar

The last 2 are a race condition... you could easily lose a WAL file
that way.

Keep in mind that that pgdata.tar is 100% useless unless you also
have the WAL files that were created during the backup. I generally
recommend to folks that they keep two base copies around for that
reason.
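
One way to avoid that race (a sketch; "ftpput" stands in for whichever
ftp client is actually used, and dayN for the day's directory) is to
expand the file list once and remove each file only after its transfer
has been confirmed:

cd "$tmpwal"
for f in *; do
    [ -e "$f" ] || continue                   # nothing archived yet
    if ftpput "$f" "ftpserver:/server1/dayN/wal/$f"; then
        rm -f "$f"                            # delete only what was sent
    fi
done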

> Day 2:
> % ftp put $tmpwal/* ftpserver:/server1/day2/wal
> % rm -f $tmpwal/*
>
> Day 3:
> ...
> ...
>
> Day 7:
> % rm -f $tmpwal/*
> Start over
>
> Recovery on server1 (skeleton commands),
> % rm -f $tmpwal/*
> % mv pgdata pgdata.hosed
> % ftp get ftpbackup:/server1/day1/pgdata.tar  .
> % tar -xvf pgdata.tar
> % ftp get ftpbackup:/server1/day1/wal/*  $tmpwal
> % ftp get ftpbackup:/server1/day2/wal/*  $tmpwal
> .....
> .....
> % cp -r pgdata.hosed/pg_xlog pgdata/
> % echo "cp $tmpwal/%f %p" > pgdata/recovery.conf
> % start pg (recovery begins)
>
> ** END **
>
> Assumptions:
> a. After pg_stop_backup(), Pg immediately recycles log files and
> hence wal logs can be copied to backup. This is a clean start.
> b. New wal files since (a) are incremental backups
>
> We are not sure if WAL log filenames are unique and possibly
> overwrite older wal files during recovery.
>
> I'm seeking suggestions from others with experience performing
> PostgreSQL PITR backups from multiple servers to a central backup
> server.

In general, your handling of WAL files seems fragile and error-prone.
I think it would make far more sense to just FTP them directly, and
not try to get fancy with different directories for different days.
*When* a WAL file was generated is meaningless until you compare it
to a base backup to see whether that WAL file is required for the
base backup, useful (but not required) for it, or useless for it.
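
A minimal sketch of doing that straight from archive_command, assuming
a curl binary with ftp support on each database host (the host, login,
and path below are placeholders):

#!/bin/sh
# archive-wal.sh: push a WAL segment straight to the ftp server.
# postgresql.conf:  archive_command = '/usr/local/bin/archive-wal.sh %p %f'
WALPATH="$1"    # %p: path of the segment, relative to $PGDATA
WALFILE="$2"    # %f: segment file name only
# curl exits non-zero if the upload fails, so Postgres keeps the segment
# and retries it rather than recycling it.  Caveat: a dropped connection
# can still leave a partial file under the final name; uploading to a
# temporary name and renaming into place (or using rsync, as suggested
# later in the thread) avoids that.
exec curl -sS -T "$WALPATH" "ftp://user:pass@ftpserver/server1/wal/$WALFILE"
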
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



Re: Seeking datacenter PITR backup suggestions

From: Steve Crawford

> In general, your handling of WAL files seems fragile and error-prone....

Indeed. I would recommend simply using rsync to handle pushing the
files. I see several advantages:

1. Distributed load - you aren't copying a full day of files all at once.

2. Very easy to set up - you can use it directly as your archive_command
if you wish (see the sketch after this list).

3. Atomic. Rsync copies new data to a temporary file that is only
moved into place when the transfer is complete, so the destination
server never sees a partial file. Depending on the FTP client/server
combination, you would likely end up with a partial file in the event
of a communication failure.

4. Much more up-to-the-minute recovery data.
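
A sketch of point 2, assuming rsync-over-ssh access from each database
host to a backup host named "backup" (host name and path are
placeholders):

    archive_command = 'rsync -a %p backup:/archive/server1/%f'

rsync exits non-zero if the copy fails, so Postgres keeps retrying the
segment, and the destination only sees the file under its final name
once the transfer has completed.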

In your scenario, what about using "cp -l" (or "ln") instead? Since a
hard link only creates a new pointer, it will be very fast and save a
bunch of disk IO on your server, and it doesn't appear that the tempdir
is for much other than organizing purposes anyway.

I'm setting up some test machines to learn more about PITR and warm
backups, and am considering a two-stage process: "cp -l" to add the
file to the list needing transfer, and a regular rsync to actually move
the files to the destination machine. (The destination machine will be
over a WAN link, so I'd like to avoid having PG tied up waiting for
each rsync to complete.)

Cheers,
Steve


Re: Seeking datacenter PITR backup suggestions

From: Gregory Stark

"Steve Crawford" <scrawford@pinpointresearch.com> writes:

> 4. Much more up-to-the-minute recovery data.
>
> In your scenario, what about using "cp -l" (or "ln") instead? Since a
> hard link only creates a new pointer, it will be very fast and save a
> bunch of disk IO on your server, and it doesn't appear that the tempdir
> is for much other than organizing purposes anyway.

Postgres tries to reuse WAL files. Once the archive_command completes it
believes it is safe to reuse the old file without deleting it. That will do
nasty things if you've used ln as your archive command.


> I'm setting up some test machines to learn more about PITR and warm
> backups and am considering a two-stage process using "cp -l" to add the
> file to the list needing transfer and regular rsync to actually move the
> files to the destination machine. (The destination machine will be over
> a WAN link so I'd like to avoid having PG tied up waiting for each rsync
> to complete.)

In theory the only thing that the archive command blocks is the reuse of
WAL log files. So while the command is running, if Postgres has used up
all its existing WAL files it will have to create new ones, which does
have some performance cost. But otherwise it's unaffected.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Seeking datacenter PITR backup suggestions

From: Steve Crawford

Gregory Stark wrote:

>> In your scenario, what about using "cp -l" (or "ln") instead?....
>
> Postgres tries to reuse WAL files. Once the archive_command completes it
> believes it is safe to reuse the old file without deleting it. That will do
> nasty things if you've used ln as your archive command.

Um, OK. I won't try that. Thanks. That really could be nasty as I
imagine that it would cause either corruption or failures on the standby
server that might be very hard to track down.

> In theory the only thing that the archive command blocks is the reuse of WAL
> log files....

What happens if the PG server crashes during a WAL transfer? Does it
still know the file needs to be transferred? Or is it a safer process
to do a fast copy to a local destination (the same machine or a machine
on the local network) from which the files are then transferred over
the slow link?

Also, in the (rare) case that PG needs to be restarted, will the restart
block while waiting for all current log transfers to complete?

Cheers,
Steve


Re: Seeking datacenter PITR backup suggestions

From: Decibel!

On Tue, Aug 28, 2007 at 11:05:52AM -0700, Steve Crawford wrote:
> In your scenario, what about using "cp -l" (or "ln") instead? Since a
> hard link only creates a new pointer, it will be very fast and save a
> bunch of disk IO on your server, and it doesn't appear that the tempdir
> is for much other than organizing purposes anyway.

Note that that will only work if you're creating the link on the same
filesystem, and having /tmp and your data in the same filesystem isn't
such a hot idea.
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Seeking datacenter PITR backup suggestions

From: Decibel!

On Tue, Aug 28, 2007 at 08:31:10PM +0100, Gregory Stark wrote:
> "Steve Crawford" <scrawford@pinpointresearch.com> writes:
>
> > 4. Much more up-to-the-minute recovery data.
> >
> > In your scenario, what about using "cp -l" (or "ln") instead? Since a
> > hard link only creates a new pointer, it will be very fast and save a
> > bunch of disk IO on your server, and it doesn't appear that the tempdir
> > is for much other than organizing purposes anyway.
>
> Postgres tries to reuse WAL files. Once the archive_command completes it
> believes it is safe to reuse the old file without deleting it. That will do
> nasty things if you've used ln as your archive command.

I thought that was specifically disabled when PITR was enabled? Or do we
just do a rename rather than unlinking and creating a new file?
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Seeking datacenter PITR backup suggestions

From: Tom Lane

Decibel! <decibel@decibel.org> writes:
> On Tue, Aug 28, 2007 at 08:31:10PM +0100, Gregory Stark wrote:
>> Postgres tries to reuse WAL files. Once the archive_command completes it
>> believes it is safe to reuse the old file without deleting it. That will do
>> nasty things if you've used ln as your archive command.

> I thought that was specifically disabled when PITR was enabled? Or do we
> just do a rename rather than unlinking and creating a new file?

No.  The only difference is we don't recycle the file until the
archive_command says it's done with it.

The archive_command must actually physically copy the data someplace
else, and must not return success until it's sure the copy is good.
Perhaps the docs are not sufficiently clear on the point?
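
A sketch of an archive_command wrapper along those lines (the script
path and archive directory are placeholders): copy under a temporary
name, flush to disk, rename into place, and return non-zero on any
failure so Postgres keeps the segment and retries it later:

#!/bin/sh
# safe-archive.sh, configured as:
#   archive_command = '/usr/local/bin/safe-archive.sh %p %f'
set -e                          # any failure -> non-zero exit status
SRC="$1"                        # %p: segment path, relative to $PGDATA
DST="/archive/$2"               # %f: segment file name

test ! -f "$DST"                # never silently overwrite a segment
cp "$SRC" "$DST.tmp"
sync                            # make sure the copy is really on disk
mv "$DST.tmp" "$DST"            # expose it only once it is complete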

            regards, tom lane

Re: Seeking datacenter PITR backup suggestions

From: Decibel!

On Tue, Aug 28, 2007 at 09:54:23PM -0400, Tom Lane wrote:
> Decibel! <decibel@decibel.org> writes:
> > On Tue, Aug 28, 2007 at 08:31:10PM +0100, Gregory Stark wrote:
> >> Postgres tries to reuse WAL files. Once the archive_command completes it
> >> believes it is safe to reuse the old file without deleting it. That will do
> >> nasty things if you've used ln as your archive command.
>
> > I thought that was specifically disabled when PITR was enabled? Or do we
> > just do a rename rather than unlinking and creating a new file?
>
> No.  The only difference is we don't recycle the file until the
> archive_command says it's done with it.
>
> The archive_command must actually physically copy the data someplace
> else, and must not return success until it's sure the copy is good.
> Perhaps the docs are not sufficiently clear on the point?

Yeah... I think that's a big gotcha waiting to smack someone. I'd
actually make the mention <strong> so that hopefully no one can miss
it... or do we have an official method for putting warnings in the docs?

"Because WAL segment files are renamed and not re-created from scratch,
it is critical that the archive command actually copy files, not move
or hard-link them."
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: Seeking datacenter PITR backup suggestions

From: Simon Riggs

On Tue, 2007-08-28 at 21:04 -0500, Decibel! wrote:
> On Tue, Aug 28, 2007 at 09:54:23PM -0400, Tom Lane wrote:

> > Perhaps the docs are not sufficiently clear on the point?
>
> Yeah... I think that's a big gotcha waiting to smack someone. I'd
> actually make the mention <strong> so that hopefully no one can miss
> it... or do we have an official method for putting warnings in the docs?
>
> "Because WAL segment files are renamed and not re-created from scratch,
> it is critical that the archive command actually copy files, not move
> or hard-link them."

I'll shortly be writing a doc patch to clarify a few points and to
explain new possibilities, such as Koichi Suzuki's work.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com