Thread: Seeking datacenter PITR backup suggestions
Greetings,
We have several web applications with Pg 8.2.x running on isolated servers (~25). The database size on each machine (du -h pgdata) is ~2 GB. We have been using nightly filesystem backups (stop pg, tar backup to ftp, start pg) and it has worked well.
We would like to move to PITR backups since the database size will increase going forward and our current backup method would increase server downtime.
We have a central ftp backup server (yes, ftp :-) which we would like to use for weekly full and daily incremental PITR backups.
After reading the docs, PITR is still fuzzy to us. Our idea for the backups is as follows (do not worry about the syntax):
** START **
tmpwal="/localhost/tmp"   # tmp space on server1 for storing WAL files before ftp
Configure in $pgdata/postgresql.conf: archive_command = "cp %p $tmpwal/%f"
Day 1:
% psql -c "SELECT pg_start_backup('day1')"; tar -cf pgdata.tar --exclude pg_xlog pgdata
% psql -c "SELECT pg_stop_backup()"
% ftp put pgdata.tar ftpserver:/server1/day1/pgdata
% ftp put $tmpwal/* ftpserver:/server1/day1/wal
% rm -f $tmpwal/* pgdata.tar
Day 2:
% ftp put $tmpwal/* ftpserver:/server1/day2/wal
% rm -f $tmpwal/*
Day 3:
...
...
Day 7:
% rm -f $tmpwal/*
Start over
Recovery on server1 (skeleton commands),
% rm -f $tmpwal/*
% mv pgdata pgdata.hosed
% ftp get ftpbackup:/server1/day1/pgdata.tar .
% tar -xvf pgdata.tar
% ftp get ftpbackup:/server1/day1/wal/* $tmpwal
% ftp get ftpbackup:/server1/day2/wal/* $tmpwal
.....
.....
% cp -r pgdata.hosed/pg_xlog pgdata/
% echo "restore_command = 'cp $tmpwal/%f %p'" > pgdata/recovery.conf
% start pg (recovery begins)
** END **
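The Day-1 step above can be sketched as a small script. This is only an illustration under the thread's assumptions: paths and the postgres user are hypothetical, and note that in 8.2 pg_start_backup() requires a label argument.

```shell
#!/bin/sh
# Sketch of the Day-1 full backup (names illustrative, not a tested
# production script). pg_start_backup() does NOT stop the server; it
# just marks a checkpoint so the tar copy can be made online.
PGDATA=${PGDATA:-/var/lib/pgsql/data}

base_backup() {
    # 8.2 syntax: pg_start_backup() takes a label
    psql -U postgres -c "SELECT pg_start_backup('day1')" || return 1
    tar -cf pgdata.tar --exclude pg_xlog "$PGDATA"
    psql -U postgres -c "SELECT pg_stop_backup()"
}
```

The ftp transfer of pgdata.tar and $tmpwal then follows as in the plan above.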
Assumptions:
a. After pg_stop_backup(), Pg immediately recycles log files and hence wal logs can be copied to backup. This is a clean start.
b. New wal files since (a) are incremental backups
We are not sure whether WAL log filenames are unique, or whether they could overwrite older WAL files during recovery.
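On the uniqueness question: WAL segment file names are 24 hexadecimal digits encoding the timeline ID, log ID, and segment number, and they increase monotonically, so two segments on the same timeline never share a name. A toy illustration of the naming scheme (wal_name is a hypothetical helper, not a PostgreSQL tool):

```shell
#!/bin/sh
# Mimic PostgreSQL's WAL segment naming: 8 hex digits each for the
# timeline ID, the log ID, and the segment number (24 hex digits total).
wal_name() {
    printf '%08X%08X%08X\n' "$1" "$2" "$3"
}

wal_name 1 0 1   # -> 000000010000000000000001
wal_name 1 0 2   # -> 000000010000000000000002 (distinct, no overwrite)
```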
I'm seeking suggestions from others with experience performing PostgreSQL PITR backups from multiple servers to a central backup server.
Thanks in advance,
Joey Krane
On Aug 17, 2007, at 5:48 PM, Joey K. wrote:

> We have several web applications with Pg 8.2.x running on isolated
> servers (~25). The database size on each machine (du -h pgdata) is
> ~2 GB. We have been using nightly filesystem backups (stop pg, tar
> backup to ftp, start pg) and it has worked well.
>
> We would like to move to PITR backups since the database size will
> increase going forward and our current backup method would increase
> server downtime.
>
> We have a central ftp backup server (yes, ftp :-) which we would
> like to use for weekly full and daily incremental PITR backups.
>
> [...]
>
> tmpwal = "/localhost/tmp" # tmp space on server 1 for storing wal
> files before ftp
> Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f"

Why not just FTP WAL files directly?

> Day 1:
> % psql pg_start_backup(); tar pgdata.tar --exclude pg_xlog/ pgdata
> % psql pg_stop_backup()
> % ftp put pgdata.tar ftpserver:/server1/day1/pgdata
> % ftp put $tmpwal/* ftpserver:/server1/day1/wal
> % rm -f $tmpwal/* pgdata.tar

The last two commands are a race condition... you could easily lose a WAL file that way.

Keep in mind that pgdata.tar is 100% useless unless you also have the WAL files that were created during the backup. I generally recommend to folks that they keep two base copies around for that reason.

> [...]
>
> We are not sure if WAL log filenames are unique and possibly
> overwrite older wal files during recovery.
>
> I'm seeking suggestions from others with experience performing
> PostgreSQL PITR backups from multiple servers to a central backup
> server.

In general, your handling of WAL files seems fragile and error-prone. I think it would make far more sense to just FTP them directly, and not try to get fancy with different directories for different days. *When* a WAL file was generated is meaningless until you compare it to a base backup to see whether that WAL file is required for the base backup, useful (but not required) for it, or useless for it.

-- 
Decibel!, aka Jim Nasby        decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
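The "FTP them directly" suggestion could be sketched as an archive_command wrapper along these lines. Everything here is a placeholder (ftp_put stands in for whatever transfer command is actually used); the key point is that a non-zero exit status makes PostgreSQL keep the segment and retry later.

```shell
#!/bin/sh
# archive_wal <full-path-to-wal> <wal-filename>
# Pushes one segment straight to the backup host from archive_command.
ARCHIVE_DIR=${ARCHIVE_DIR:-/backup/server1/wal}

ftp_put() {
    # stand-in for the real transfer (ftp/scp/...); must return
    # non-zero on failure so PostgreSQL knows the segment isn't safe yet
    cp "$1" "$2"
}

archive_wal() {
    ftp_put "$1" "$ARCHIVE_DIR/$2"   # non-zero exit => PG retries later
}
# postgresql.conf: archive_command = '/usr/local/bin/archive_wal %p %f'
```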
> In general, your handling of WAL files seems fragile and error-prone....

Indeed. I would recommend simply using rsync to handle pushing the files. I see several advantages:

1. Distributed load: you aren't copying a full day of files all at once.

2. Very easy to set up: you can use it directly as your archive_command if you wish.

3. Atomic: rsync copies new data to a temporary location and only moves it into place when the transfer is complete, so the destination server will never see a partial file. Depending on the FTP client/server combo, you would likely end up with a partial file in the event of a communication failure.

4. Much more up-to-the-minute recovery data.

In your scenario, what about using "cp -l" (or "ln") instead? Since a hard link only creates a new pointer, it will be very fast and save a bunch of disk IO on your server, and it doesn't appear that the temp dir is for much other than organizing purposes anyway.

I'm setting up some test machines to learn more about PITR and warm backups, and am considering a two-stage process using "cp -l" to add the file to the list needing transfer and regular rsync to actually move the files to the destination machine. (The destination machine will be over a WAN link, so I'd like to avoid having PG tied up waiting for each rsync to complete.)

Cheers,
Steve
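The rsync idea as a minimal wrapper (host and paths illustrative; rsync writes to a temporary name on the destination and renames it into place on completion, which is what makes the copy atomic):

```shell
#!/bin/sh
# archive_wal_rsync <full-path-to-wal> <wal-filename>
# rsync only renames the file into place once the transfer completes,
# so the backup server never sees a partial WAL segment.
BACKUP_DEST=${BACKUP_DEST:-backup:/server1/wal}

archive_wal_rsync() {
    rsync -a "$1" "$BACKUP_DEST/$2"
}
# postgresql.conf: archive_command = '/usr/local/bin/archive_wal_rsync %p %f'
```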
"Steve Crawford" <scrawford@pinpointresearch.com> writes:

> 4. Much more up-to-the-minute recovery data.
>
> In your scenario, what about using "cp -l" (or "ln") instead? Since a
> hard link only creates a new pointer, it will be very fast and save a
> bunch of disk IO on your server, and it doesn't appear that the temp
> dir is for much other than organizing purposes anyway.

Postgres tries to reuse WAL files. Once the archive_command completes, it believes it is safe to reuse the old file without deleting it. That will do nasty things if you've used ln as your archive command.

> I'm setting up some test machines to learn more about PITR and warm
> backups and am considering a two-stage process using "cp -l" to add the
> file to the list needing transfer and regular rsync to actually move the
> files to the destination machine. (The destination machine will be over
> a WAN link so I'd like to avoid having PG tied up waiting for each rsync
> to complete.)

In theory, the only thing the archive command blocks is the reuse of WAL log files. So while the command is running, if Postgres has used up all its existing WAL files it will have to create new ones, which does have some performance cost, but otherwise it's unaffected.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
Gregory Stark wrote:
>> In your scenario, what about using "cp -l" (or "ln") instead?....
>
> Postgres tries to reuse WAL files. Once the archive_command completes it
> believes it is safe to reuse the old file without deleting it. That will
> do nasty things if you've used ln as your archive command.

Um, OK. I won't try that. Thanks. That really could be nasty, as I imagine it would cause either corruption or failures on the standby server that might be very hard to track down.

> In theory the only thing that the archive command blocks is the reuse of
> WAL log files....

What happens if the PG server crashes during a WAL transfer? Does it still know the file needs to be transferred, or is it a safer process to do a fast copy to a local machine (the same machine, or one on the local network) from which files are then transferred over the slow link?

Also, in the (rare) case that PG needs to be restarted, will the restart block while waiting for all current log transfers to complete?

Cheers,
Steve
On Tue, Aug 28, 2007 at 11:05:52AM -0700, Steve Crawford wrote:
> In your scenario, what about using "cp -l" (or "ln") instead? Since a
> hard link only creates a new pointer, it will be very fast and save a
> bunch of disk IO on your server, and it doesn't appear that the temp
> dir is for much other than organizing purposes anyway.

Note that that will only work if you're creating the link on the same filesystem, and having /tmp and your data on the same filesystem isn't such a hot idea.

-- 
Decibel!, aka Jim Nasby        decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
On Tue, Aug 28, 2007 at 08:31:10PM +0100, Gregory Stark wrote:
> "Steve Crawford" <scrawford@pinpointresearch.com> writes:
>
> > 4. Much more up-to-the-minute recovery data.
> >
> > In your scenario, what about using "cp -l" (or "ln") instead? Since a
> > hard link only creates a new pointer, it will be very fast and save a
> > bunch of disk IO on your server, and it doesn't appear that the temp
> > dir is for much other than organizing purposes anyway.
>
> Postgres tries to reuse WAL files. Once the archive_command completes it
> believes it is safe to reuse the old file without deleting it. That will
> do nasty things if you've used ln as your archive command.

I thought that was specifically disabled when PITR was enabled? Or do we just do a rename rather than an unlink and creating a new file?

-- 
Decibel!, aka Jim Nasby        decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
Decibel! <decibel@decibel.org> writes:
> On Tue, Aug 28, 2007 at 08:31:10PM +0100, Gregory Stark wrote:
> > Postgres tries to reuse WAL files. Once the archive_command completes
> > it believes it is safe to reuse the old file without deleting it. That
> > will do nasty things if you've used ln as your archive command.
>
> I thought that was specifically disabled when PITR was enabled? Or do we
> just do a rename rather than an unlink and creating a new file?

No. The only difference is we don't recycle the file until the archive_command says it's done with it.

The archive_command must actually physically copy the data someplace else, and must not return success until it's sure the copy is good. Perhaps the docs are not sufficiently clear on the point?

			regards, tom lane
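The requirement above (physically copy, never overwrite, and only report success once the copy is good) can be captured in a small wrapper like this sketch. The archive directory and install path are illustrative:

```shell
#!/bin/sh
# safe_archive <full-path-to-wal> <wal-filename>
# Refuses to overwrite an existing archived segment and returns
# non-zero on any failure, so PostgreSQL will not recycle the WAL
# file until a good copy exists in the archive.
ARCHIVE_DIR=${ARCHIVE_DIR:-/mnt/server/archivedir}

safe_archive() {
    test ! -f "$ARCHIVE_DIR/$2" && cp "$1" "$ARCHIVE_DIR/$2"
}
# postgresql.conf: archive_command = '/usr/local/bin/safe_archive %p %f'
```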
On Tue, Aug 28, 2007 at 09:54:23PM -0400, Tom Lane wrote:
> Decibel! <decibel@decibel.org> writes:
> > I thought that was specifically disabled when PITR was enabled? Or do
> > we just do a rename rather than an unlink and creating a new file?
>
> No. The only difference is we don't recycle the file until the
> archive_command says it's done with it.
>
> The archive_command must actually physically copy the data someplace
> else, and must not return success until it's sure the copy is good.
> Perhaps the docs are not sufficiently clear on the point?

Yeah... I think that's a big gotcha waiting to smack someone. I'd actually make the mention <strong> so that hopefully no one can miss it... or do we have an official method for putting warnings in the docs?

"Because WAL segment files are renamed and not re-created from scratch, it is critical that the archive command actually copy files, not move or hard-link them."

-- 
Decibel!, aka Jim Nasby        decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
On Tue, 2007-08-28 at 21:04 -0500, Decibel! wrote:
> On Tue, Aug 28, 2007 at 09:54:23PM -0400, Tom Lane wrote:
> > Perhaps the docs are not sufficiently clear on the point?
>
> Yeah... I think that's a big gotcha waiting to smack someone. I'd
> actually make the mention <strong> so that hopefully no one can miss
> it... or do we have an official method for putting warnings in the docs?
>
> "Because WAL segment files are renamed and not re-created from scratch,
> it is critical that the archive command actually copy files, not move
> or hard-link them."

I'll shortly be writing a doc patch to clarify a few points and to explain new possibilities, such as Koichi Suzuki's work.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com