Thread: Seeking datacenter PITR backup procedures [RESENDING]
Greetings,
We have several web applications with Pg 8.2.x running on ~25 isolated servers. The database size on each machine (du -h pgdata) is ~2 GB. We have been doing a nightly filesystem backup (stop pg, tar, copy to ftp, start pg), and it has worked well.
We would like to move to PITR backups, since database size will only grow moving forward and the downtime of our current backup method will grow with it.
We have a central ftp backup server (yes, ftp :-) which we would like to use for weekly full and daily incremental PITR backups.
Even after reading the docs, PITR is still fuzzy to us. Our idea for the backup is as follows (do not worry about the exact syntax),
** START **
tmpwal = "/localhost/tmp" # tmp space on server 1 for storing wal files before ftp
Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f"
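One caveat with the bare cp above: it will silently overwrite a file already in $tmpwal, and the PostgreSQL docs suggest guarding the copy so the command fails (and Pg retries the segment) rather than clobbering it. A minimal sketch, assuming the same /localhost/tmp staging directory:

```shell
# postgresql.conf -- archive_command runs once per completed 16 MB WAL segment.
# %p = path of the segment to archive, %f = its file name.
# The "test ! -f" guard refuses to overwrite an already-archived segment;
# a non-zero exit makes Pg keep the segment and retry the archive later.
archive_command = 'test ! -f /localhost/tmp/%f && cp %p /localhost/tmp/%f'
```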
Day 1:
% psql -c "SELECT pg_start_backup('day1')"; tar -cf pgdata.tar --exclude=pg_xlog pgdata
% psql -c "SELECT pg_stop_backup()"
% ftp put pgdata.tar ftpserver:/server1/day1/pgdata
% ftp put $tmpwal/* ftpserver:/server1/day1/wal
% rm -f $tmpwal/* pgdata.tar
Day 2:
% ftp put $tmpwal/* ftpserver:/server1/day2/wal
% rm -f $tmpwal/*
Day 3:
...
...
Day 7:
% rm -f $tmpwal/*
Start over
Recovery on server1 (skeleton commands),
% rm -f $tmpwal/*
% mv pgdata pgdata.hosed
% ftp get ftpbackup:/server1/day1/pgdata.tar .
% tar -xvf pgdata.tar
% ftp get ftpbackup:/server1/day1/wal/* $tmpwal
% ftp get ftpbackup:/server1/day2/wal/* $tmpwal
.....
.....
% cp -r pgdata.hosed/pg_xlog pgdata/
% echo "restore_command = 'cp $tmpwal/%f %p'" > pgdata/recovery.conf
% start pg (recovery begins)
** END **
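In 8.2, recovery.conf is parsed as option = 'value' lines, so it needs a restore_command setting rather than the bare copy command. A minimal sketch using this thread's paths (the recovery target line is optional and the timestamp is purely illustrative):

```shell
# pgdata/recovery.conf -- read once at startup; renamed to recovery.done
# when replay finishes.
restore_command = 'cp /localhost/tmp/%f %p'
# Optionally stop replay at a known-good moment instead of the end of WAL:
# recovery_target_time = '2007-08-19 03:00:00'
```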
Assumptions:
a. After pg_stop_backup(), Pg immediately recycles log files and hence wal logs can be copied to backup. This is a clean start.
b. New wal files since (a) are incremental backups
We are not sure whether WAL log file names are unique, or whether they might overwrite older WAL files during recovery.
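For what it's worth, each WAL segment name encodes a timeline, log, and segment number as 24 hex digits, so within one timeline the names do not collide for all practical purposes. A quick illustration of the naming scheme (the wal_name helper is ours, not a Pg tool):

```shell
# WAL segment file names are TTTTTTTTXXXXXXXXYYYYYYYY:
# 8 hex digits each for timeline ID, log ID, and segment number.
wal_name() { printf '%08X%08X%08X\n' "$1" "$2" "$3"; }

wal_name 1 18 52   # -> 000000010000001200000034
wal_name 1 18 53   # -> 000000010000001200000035 (the next 16 MB segment)
```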
I'm seeking suggestions from others with experience performing PostgreSQL PITR backups from multiple servers to a central backup server.
Thanks in advance,
Joey Krane
Joey K. wrote:
> After reading the docs, PITR is still fuzzy. Our ideas for backup are (do
> not worry about the syntax),

Maybe consider using LVM snapshots in conjunction with rdiff-backup as an option as well. Simple to set up, simple to manage, no downtime, very space efficient.

--
Best regards,
Hannes Dorbath
"Joey K." <pguser@gmail.com> wrote:
> Greetings,

I'm guessing you're in a hurry or in a pinch that you need to repost after one day on a weekend. I was waiting to let someone more knowledgeable answer, but I've had some experience with this, so I'll answer to the best of my ability.

> We have several web applications with Pg 8.2.x running on isolated servers
> (~25). The database size on each machine (du -h pgdata) is ~2 GB. We have
> been using nightly filesystem backup (stop pg, tar backup to ftp, start pg)
> and it worked well.

Any reason why you haven't been using pg_dump? There are a LOT of drawbacks to doing filesystem-level backups. For example, you can't restore to disparate hardware (a filesystem backup made from PG on an i386 system won't work on an amd64 system, for example).

> We would like to move to PITR backups since the database size will increase
> moving forward and our current backup method might increase server
> downtimes.

How much do you expect it to increase? 2G is _tiny_ by modern standards. Even if you expect it to increase by an order of magnitude, it's still a reasonable size for pg_dump.

Some huge advantages to pg_dump:
*) Architecture-neutral dumps
*) No need to stop the database
*) Extremely simple procedure for backup and restore
*) Human-parseable backups (you may not even need to restore, just look through the data to see what was there in some cases)
*) Can restore a database without shutting down a server, thus you can move a database from one server to another (for example) without affecting work occurring on the second server.

> We have a central ftp backup server (yes, ftp :-) which we would like to use
> for weekly full and daily incremental PITR backups.
>
> After reading the docs, PITR is still fuzzy. Our ideas for backup are (do
> not worry about the syntax),
>
> ** START **
>
> tmpwal = "/localhost/tmp" # tmp space on server 1 for storing wal files
> before ftp
> Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f"
>
> ...
>
> Recovery on server1 (skeleton commands),
> % rm -f $tmpwal/*

Why are you removing this day's WAL logs before recovery? If the disaster occurs close to your backup time, this will result in the loss of an entire day's data changes.

> % mv pgdata pgdata.hosed
> % ftp get ftpbackup:/server1/day1/pgdata.tar .
> % tar -xvf pgdata.tar
> % ftp get ftpbackup:/server1/day1/wal/* $tmpwal
> % ftp get ftpbackup:/server1/day2/wal/* $tmpwal
> .....
> % cp -r pgdata.hosed/pg_xlog pgdata/
> % echo "cp $tmpwal/%f %p" > pgdata/recovery.conf
> % start pg (recovery begins)
>
> ** END **

Again, how much WAL traffic are you generating? Make sure you have enough free space on the recovery system to hold all of the WAL logs in the event you need to recover.

Note that this procedure does not do a good job of protecting you from catastrophic hardware failure. In the event that your RAID system goes insane, you can lose as much as an entire day's worth of updates, and there's no reason to. Is there a reason you can't use (for example) NFS to transfer each WAL log to the backup system on the fly? This would better protect you from IO system failures, and reduce the amount of disk space you need on each server, since you don't need to keep WAL logs there.

> Assumptions:
> a. After pg_stop_backup(), Pg immediately recycles log files and hence wal
> logs can be copied to backup. This is a clean start.

I don't believe so. AFAIK, all pg_stop_backup() does is remove the marker that pg_start_backup() put in place to tell the recovery process when the filesystem backup started. By not backing up pg_xlog, you are going to be behind by however many transactions are in the most recent transaction log that has not yet been archived. Depending on how often your databases are updated, this is likely acceptable. If you need anything more timely than that, you'll probably want to implement Slony or some other replication system.

> b. New wal files since (a) are incremental backups

Yes.

> We are not sure if WAL log filenames are unique and possibly overwrite older
> wal files during recovery.

They are unique. Eventually they will recycle, but if you look at the names and the number of possible names, you'll see that it will take a long time before the names recycle.

> I'm seeking suggestions from others with experience performing PostgreSQL
> PITR backups from multiple servers to a central backup server.

Again, I recommend the use of NFS (or similar) for real-time backup to protect you from hardware failure.

I _highly_ recommend you stage some disaster scenarios and actually use your procedure to restore some databases before you trust it. Getting PITR working effectively is tricky, no matter how many questions you ask of knowledgeable people. You're going to want to have first-hand experience going through the process.

HTH.

--
Bill Moran
http://www.potentialtech.com
Bill Moran wrote:
> I'm guessing you're in a hurry or in a pinch that you need to repost
> after one day on a weekend. I was waiting to let someone more
> knowledgeable answer, but I've had some experience with this, so
> I'll answer to the best of my ability.
I apologize. I wasn't sure if my first email ended up as spam since I saw other posts getting through. I'll be patient the next time ;-)
> > We have several web applications with Pg 8.2.x running on isolated servers
> > (~25). The database size on each machine (du -h pgdata) is ~2 GB. We have
> > been using nightly filesystem backup (stop pg, tar backup to ftp, start pg)
> > and it worked well.
>
> Any reason why you haven't been using pg_dump? There are a LOT of
> drawbacks to doing filesystem-level backups. For example, you can't
> restore to disparate hardware (a filesystem backup made from PG on
> an i386 system won't work on an amd64 system, for example).
We have used pg_dump and like it. The drawback is that it is excruciatingly slow for backups. Although our databases are ~2 GB currently, they will grow to 6-10 GB per database over the next 6 months and 25-30 GB within a year.
Our hardware configuration is managed well and we do not see us switching architectures often. If we do, we will be performing a pg_dump/restore at the time.
> > We would like to move to PITR backups since the database size will increase
> > moving forward and our current backup method might increase server
> > downtimes.
>
> How much do you expect it to increase? 2G is _tiny_ by modern
> standards. Even if you expect it to increase by an order of magnitude,
> it's still a reasonable size for pg_dump.
>
> Some huge advantages to pg_dump:
> *) Architecture-neutral dumps
> *) No need to stop the database
> *) Extremely simple procedure for backup and restore
> *) Human-parseable backups (you may not even need to restore, just
>    look through the data to see what was there in some cases)
> *) Can restore a database without shutting down a server, thus you
>    can move a database from one server to another (for example)
>    without affecting work occurring on the second server.
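For reference, pg_dump takes a consistent MVCC snapshot, so it can run against a live database. A hedged sketch of the round trip (the database name and paths are made up for illustration):

```shell
# Dump a running database to the custom (compressed) format.
# "appdb" and /backup are hypothetical names.
pg_dump -U postgres -Fc appdb -f /backup/appdb.dump

# Restore into a different database (or a different server) while it stays up:
# createdb -U postgres appdb_copy
# pg_restore -U postgres -d appdb_copy /backup/appdb.dump
```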
We have only used pg_dump on an offline database. If pg_dump is run against a live database, will the restored data be consistent?
> >
> > ** START **
> >
> > tmpwal = "/localhost/tmp" # tmp space on server 1 for storing wal files
> > before ftp
> > Configure $pgdata/postgresql.conf archive_command = "cp %p $tmpwal/%f"
> >
> > .....
> >
> > Recovery on server1 (skeleton commands),
> > % rm -f $tmpwal/*
>
> Why are you removing this day's WAL logs before recovery? If the
> disaster occurs close to your backup time, this will result in the
> loss of an entire day's data changes.
Thanks for pointing this out.
> > .....
> > % cp -r pgdata.hosed/pg_xlog pgdata/
> > % echo "cp $tmpwal/%f %p" > pgdata/recovery.conf
> > % start pg (recovery begins)
> >
> > ** END **
>
> Again, how much WAL traffic are you generating? Make sure you have
> enough free space on the recovery system to hold all of the WAL logs
> in the event you need to recover.
How do we measure the WAL traffic generated? Is it the daily growth in the size of pg_xlog? We plan a database-size-to-disk-space ratio of 1:1.5 on the servers.
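One hedged way to measure it: every completed WAL segment in a stock 8.2 build is 16 MB, so counting the segments handed to archive_command per day gives the traffic directly. A sketch against the staging directory from this thread (the path is hypothetical):

```shell
# Count archived WAL segments (24-hex-digit names) and estimate traffic.
# Each completed segment is 16 MB in a stock 8.2 build.
ARCHIVE=${ARCHIVE:-/localhost/tmp}
count=$(ls "$ARCHIVE" 2>/dev/null | grep -cE '^[0-9A-F]{24}$')
echo "$count segments ~ $((count * 16)) MB of WAL"
```

Run once a day (e.g. from cron just before the FTP push) to see the daily rate.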
> Note that this procedure does not do a good job of protecting you
> from catastrophic hardware failure. In the event that your RAID
> system goes insane, you can lose as much as an entire day's worth
> of updates, and there's no reason to.
Currently, a day's worth of data loss is an accepted risk. Once we upgrade our infrastructure (moving these servers from FTP to NFS), our backups will be near-realtime.
> I _highly_ recommend you stage some disaster scenarios and actually
> use your procedure to restore some databases before you trust it.
> Getting PITR working effectively is tricky, no matter how many questions
> you ask of knowledgeable people. You're going to want to have
> first-hand experience going through the process.
Absolutely. We will run tests to ensure the process works. Sharing experience implementing PITR in an environment like ours might help avoid some initial pitfalls.
Joey K
On Aug 19, 2007, at 7:23 AM, Bill Moran wrote:
> > Assumptions:
> > a. After pg_stop_backup(), Pg immediately recycles log files and hence wal
> > logs can be copied to backup. This is a clean start.
>
> I don't believe so. AFAIK, all pg_stop_backup() does is remove the
> marker that pg_start_backup() put in place to tell the recovery process
> when the filesystem backup started.

I'm pretty certain that's not the case. For PITR to ensure that data is back in a consistent state after a recovery, it has to replay all the transactions that took place between pg_start_backup and pg_stop_backup; so it needs to know when pg_stop_backup() was actually run.

> By not backing up pg_xlog, you are going to be behind by however many
> transactions are in the most recent transaction log that has not yet
> been archived. Depending on how often your databases are updated, this
> is likely acceptable. If you need anything more timely than that, you'll
> probably want to implement Slony or some other replication system.

Just keep in mind that Slony is *not* a backup solution (though you could possibly argue that its log shipping is).

--
Decibel!, aka Jim Nasby   decibel@decibel.org
EnterpriseDB   http://enterprisedb.com   512.569.9461 (cell)
Decibel! <decibel@decibel.org> wrote:
> On Aug 19, 2007, at 7:23 AM, Bill Moran wrote:
> >> Assumptions:
> >> a. After pg_stop_backup(), Pg immediately recycles log files and hence wal
> >> logs can be copied to backup. This is a clean start.
> >
> > I don't believe so. AFAIK, all pg_stop_backup() does is remove the
> > marker that pg_start_backup() put in place to tell the recovery process
> > when the filesystem backup started.
>
> I'm pretty certain that's not the case. For PITR to ensure that data is
> back in a consistent state after a recovery, it has to replay all the
> transactions that took place between pg_start_backup and pg_stop_backup;
> so it needs to know when pg_stop_backup() was actually run.

Sounds likely ... but I don't believe that forces any specific log cycling activity, like the OP suggested.

Be nice if someone who knew for sure would chime in ;)

> > By not backing up pg_xlog, you are going to be behind by however many
> > transactions are in the most recent transaction log that has not yet
> > been archived. Depending on how often your databases are updated, this
> > is likely acceptable. If you need anything more timely than that, you'll
> > probably want to implement Slony or some other replication system.
>
> Just keep in mind that Slony is *not* a backup solution (though you
> could possibly argue that its log shipping is).

True. This rides the fine line of the difference between an HA setup and backup. Specifically: HA won't allow you to recover from user error.

--
Bill Moran
http://www.potentialtech.com
On Thu, Aug 23, 2007 at 06:58:36PM -0400, Bill Moran wrote:
> Decibel! <decibel@decibel.org> wrote:
> > I'm pretty certain that's not the case. For PITR to ensure that data is
> > back in a consistent state after a recovery, it has to replay all the
> > transactions that took place between pg_start_backup and pg_stop_backup;
> > so it needs to know when pg_stop_backup() was actually run.
>
> Sounds likely ... but I don't believe that forces any specific log
> cycling activity, like the OP suggested.
>
> Be nice if someone who knew for sure would chime in ;)

Oh, that one's easy... it was changed in 8.2. Previously, you had to either manually copy the active WAL file or wait for it to roll over before you had a valid PITR backup. In 8.2, pg_stop_backup forces WAL rotation (but note that you still have to wait for the archive command to complete before the backup is valid).

--
Decibel!, aka Jim Nasby   decibel@decibel.org
EnterpriseDB   http://enterprisedb.com   512.569.9461 (cell)