Thread: Backup Strategy Second Opinion
Hey guys, we just moved our system to Amazon's EC2 service. I'm a bit paranoid about backups, and this environment is very different from our previous one. I was hoping you guys could point out any major flaws in our backup strategy that I may have missed.

A few assumptions:

1. It's OK if we lose a few seconds (or even minutes) of transactions should one of our primary databases crash.
2. It's unlikely we'll need to load a backup that's more than a few days old.

Here's what we're currently doing:

* Primary database ships WAL files to S3.
* Snapshot primary database to tar file.
* Upload tar file to S3.
* Create secondary database from tar file on S3.
* Put secondary database into continuous recovery mode, pulling WAL files from S3.

Every night on secondary database:

* shut down postgres
* unmount EBS volume that contains postgres data
* create new snapshot of EBS volume
* remount EBS volume
* restart postgres

I manually delete older log files and snapshots once I've verified that a newer snapshot can be brought up as an active database and have run a few tests on it.

Other than that, we have some miscellaneous monitoring to keep track of the number of log files in the pg_xlog directory and the amount of available disk space on all the servers. Ideally, if the number of log files starts to grow beyond a certain threshold, that indicates something went wrong with the log shipping, and we'll investigate to see what the problem is.

I think this is a pretty good strategy, but I've been so caught up in this that I may not be seeing the forest for the trees, so I thought I'd ask for a sanity check here.

Thanks,
Bryan
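For anyone who wants to try the pg_xlog and disk-space monitoring described above, here is a rough sketch in Python. It is not the script used in this setup. The function names and both thresholds (`max_segments`, `min_free_bytes`) are made up for illustration, and it assumes that only files with the standard 24-hex-character WAL segment names should be counted, skipping things like the archive_status directory and .backup history files.

```python
import os
import re
import shutil

# WAL segment files are named with exactly 24 uppercase hex characters.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def count_wal_segments(xlog_dir):
    """Count WAL segment files in xlog_dir, ignoring other entries
    such as the archive_status directory and .backup history files."""
    return sum(
        1
        for name in os.listdir(xlog_dir)
        if WAL_SEGMENT_RE.match(name)
        and os.path.isfile(os.path.join(xlog_dir, name))
    )


def check_backup_health(xlog_dir, max_segments, min_free_bytes):
    """Return a list of warning strings; an empty list means all clear.

    max_segments and min_free_bytes are hypothetical thresholds that
    would need tuning per server.
    """
    warnings = []
    segments = count_wal_segments(xlog_dir)
    if segments > max_segments:
        warnings.append(
            f"{segments} WAL segments in {xlog_dir} (limit {max_segments}); "
            "log shipping may be stalled"
        )
    # disk_usage reports on the filesystem that holds xlog_dir.
    free = shutil.disk_usage(xlog_dir).free
    if free < min_free_bytes:
        warnings.append(
            f"only {free} bytes free on the volume holding {xlog_dir}"
        )
    return warnings
```

Something like this could run from cron on each server, with any returned warnings mailed to whoever is on call. The right threshold depends on how quickly segments normally get archived, so it would have to be found by watching a healthy server for a while.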
> 1. It's OK if we lose a few seconds (or even minutes) of transactions
> should one of our primary databases crash.
> 2. It's unlikely we'll need to load a backup that's more than a few days old.
How do you handle failover and falling back to the primary once it's up?
On Sun, Feb 22, 2009 at 7:30 PM, Tim Uckun <timuckun@gmail.com> wrote:
>> 1. It's OK if we lose a few seconds (or even minutes) of transactions
>> should one of our primary databases crash.
>> 2. It's unlikely we'll need to load a backup that's more than a few days
>> old.
>
> How do you handle failover and falling back to the primary once it's up?

We don't plan to fail back to the primary. Amazon is a very different beast: once a server is dead, we just toss it away. The secondary permanently becomes the primary, and we create a new tertiary from scratch, which then becomes a log-shipped copy of the secondary.

Bryan
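The no-fail-back rotation described above can be pictured as a tiny state model. This is only an illustrative sketch: the `Cluster` type, the `fail_over` function, and the server names are hypothetical, and none of the real work (promoting the standby out of recovery, building the new server from a snapshot) is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    primary: str             # server currently accepting writes
    standby: Optional[str]   # log-shipped copy in continuous recovery


def fail_over(cluster, new_server):
    """Discard the dead primary, promote the standby, and attach a
    freshly built server as the new log-shipped standby."""
    if cluster.standby is None:
        raise RuntimeError("no standby available to promote")
    return Cluster(primary=cluster.standby, standby=new_server)
```

Starting from `Cluster("db-a", "db-b")`, calling `fail_over(cluster, "db-c")` gives a cluster whose primary is `db-b` and whose standby is `db-c`. The old primary never reappears in either role, which is the point of this approach.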
If you could publish a brief howto on this I would be most grateful. I bet many others would too.
On Mon, Feb 23, 2009 at 2:56 PM, Bryan Murphy <bmurphy1976@gmail.com> wrote:
> On Sun, Feb 22, 2009 at 7:30 PM, Tim Uckun <timuckun@gmail.com> wrote:
>>> 1. It's OK if we lose a few seconds (or even minutes) of transactions
>>> should one of our primary databases crash.
>>> 2. It's unlikely we'll need to load a backup that's more than a few days
>>> old.
>>
>> How do you handle failover and falling back to the primary once it's up?
>
> We don't plan to fail back to the primary. Amazon is a very different
> beast, once a server is dead, we just toss it away. The secondary
> permanently becomes the primary and we create a new tertiary from
> scratch which then becomes a log shipped copy of the secondary.
>
> Bryan