Thread: Backup Strategy Second Opinion

Backup Strategy Second Opinion

From
Bryan Murphy
Date:
Hey guys, we just moved our system to Amazon's EC2 service.  I'm a bit
paranoid about backups, and this environment is very different from
our previous environment.  I was hoping you guys could point out any
major flaws in our backup strategy that I may have missed.

A few assumptions:

1. It's OK if we lose a few seconds (or even minutes) of transactions
should one of our primary databases crash.
2. It's unlikely we'll need to load a backup that's more than a few days old.

Here's what we're currently doing:

Primary database ships WAL files to S3.
Snapshot primary database to tar file.
Upload tar file to S3.
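
For illustration, the WAL-shipping step could be wrapped in a small script called from archive_command. This is only a sketch, not the setup described above: archive_segment and the exists/upload callables are hypothetical names standing in for whatever S3 client is used. The two properties it tries to keep are the documented ones for archive_command: return zero only on success, and never overwrite a segment that is already archived.

```python
import os


def archive_segment(wal_path, exists, upload):
    """Archive one WAL segment; return 0 on success, 1 on any failure.

    `exists(name)` and `upload(path, name)` are hypothetical callables
    wrapping the S3 client of your choice (an assumption, not part of
    the original setup).
    """
    name = os.path.basename(wal_path)
    if exists(name):
        # Refuse to overwrite: a name clash usually means a misconfiguration.
        return 1
    try:
        upload(wal_path, name)
    except Exception:
        # Any upload error is a failure; PostgreSQL retries the segment later.
        return 1
    return 0
```

Because a non-zero return makes PostgreSQL keep the segment and retry, a failing upload shows up as a growing pg_xlog directory rather than as a gap in the archive.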

Create secondary database from tar file on S3.
Put secondary database into continuous recovery mode, pulling WAL files from S3.
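
The continuous-recovery side can be sketched the same way. The loop below is roughly what a restore_command helper does on a warm standby: keep polling for the next segment, and give up only when a failover trigger appears. wait_for_segment, fetch and trigger_exists are hypothetical names, and a real tool such as pg_standby handles cases this sketch ignores (timeline history files, for one).

```python
import time


def wait_for_segment(name, dest, fetch, trigger_exists,
                     poll_seconds=5.0, sleep=time.sleep):
    """Block until WAL segment `name` has been fetched to `dest`.

    `fetch(name, dest)` is a hypothetical callable returning True once the
    segment has been downloaded; `trigger_exists()` returns True when an
    operator has requested failover. Returns 0 when the segment is in
    place, 1 when failover was requested.
    """
    while True:
        if fetch(name, dest):
            return 0
        if trigger_exists():
            # Non-zero tells PostgreSQL no more WAL is coming, ending recovery.
            return 1
        sleep(poll_seconds)
```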

Every night on secondary database:
  * shut down postgres
  * unmount ebs volume that contains postgres data
  * create new snapshot of ebs volume
  * remount ebs volume
  * restart postgres
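
A minimal sketch of the nightly sequence, assuming the aws command-line tool is installed, the mount point has an /etc/fstab entry, and pg_ctl is on the PATH (none of which is stated above; the function name and arguments are made up for illustration). The point of the try/finally nesting is that a failed snapshot should still leave the volume remounted and postgres running.

```python
import subprocess


def nightly_snapshot(volume_id, data_dir, mount_point,
                     run=subprocess.check_call):
    """Stop postgres, snapshot the EBS volume, then bring everything back.

    Assumptions: `aws` CLI available, `mount_point` listed in /etc/fstab,
    `pg_ctl` on PATH. `run` is injectable so the sequence can be dry-run.
    """
    run(["pg_ctl", "stop", "-D", data_dir, "-m", "fast"])
    try:
        run(["umount", mount_point])
        try:
            run(["aws", "ec2", "create-snapshot", "--volume-id", volume_id])
        finally:
            # Remount even if the snapshot request failed.
            run(["mount", mount_point])
    finally:
        # Always attempt the restart so the standby keeps replaying WAL.
        run(["pg_ctl", "start", "-D", data_dir])
```

Passing run=print gives a dry run that only lists the commands in order.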

I manually delete older log files and snapshots once I've verified
that a newer snapshot can be brought up as an active database and have
run a few tests on it.
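
The manual cleanup rule could be written down as a small function, if only to make the policy explicit. This is a sketch under assumptions: snapshots_to_delete and the three-day default are illustrative (taken loosely from assumption 2 above), and the snapshot list is whatever (id, creation time) pairs the EC2 tooling reports. Nothing is deleted unless it is both older than the retention window and older than a snapshot that has actually been verified.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, verified_id, now, keep_days=3):
    """Return the ids of snapshots that are safe to delete, sorted.

    `snapshots` is a list of (snapshot_id, created_at) pairs and
    `verified_id` is the newest snapshot that was restored and tested.
    `keep_days` is an illustrative default, not a recommendation.
    """
    created = dict(snapshots)
    if verified_id not in created:
        # No verified snapshot on record: delete nothing.
        return []
    verified_at = created[verified_id]
    cutoff = now - timedelta(days=keep_days)
    return sorted(sid for sid, ts in snapshots
                  if ts < verified_at and ts < cutoff)
```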

Other than that, we have some miscellaneous monitoring to keep track
of the # of log files in the pg_xlog directory and the amount of
available disk space on all the servers.  If the # of log files
starts to grow beyond a certain threshold, that indicates something
went wrong with the log shipping, and we'll investigate to see what
the problem is.
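
A check along these lines might look like the sketch below. The function name and thresholds are placeholders, not the actual monitoring in use; it relies only on the fact that WAL segment files are named with 24 upper-case hex digits, so other entries in pg_xlog (such as the archive_status directory) are not counted.

```python
import os
import re
import shutil

# WAL segment files are named with exactly 24 upper-case hex digits.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def check_wal_backlog(xlog_dir, max_segments, min_free_bytes):
    """Return a list of human-readable problems (empty means healthy).

    Thresholds are placeholders; pick values that fit your write volume.
    """
    segments = [n for n in os.listdir(xlog_dir) if WAL_NAME.match(n)]
    free = shutil.disk_usage(xlog_dir).free
    problems = []
    if len(segments) > max_segments:
        problems.append("%d WAL segments waiting (limit %d)"
                        % (len(segments), max_segments))
    if free < min_free_bytes:
        problems.append("only %d bytes free (minimum %d)"
                        % (free, min_free_bytes))
    return problems
```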

I think this is a pretty good strategy, but I've been so caught up in
this that I may not be seeing the forest for the trees, so I thought
I'd ask for a sanity check here.

Thanks,
Bryan

Re: Backup Strategy Second Opinion

From
Tim Uckun
Date:
> 1. It's OK if we lose a few seconds (or even minutes) of transactions
> should one of our primary databases crash.
> 2. It's unlikely we'll need to load a backup that's more than a few days old.

How do you handle failover and falling back to the primary once it's up?

Re: Backup Strategy Second Opinion

From
Bryan Murphy
Date:
On Sun, Feb 22, 2009 at 7:30 PM, Tim Uckun <timuckun@gmail.com> wrote:
>> 1. It's OK if we lose a few seconds (or even minutes) of transactions
>> should one of our primary databases crash.
>> 2. It's unlikely we'll need to load a backup that's more than a few days
>> old.
>
> How do you handle failover and falling back to the primary once it's up?

We don't plan to fail back to the primary.  Amazon is a very different
beast: once a server is dead, we just toss it away.  The secondary
permanently becomes the primary, and we create a new tertiary from
scratch, which then becomes a log-shipped copy of the secondary.

Bryan

Re: Backup Strategy Second Opinion

From
Tim Uckun
Date:
If you could publish a brief howto on this I would be most grateful. I bet many others would too.


On Mon, Feb 23, 2009 at 2:56 PM, Bryan Murphy <bmurphy1976@gmail.com> wrote:
> On Sun, Feb 22, 2009 at 7:30 PM, Tim Uckun <timuckun@gmail.com> wrote:
> >> 1. It's OK if we lose a few seconds (or even minutes) of transactions
> >> should one of our primary databases crash.
> >> 2. It's unlikely we'll need to load a backup that's more than a few days
> >> old.
> >
> > How do you handle failover and falling back to the primary once it's up?
>
> We don't plan to fail back to the primary.  Amazon is a very different
> beast: once a server is dead, we just toss it away.  The secondary
> permanently becomes the primary, and we create a new tertiary from
> scratch, which then becomes a log-shipped copy of the secondary.
>
> Bryan