Thread: Failover, Wal Logging, and Multiple Spares

Failover, Wal Logging, and Multiple Spares

From
Bryan Murphy
Date:
Assuming we are running a Postgres instance that is shipping log files to 2 or more warm spares, is there a way I can fail over to one of the spares, and have the second spare start receiving updates from the new master without missing a beat?  I can live with losing the old master, and at least at the moment it would be a controlled failover, but I would like to know if it's possible during an uncontrolled failover as well (catastrophic hardware failure).

Right now, we have just that setup, but every time I've failed over to the new master, we've had to rebuild our spares from scratch, and unfortunately this is a multi-hour process.  We can't afford the risk of not having a warm spare for that length of time.  We're planning to move entirely to a Slony cluster, but I'd like to fail over to a more powerful machine before we begin the Slony migration, as the current server is already overloaded.

Thanks,
Bryan

Re: Failover, Wal Logging, and Multiple Spares

From
Bryan Murphy
Date:
Ok, I've asked this a few times, but nobody ever responded.  I think I finally got it, though.  Could somebody confirm my logic?  Basically, you set up a chain of servers, and when one fails you replicate to the next link in the chain, like so:

Master (A) --> Warm Standby (B) --> Warm Standby (C)  --> etc.

When the master fails, this becomes:

Old Master (A)  xxxxx> New Master (B) --> Warm Standby (C)

And, of course, you might have an additional replication chain from Master (A) just in case you goof something up in the failover process, but that's the basic idea.

Thanks,
Bryan



Re: Failover, Wal Logging, and Multiple Spares

From
Yaroslav Tykhiy
Date:
On 18/08/2009, at 9:36 AM, Bryan Murphy wrote:

> Ok, I've asked this a few times, but nobody ever responded.  I think
> I finally got it, though.  Could somebody confirm my logic?
> Basically, you set up a chain of servers, and when one fails you
> replicate to the next link in the chain, like so:
>
> Master (A) --> Warm Standby (B) --> Warm Standby (C)  --> etc.
>
> Master Fails, now becomes:
>
> Old Master (A)  xxxxx> New Master (B) --> Warm Standby (C)
>
> And, of course, you might have an additional replication chain from
> Master (A) just in case you goof something up in the failover
> process, but that's the basic idea.

Excuse me, but I fail to see how you are going to replicate from one
warm standby to another warm standby.  I don't think PostgreSQL can do
that.  That said, the idea of just partially degrading a warm standby
cluster by electing a new master node looked very attractive to me, too.

> On Sun, Aug 16, 2009 at 9:35 PM, Bryan Murphy
> <bmurphy1976@gmail.com> wrote:
> Assuming we are running a Postgres instance that is shipping log
> files to 2 or more warm spares, is there a way I can fail over to
> one of the spares, and have the second spare start receiving updates
> from the new master without missing a beat?  I can live with losing
> the old master, and at least at the moment it would be a controlled
> failover, but I would like to know if it's possible during an
> uncontrolled failover as well (catastrophic hardware failure).
>
> Right now, we have just that setup, but every time I've failed over
> to the new master, we've had to rebuild our spares from scratch and
> unfortunately this is a multi-hour long process.  We can't afford
> the risk of not having a warm spare for that length of time.  We're
> planning to move entirely to a slony cluster, but I'd like to fail
> over to a more powerful machine before we begin the slony migration
> as the current server is already overloaded.

Encouraged by Bruce Momjian, I tried and had some success in this
area.  It was a controlled failover but it worked like a charm.  An
obvious condition was that the warm standbys be in perfect sync; you
can't do the trick if some of them received the last WAL segment while
the others didn't.

Please see http://archives.postgresql.org/pgsql-general/2009-07/msg00215.php
  for my report.  Of course, questions and comments are welcome.
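
To illustrate the "perfect sync" condition above, here is a minimal Python sketch. It assumes you have already collected, by some means of your own, the name of the last WAL segment file each warm standby restored; the host names and the `standbys_in_sync` helper are hypothetical and not part of PostgreSQL.

```python
def standbys_in_sync(last_segment_by_standby):
    """Return True when every standby reports the same last restored WAL segment.

    The argument maps a standby name to the 24-character WAL segment file
    name it restored most recently. How those names are gathered is left
    to the operator (assumption: e.g. from each standby's restore log).
    """
    if not last_segment_by_standby:
        return False  # nothing to compare, so nothing can be confirmed
    return len(set(last_segment_by_standby.values())) == 1


# Hypothetical report from two warm standbys.
reports = {
    "standby_b": "00000001000000000000002A",
    "standby_c": "00000001000000000000002A",
}
print(standbys_in_sync(reports))
```

If the check fails, the standbys did not all receive the same final segment, and the controlled failover described above should not be attempted as-is.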

Cheers,
Yar

Re: Failover, Wal Logging, and Multiple Spares

From
Greg Stark
Date:
On Tue, Aug 18, 2009 at 1:25 AM, Yaroslav Tykhiy <yar@barnet.com.au> wrote:
> Encouraged by Bruce Momjian, I tried and had some success in this area.  It
> was a controlled failover but it worked like a charm.  An obvious condition
> was that the warm standbys be in perfect sync; you can't do the trick if
> some of them received the last WAL segment while the others didn't.

It seems like it should be possible to weaken this constraint, as long
as you're careful to fail over to the slave which is the furthest
ahead in replaying WAL. All the other slaves must then switch to
replaying logs from the new master before the point where it took over.

This does seem like a very useful area to explore.
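
A rough Python sketch of that selection step, under stated assumptions: each standby's progress is known only as the name of the last WAL segment file it restored (24 hex digits: timeline, log file, segment, 8 digits each), so the comparison is at whole-segment granularity. The `parse_wal_segment` and `pick_new_master` helpers and the host names are hypothetical, and the sketch does not perform the failover itself.

```python
def parse_wal_segment(name):
    """Split a 24-hex-digit WAL segment file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment file name: {name!r}")
    # int(..., 16) raises ValueError if a field is not valid hexadecimal.
    return tuple(int(name[i:i + 8], 16) for i in (0, 8, 16))


def pick_new_master(last_segment_by_standby):
    """Return (winner, behind): the standby furthest ahead and those behind it."""
    positions = {
        host: parse_wal_segment(segment)
        for host, segment in last_segment_by_standby.items()
    }
    # Positions on different timelines are not comparable in this sketch.
    if len({pos[0] for pos in positions.values()}) != 1:
        raise ValueError("standbys are on different timelines")
    winner = max(positions, key=positions.get)
    behind = sorted(h for h, p in positions.items() if p < positions[winner])
    return winner, behind


# Hypothetical report: standby_c missed the last segment.
reports = {
    "standby_b": "00000001000000000000002A",
    "standby_c": "000000010000000000000029",
}
winner, behind = pick_new_master(reports)
print(winner, behind)
```

The standbys listed in `behind` are the ones that would still need the missing segments before they could follow the new master.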


--
greg
http://mit.edu/~gsstark/resume.pdf