Thread: Auotmated postgres failover

Auotmated postgres failover

From
jwiencek3@comcast.net
Date:
Hello


I'm looking for a tool to automate PostgreSQL cluster management failover in the event the master database were to become unavailable.   Currently are manually issuing a "pg_ctl promote"  once we become aware that the master database has crashed.


Is repmgr a via solution?  Please pass along experiences with repmgr.

Are there any other  tools available to automatically issue the "promote" in the event of a master database crash?



Regards

John Wiencek

Re: Auotmated postgres failover

From
John R Pierce
Date:
On 1/21/2016 11:07 AM, jwiencek3@comcast.net wrote:


I'm looking for a tool to automate PostgreSQL cluster management failover in the event the master database were to become unavailable.   Currently are manually issuing a "pg_ctl promote"  once we become aware that the master database has crashed.


Is repmgr a via solution?  Please pass along experiences with repmgr.

Are there any other  tools available to automatically issue the "promote" in the event of a master database crash?


repmgr is a tool you could use in conjunction with a generic cluster management system like linuxha/heartbeat, vcs, etc.  

the most difficult part is reliably determining that A) the master has crashed, and B) fencing the failed old master so it doesn't wake up and think its still in charge.



-- 
john r pierce, recycling bits in santa cruz

Re: Auotmated postgres failover

From
Andrew Sullivan
Date:
On Thu, Jan 21, 2016 at 11:34:18AM -0800, John R Pierce wrote:
> the most difficult part is reliably determining that A) the master has
> crashed, and B) fencing the failed old master so it doesn't wake up and
> think its still in charge.
>

And, depending on your workload, C) that you actually want to fail over.

I've seen an awful lot of people want automatic failover who also
can't afford for the already-committed transactions on the master to
be lost.  Unless you're running synchronous, be sure you have the
workload that can actually accept lost writes.

A

--
Andrew Sullivan
ajs@crankycanuck.ca


Re: Auotmated postgres failover

From
Jehan-Guillaume de Rorthais
Date:
Le Thu, 21 Jan 2016 11:34:18 -0800,
John R Pierce <pierce@hogranch.com> a écrit :

> On 1/21/2016 11:07 AM, jwiencek3@comcast.net wrote:
> >
> >
> > I'm looking for a tool to automate PostgreSQL cluster management
> > failover in the event the master database were to become unavailable.
> >   Currently are manually issuing a "pg_ctl promote"  once we become
> > aware that the master database has crashed.
> >
> >
> > Is repmgr a via solution?  Please pass along experiences with repmgr.
> >
> > Are there any other  tools available to automatically issue the
> > "promote" in the event of a master database crash?

Yes, 3 different Pacemaker resource agents exist for PostgreSQL:

 * official one, in the package "resource-agents" on most linux distribs.
   This one is pretty complex and support multistate and stateless setup.
 * a simple, stupid, easy and stateless, agent:
   https://github.com/dalibo/pgsql-resource-agent/tree/master/stateless
   This one is fine for a 2 node cluster
 * a multistate-aware agent:
   https://github.com/dalibo/pgsql-resource-agent/tree/master/multistate
   This one is nice for multi-node cluster, searching for the best known slave
   to elect after a master lost.

Some important docs are available in the pgsql-resource-agent (PRA) repo:
 * https://github.com/dalibo/pgsql-resource-agent/blob/master/FENCING.md
 * the stateless:
   https://github.com/dalibo/pgsql-resource-agent/blob/master/stateless/README.md
 * the multistate:
   https://github.com/dalibo/pgsql-resource-agent/blob/master/multistate/README.md
   https://github.com/dalibo/pgsql-resource-agent/blob/master/multistate/INSTALL.md
   https://github.com/dalibo/pgsql-resource-agent/blob/master/multistate/docs/Quick_Start.md

> repmgr is a tool you could use in conjunction with a generic cluster
> management system like linuxha/heartbeat, vcs, etc.
>
> the most difficult part is reliably determining that A) the master has
> crashed, and B) fencing the failed old master so it doesn't wake up and
> think its still in charge.

+1
--
Jehan-Guillaume de Rorthais
Dalibo