Thread: Fwd: Postgres HA - pacemaker RA do not support auto failback
Hello,
Postgres version : 9.6
OS:Rhel 7.6
We are working on HA setup for postgres cluster of two nodes in active-passive mode.
Installed:
Pacemaker 1.1.19
Corosync 2.4.3
The pacemaker agent with this installation doesn't support automatic failback. What I mean by that is explained below:
1. Cluster is setup like A - B with A as master.
2. Kill services on A, node B will come up as master.
3. node A is ready to join the cluster, we have to delete the lock file it creates on any one of the node and execute the cleanup command to get the node back as standby
Step 3 is manual so HA is not achieved in real sense.
Please help to check:
1. Is there any version of the resouce agent which supports automatic failback? To avoid generation of lock file and deleting it.
2. If there is no such support, what checks should be added in pgsql RA to achieve
Please suggest.
Thanks.
On 8/12/19 2:57 PM, Shital A wrote: > > > Hello, > > Postgres version : 9.6 > OS:Rhel 7.6 > > We are working on HA setup for postgres cluster of two nodes in active-passive > mode. > > Installed: > Pacemaker 1.1.19 > Corosync 2.4.3 > > The pacemaker agent with this installation doesn't support automatic failback. > What I mean by that is explained below: > 1. Cluster is setup like A - B with A as master. > 2. Kill services on A, node B will come up as master. > 3. node A is ready to join the cluster, we have to delete the lock file it > creates on any one of the node and execute the cleanup command to get the node > back as standby > > Step 3 is manual so HA is not achieved in real sense. > > Please help to check: > 1. Is there any version of the resouce agent which supports automatic failback? > To avoid generation of lock file and deleting it. > > 2. If there is no such support, what checks should be added in pgsql RA to achieve > > Please suggest. > Thanks. > Hello, Which RA did you use? AFAIK there is two RA : - pgsql : https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql - PAF : http://clusterlabs.github.io/PAF/ (Which is different from pgsql RA : http://clusterlabs.github.io/PAF/FAQ.html#why-new-ra-for-postgresql) If I am right, PAF should handle the case when there no need to perform pgrewind (if the old master is not too advanced in the transaction log). Regards, -- Adrien NAYRAT https://blog.anayrat.info
Attachment
On Mon, 12 Aug 2019, 18:54 Adrien Nayrat, <adrien.nayrat@anayrat.info> wrote:
On 8/12/19 2:57 PM, Shital A wrote:
>
>
> Hello,
>
> Postgres version : 9.6
> OS:Rhel 7.6
>
> We are working on HA setup for postgres cluster of two nodes in active-passive
> mode.
>
> Installed:
> Pacemaker 1.1.19
> Corosync 2.4.3
>
> The pacemaker agent with this installation doesn't support automatic failback.
> What I mean by that is explained below:
> 1. Cluster is setup like A - B with A as master.
> 2. Kill services on A, node B will come up as master.
> 3. node A is ready to join the cluster, we have to delete the lock file it
> creates on any one of the node and execute the cleanup command to get the node
> back as standby
>
> Step 3 is manual so HA is not achieved in real sense.
>
> Please help to check:
> 1. Is there any version of the resouce agent which supports automatic failback?
> To avoid generation of lock file and deleting it.
>
> 2. If there is no such support, what checks should be added in pgsql RA to achieve
>
> Please suggest.
> Thanks.
>
Hello,
Which RA did you use? AFAIK there is two RA :
- pgsql : https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql
- PAF : http://clusterlabs.github.io/PAF/ (Which is different from pgsql RA :
http://clusterlabs.github.io/PAF/FAQ.html#why-new-ra-for-postgresql)
If I am right, PAF should handle the case when there no need to perform pgrewind
(if the old master is not too advanced in the transaction log).
Regards,
--
Adrien NAYRAT
https://blog.anayrat.info
Hello
We are using the pgsql RA installed with pacemaker.
Is PAF recommended over pgsql?
Has anyone changed pgsql to handle the lock file and other cases that might not have been handled?
Please advise.
Thanks.
On Tue, 13 Aug 2019, 11:50 Shital A, <brightuser2019@gmail.com> wrote:
On Mon, 12 Aug 2019, 18:54 Adrien Nayrat, <adrien.nayrat@anayrat.info> wrote:On 8/12/19 2:57 PM, Shital A wrote:
>
>
> Hello,
>
> Postgres version : 9.6
> OS:Rhel 7.6
>
> We are working on HA setup for postgres cluster of two nodes in active-passive
> mode.
>
> Installed:
> Pacemaker 1.1.19
> Corosync 2.4.3
>
> The pacemaker agent with this installation doesn't support automatic failback.
> What I mean by that is explained below:
> 1. Cluster is setup like A - B with A as master.
> 2. Kill services on A, node B will come up as master.
> 3. node A is ready to join the cluster, we have to delete the lock file it
> creates on any one of the node and execute the cleanup command to get the node
> back as standby
>
> Step 3 is manual so HA is not achieved in real sense.
>
> Please help to check:
> 1. Is there any version of the resouce agent which supports automatic failback?
> To avoid generation of lock file and deleting it.
>
> 2. If there is no such support, what checks should be added in pgsql RA to achieve
>
> Please suggest.
> Thanks.
>
Hello,
Which RA did you use? AFAIK there is two RA :
- pgsql : https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql
- PAF : http://clusterlabs.github.io/PAF/ (Which is different from pgsql RA :
http://clusterlabs.github.io/PAF/FAQ.html#why-new-ra-for-postgresql)
If I am right, PAF should handle the case when there no need to perform pgrewind
(if the old master is not too advanced in the transaction log).
Regards,
--
Adrien NAYRAT
https://blog.anayrat.infoHelloWe are using the pgsql RA installed with pacemaker.Is PAF recommended over pgsql?Has anyone changed pgsql to handle the lock file and other cases that might not have been handled?Please advise.Thanks.
Looks like the previous email bounced, received undelivered message. Sending again. We are using the pgsql RA installed with pacemaker and seems like it prevents failback by creating a lock file. We want to implement auto failback and need pointers on how to implement.
Thanks.
Re: Postgres HA - pacemaker RA do not support auto failback
From
"Jehan-Guillaume (ioguix) de Rorthais"
Date:
On Tue, 13 Aug 2019 16:38:44 +0530 Shital A <brightuser2019@gmail.com> wrote: > On Tue, 13 Aug 2019, 11:50 Shital A, <brightuser2019@gmail.com> wrote: > > > On Mon, 12 Aug 2019, 18:54 Adrien Nayrat, <adrien.nayrat@anayrat.info> > > wrote: > > > >> On 8/12/19 2:57 PM, Shital A wrote: > >> > Postgres version : 9.6 > >> > OS:Rhel 7.6 > >> > > >> > We are working on HA setup for postgres cluster of two nodes in > >> > active-passive mode. > >> > > >> > Installed: > >> > Pacemaker 1.1.19 > >> > Corosync 2.4.3 > >> > > >> > The pacemaker agent with this installation doesn't support automatic > >> > failback. > >> > What I mean by that is explained below: > >> > 1. Cluster is setup like A - B with A as master. > >> > 2. Kill services on A, node B will come up as master. > >> > 3. node A is ready to join the cluster, we have to delete the lock file > >> > it creates on any one of the node and execute the cleanup command to > >> > get the node back as standby > >> > > >> > Step 3 is manual so HA is not achieved in real sense. > >> > > >> > Please help to check: > >> > 1. Is there any version of the resouce agent which supports automatic > >> > failback? > >> > To avoid generation of lock file and deleting it. > >> > > >> > 2. If there is no such support, what checks should be added in pgsql RA > >> > to achieve > >> > >> Which RA did you use? AFAIK there is two RA : > >> - pgsql : > >> https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/pgsql > >> - PAF : http://clusterlabs.github.io/PAF/ (Which is different from pgsql > >> RA : > >> http://clusterlabs.github.io/PAF/FAQ.html#why-new-ra-for-postgresql) > >> > >> If I am right, PAF should handle the case when there no need to perform > >> pgrewind > >> (if the old master is not too advanced in the transaction log). PAF only make sure controlled failover (aka switchover) performs safely so the old master hook back as a standby correctly. Should a failure occurs with an automatic failover, PAF has no dark magic to automagicaly failoback the failing master as a standby. > > We are using the pgsql RA installed with pacemaker. > > > > Is PAF recommended over pgsql? Yes, as much as the historical RA. > > Has anyone changed pgsql to handle the lock file and other cases that > > might not have been handled? I don't know. However, I do think it would be either presumptuous to climb this road today. > Looks like the previous email bounced, received undelivered message. > Sending again. We are using the pgsql RA installed with pacemaker and seems > like it prevents failback by creating a lock file. We want to implement > auto failback and need pointers on how to implement. The lock file is here because PgSQL has no safety belt against data corruption when an old master hook on a new one while its LSN was further away from the timeline fork. You must take care of this by yourself. Moreover, should a failover occurs, you should probably check WHY it occurred and fix things before performing a blindly failback of a potential wounded instance. A wounded instance can hurt your cluster in many different and inconfortable ways. Last, as Adrien wrote, you can manually use pg_rewind to revert the old master to the timeline fork, as far as you enabled checksums or 'wal_log_hints=on' and kept enough WAL around. Another option is just to make sure you are able to build a new standby as was fast as possible. But remember: if your data is important, use quorum, fencing and watchdog. Good luck.