Thread: PAF with Pacemaker

PAF with Pacemaker

From
Vijaykumar Patil
Date: Thu, 30 Nov 2023 19:07:34 +0000

Hi Team,

I have two PostgreSQL servers, one primary and one replica. I have set up replication and configured Pacemaker and Corosync.

However, I'm still facing an issue while creating the resource: it shows "invalid parameter".

[root@scrbtrheldbaas001 heartbeat]# pcs status
Cluster name: pg_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: scrbtrheldbaas001 (version 2.1.6-8.el8-6fdc9deea29) - partition with quorum
  * Last updated: Thu Nov 30 19:04:29 2023 on scrbtrheldbaas001
  * Last change:  Thu Nov 30 13:41:53 2023 by root via cibadmin on scrbtrheldbaas002
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ scrbtrheldbaas001 scrbtrheldbaas002 ]

Full List of Resources:
  * Clone Set: pgsqld-clone [pgsqld] (promotable):
    * Stopped (invalid parameter): [ scrbtrheldbaas001 scrbtrheldbaas002 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@scrbtrheldbaas001 heartbeat]#

My PostgreSQL version is 15.3, but it is still looking for recovery.conf. Please find the logs below.

Node 1 Pacemaker log:

[root@scrbtrheldbaas001 heartbeat]# journalctl -xe | grep pacemaker
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-controld[69280]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-schedulerd[69279]:  notice: Treating probe result 'invalid parameter' for pgsqld:0 on scrbtrheldbaas002 as 'not running'
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-schedulerd[69279]:  notice: Treating probe result 'invalid parameter' for pgsqld:0 on scrbtrheldbaas002 as 'not running'
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-schedulerd[69279]:  notice: Treating probe result 'invalid parameter' for pgsqld:0 on scrbtrheldbaas001 as 'not running'
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-schedulerd[69279]:  notice: Treating probe result 'invalid parameter' for pgsqld:0 on scrbtrheldbaas001 as 'not running'
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-schedulerd[69279]:  notice: Calculated transition 3, saving inputs in /var/lib/pacemaker/pengine/pe-input-87.bz2
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-controld[69280]:  notice: Transition 3 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-87.bz2): Complete
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-controld[69280]:  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
[root@scrbtrheldbaas001 heartbeat]#

Node 2 Pacemaker log:

13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1112600]:  notice: Stopping pacemaker-fenced
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-fenced[1112602]:  notice: Caught 'Terminated' signal
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1112600]:  notice: Stopping pacemaker-based
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1112601]:  notice: Caught 'Terminated' signal
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1112601]:  notice: Disconnected from Corosync
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1112601]:  notice: Disconnected from Corosync
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1112600]:  notice: Shutdown complete
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net systemd[1]: pacemaker.service: Succeeded.
-- The unit pacemaker.service has successfully entered the 'dead' state.
-- Subject: Unit pacemaker.service has finished shutting down
-- Unit pacemaker.service has finished shutting down.
-- Subject: Unit pacemaker.service has finished start-up
-- Unit pacemaker.service has finished starting up.
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1114127]:  notice: Additional logging available in /var/log/pacemaker/pacemaker.log
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1114127]:  notice: Starting Pacemaker 2.1.6-8.el8
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1114127]:  notice: Pacemaker daemon successfully started and accepting connections
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1114128]:  notice: Starting Pacemaker CIB manager
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-execd[1114130]:  notice: Starting Pacemaker local executor
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-execd[1114130]:  notice: Pacemaker local executor successfully started and accepting connections
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-execd[1114130]:  notice: OCF resource agent search path is /usr/lib/ocf/resource.d
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-fenced[1114129]:  notice: Additional logging available in /var/log/pacemaker/pacemaker.log
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-fenced[1114129]:  notice: Starting Pacemaker fencer
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-fenced[1114129]:  notice: Connecting to corosync cluster infrastructure
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-schedulerd[1114132]:  notice: Starting Pacemaker scheduler
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-schedulerd[1114132]:  notice: Pacemaker scheduler successfully started and accepting connections
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: Starting Pacemaker controller
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-attrd[1114131]:  notice: Additional logging available in /var/log/pacemaker/pacemaker.log
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-attrd[1114131]:  notice: Starting Pacemaker node attribute manager
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-fenced[1114129]:  notice: Node scrbtrheldbaas002 state is now member
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1114128]:  notice: Connecting to corosync cluster infrastructure
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1114128]:  notice: Node scrbtrheldbaas002 state is now member
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1114128]:  notice: Pacemaker CIB manager successfully started and accepting connections
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1114128]:  notice: Node scrbtrheldbaas001 state is now member
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: Connecting to corosync cluster infrastructure
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-attrd[1114131]:  notice: Connecting to corosync cluster infrastructure
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-fenced[1114129]:  notice: Pacemaker fencer successfully started and accepting connections
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-fenced[1114129]:  notice: Node scrbtrheldbaas001 state is now member
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-fenced[1114129]:  warning: Blind faith: not fencing unseen nodes
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-attrd[1114131]:  notice: Node scrbtrheldbaas002 state is now member
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: Quorum acquired
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-attrd[1114131]:  notice: Pacemaker node attribute manager successfully started and accepting connections
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-attrd[1114131]:  notice: Node scrbtrheldbaas001 state is now member
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-attrd[1114131]:  notice: Recorded new attribute writer: scrbtrheldbaas001 (was unset)
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-attrd[1114131]:  notice: Setting #attrd-protocol[scrbtrheldbaas001]: (unset) -> 5
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-attrd[1114131]:  notice: Setting #feature-set[scrbtrheldbaas001]: (unset) -> 3.17.4
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-attrd[1114131]:  notice: Setting #attrd-protocol[scrbtrheldbaas002]: (unset) -> 5
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: Node scrbtrheldbaas001 state is now member
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: Node scrbtrheldbaas002 state is now member
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: Pacemaker controller successfully started and accepting connections
Nov 30 13:43:29 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: State transition S_STARTING -> S_PENDING
Nov 30 13:43:30 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: Fencer successfully connected
Nov 30 13:43:30 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: State transition S_PENDING -> S_NOT_DC
Nov 30 13:43:30 scrbtrheldbaas002.crb.apmoller.net pacemaker-attrd[1114131]:  notice: Setting #feature-set[scrbtrheldbaas002]: (unset) -> 3.17.4
Nov 30 13:43:32 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: Requesting local execution of probe operation for pgsqld on scrbtrheldbaas002
Nov 30 13:43:32 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: Result of probe operation for pgsqld on scrbtrheldbaas002: invalid parameter (Recovery template file must contain "standby_mode = on")
Nov 30 13:43:32 scrbtrheldbaas002.crb.apmoller.net pacemaker-controld[1114133]:  notice: pgsqld_monitor_0@scrbtrheldbaas002 output [ ocf-exit-reason:Recovery template file must contain "standby_mode = on"\n ]
[root@scrbtrheldbaas002 ~]#

Thanks & Regards
Vijaykumar
Database Operations
Maersk Global Service Centre, Pune.


Re: PAF with Pacemaker

From
Jehan-Guillaume de Rorthais
Date:
Hi,

On Thu, 30 Nov 2023 19:07:34 +0000
Vijaykumar Patil <vijaykumar.patil@maersk.com> wrote:

> I have two PostgreSQL servers, one primary and one replica. I have set up
> replication and configured Pacemaker and Corosync.
> 
> However, I'm still facing an issue while creating the resource: it shows
> "invalid parameter".
> 
> [root@scrbtrheldbaas001 heartbeat]# pcs status
> Cluster name: pg_cluster
> Cluster Summary:
>   * Stack: corosync (Pacemaker is running)
>   * Current DC: scrbtrheldbaas001 (version 2.1.6-8.el8-6fdc9deea29) -
> partition with quorum
>   * Last updated: Thu Nov 30 19:04:29 2023 on scrbtrheldbaas001
>   * Last change:  Thu Nov 30 13:41:53 2023 by root via cibadmin on
> scrbtrheldbaas002
>   * 2 nodes configured
>   * 2 resource instances configured
> 
> Node List:
>   * Online: [ scrbtrheldbaas001 scrbtrheldbaas002 ]
> 
> Full List of Resources:
>   * Clone Set: pgsqld-clone [pgsqld] (promotable):
>     * Stopped (invalid parameter): [ scrbtrheldbaas001 scrbtrheldbaas002 ]

Side note: make sure to set up fencing and/or a watchdog.

> My PostgreSQL version is 15.3, but it is still looking for recovery.conf.
> Please find the logs below.

It does not look for recovery.conf with v15. In fact, if you set up a
recovery.conf with PostgreSQL v15, PAF errors out immediately with an
appropriate error message:

https://github.com/ClusterLabs/PAF/blob/master/script/pgsqlms#L1350
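A minimal Python sketch of that version gating, for illustration only: pgsqlms itself is written in Perl, and the function `check_recovery_conf`, the constant `PG12` and the error text below are hypothetical, not taken from the agent. It assumes the PostgreSQL version is already known as a number (150000 for v15) and only rejects a leftover recovery.conf from v12 onward, since PostgreSQL 12 dropped that file and refuses to start when it is present.

```python
import os
import tempfile

PG12 = 120000  # numeric form of PostgreSQL 12, the release that dropped recovery.conf


def check_recovery_conf(pgdata: str, version_num: int):
    """Hypothetical check: return an error string, or None when the setup is fine.

    Not PAF's code; it only mirrors the idea that a recovery.conf file
    in PGDATA is invalid for PostgreSQL 12 and later.
    """
    path = os.path.join(pgdata, "recovery.conf")
    if version_num >= PG12 and os.path.isfile(path):
        return f"{path} found, but recovery.conf is not supported from v12 onward"
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pgdata:
        print(check_recovery_conf(pgdata, 150000))  # None: no recovery.conf yet
        # Simulate a leftover recovery.conf in a v15 data directory
        with open(os.path.join(pgdata, "recovery.conf"), "w"):
            pass
        print(check_recovery_conf(pgdata, 150000) is not None)  # True
        print(check_recovery_conf(pgdata, 110000))  # None: still valid before v12
```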

> ...
> Nov 30 13:43:32 scrbtrheldbaas002.crb.apmoller.net
> pacemaker-controld[1114133]:  notice: Result of probe operation for pgsqld on
> scrbtrheldbaas002: invalid parameter (Recovery template file must contain
> "standby_mode = on")

This is the real error. But this check is only performed for v11 and earlier.
So now I wonder which version of PAF you are actually using. Is it up to date,
or a very old one?

Or maybe the agent failed to correctly parse your actual version from
$PGDATA/PG_VERSION?

https://github.com/ClusterLabs/PAF/blob/master/script/pgsqlms#L637
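To illustrate the PG_VERSION hypothesis, another hedged Python sketch. $PGDATA/PG_VERSION contains only the major version, e.g. `15` for v15 or `9.6` for 9.6. The hypothetical helpers `parse_pg_version` and `check_template` below are not PAF's Perl code: they turn that content into a number and enforce `standby_mode = on` in the recovery template only for versions older than 12. The sketch deliberately returns 0 when the content cannot be parsed, to show how a misparsed version would fall into the pre-v12 branch and produce the same message as in the log above; the real agent may handle parse failures differently.

```python
import re

PG12 = 120000  # numeric form of PostgreSQL 12


def parse_pg_version(content: str) -> int:
    """Convert PG_VERSION content ("15", "9.6") to a number like 150000 or 90600.

    Hypothetical helper for illustration; returns 0 when the content
    cannot be parsed.
    """
    match = re.fullmatch(r"\s*(\d+)(?:\.(\d+))?\s*", content)
    if not match:
        return 0
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    if major >= 10:
        return major * 10000  # v10+: the major version is a single number
    return major * 10000 + minor * 100  # pre-v10: e.g. 9.6 -> 90600


def check_template(template: str, version_num: int):
    """Return the log's error for pre-v12 templates lacking standby_mode, else None.

    This sketch only accepts `standby_mode = on` or `standby_mode = 'on'`.
    """
    if version_num < PG12 and not re.search(
        r"^\s*standby_mode\s*=\s*'?on'?\s*$", template, re.MULTILINE
    ):
        return 'Recovery template file must contain "standby_mode = on"'
    return None


if __name__ == "__main__":
    # Hypothetical v12+ style template content, without standby_mode
    template = "primary_conninfo = 'host=primary application_name=node2'\n"
    print(parse_pg_version("15\n"))  # 150000
    print(check_template(template, parse_pg_version("15\n")))  # None: check skipped
    # A misparsed version (0) falls into the pre-v12 branch
    print(check_template(template, parse_pg_version("garbage")))
```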

++