Re: Postgres PAF setup - Mailing list pgsql-general
From: Jehan-Guillaume (ioguix) de Rorthais
Subject: Re: Postgres PAF setup
Msg-id: 20180424170855.4f103a9a@firost
In response to: Postgres PAF setup (Andrew Edenburn <andrew.edenburn@gm.com>)
Responses: RE: [EXTERNAL] Re: Postgres PAF setup
List: pgsql-general
On Mon, 23 Apr 2018 18:09:43 +0000
Andrew Edenburn <andrew.edenburn@gm.com> wrote:

> I am having issues with my PAF setup. I am new to Postgres and have setup
> the cluster as seen below. I am getting this error when trying to start my
> cluster resources.
> [...]
>
> cleanup and clear is not fixing any issues and I am not seeing anything in
> the logs. Any help would be greatly appreciated.

This lacks a lot of information. According to the PAF resource agent, your
instances are in an "unexpected state" on both nodes while PAF was actually
trying to stop them. Pacemaker may decide to stop a resource if the start
operation fails. Stopping it after a failed start gives the resource agent a
chance to shut the resource down gracefully, if that is still possible.

I suspect you have some setup mistake on both nodes, maybe the exact same
one... You should probably provide your full logs from Pacemaker/Corosync,
with timing information, so we can check all the messages coming from PAF from
the very beginning of the startup attempt.

> have-watchdog=false \

You should probably consider setting up a watchdog in your cluster.

> stonith-enabled=false \

This is really bad. Your cluster will NOT work as expected. PAF **requires**
stonith to be enabled and working properly. Without it, sooner or later you
will experience unexpected reactions from the cluster (freezing all actions,
etc.).

> no-quorum-policy=ignore \

You should not ignore quorum, even in a two-node cluster. See the "two_node"
parameter in the corosync.conf manual.

> migration-threshold=1 \
> rsc_defaults rsc_defaults-options: \
>  migration-threshold=5 \

The latter is the supported way to set migration-threshold. Your
"migration-threshold=1" should not be a cluster property but a default
resource option.
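As a rough sketch of the fixes above, the cluster-level changes could look
something like this with pcs. The fence agent, device options and threshold
value are placeholders for whatever fits your hardware; double-check the exact
pcs syntax against your version:

```shell
# Enable fencing -- mandatory for PAF. The agent shown is only an example;
# use whatever stonith agent matches your hardware (IPMI, PDU, etc.).
pcs stonith create fence_node1 fence_ipmilan \
    pcmk_host_list=dcmilphlum223 # ...plus your IPMI address/credentials
pcs property set stonith-enabled=true

# Stop ignoring quorum. In a two-node cluster, rely on "two_node: 1" in the
# quorum section of corosync.conf rather than no-quorum-policy=ignore.
pcs property set no-quorum-policy=stop

# migration-threshold belongs in resource defaults, not cluster properties.
pcs property set migration-threshold=   # clear the bogus cluster property
pcs resource defaults migration-threshold=5
```

These are configuration commands against a live cluster, so treat them as a
template to adapt, not a script to paste.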
> My pcs Config
> Corosync Nodes:
>  dcmilphlum223 dcmilphlum224
> Pacemaker Nodes:
>  dcmilphlum223 dcmilphlum224
>
> Resources:
>  Master: pgsql-ha
>   Meta Attrs: notify=true target-role=Stopped

This target-role might have been set by the cluster because it cannot fence
nodes (which might actually be easier to deal with in your situation, btw). It
means the cluster will keep this resource down because of previous errors.

> recovery_template=/pgsql/data/pg7000/recovery.conf.pcmk

You should probably not put your recovery.conf.pcmk inside your PGDATA. The
file is different on each node, and since you might want to rebuild the
standby or the old master after some failures, you would have to correct it
each time. Keep it outside of PGDATA to avoid this useless step.

> dcmilphlum224: pgsqld-data-status=LATEST

I suppose this comes from the "pgsql" resource agent, definitely not from
PAF...

Regards,
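To illustrate the recovery_template point, a PAF resource definition with the
template kept outside PGDATA could look roughly like this. The resource name,
paths and timeouts are assumptions for the example, and the exact create
syntax varies between pcs versions, so check it against yours:

```shell
# recovery.conf.pcmk lives under /etc/, outside PGDATA, so rebuilding the
# data directory (e.g. with pg_basebackup) does not wipe it out.
pcs resource create pgsqld ocf:heartbeat:pgsqld \
    pgdata=/pgsql/data/pg7000 \
    recovery_template=/etc/pacemaker/recovery.conf.pcmk \
    op monitor interval=15s timeout=10s role="Master" \
    op monitor interval=16s timeout=10s role="Slave" \
    --master notify=true
```

Each node then gets its own copy of /etc/pacemaker/recovery.conf.pcmk, with
primary_conninfo pointing at the other node.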