Query 2 Node HA test case result - Mailing list pgsql-general

From: Mukesh Tanuku
Subject: Query 2 Node HA test case result
Msg-id: CAJzgB-F7jEUeiHDWZKXHqMiYvKiwYcQkRoA47=pYQ9rkhRV6+A@mail.gmail.com
List: pgsql-general
Relevant pgpool.conf settings:

sr_check_period = 10               # seconds
health_check_period = 30           # seconds
health_check_timeout = 20          # seconds
health_check_max_retries = 3
health_check_retry_delay = 1       # seconds
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
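
For reference, a rough calculation of the detection windows these settings imply (a minimal sketch, assuming the usual Pgpool-II semantics: a backend is declared down only after the initial check plus health_check_max_retries retries have all failed, each attempt bounded by health_check_timeout; a watchdog peer is declared lost after wd_heartbeat_deadtime seconds without a heartbeat):

# Rough worst-case detection windows implied by the settings above.
# The timings actually observed in the logs below are shorter, because a
# connection attempt can fail before the full timeout is reached.
health_check_timeout = 20      # seconds per connection attempt
health_check_max_retries = 3
health_check_retry_delay = 1   # seconds between retries
wd_heartbeat_deadtime = 30     # seconds of silence before a watchdog peer is "lost"

backend_down_worst_case = (
    (1 + health_check_max_retries) * health_check_timeout
    + health_check_max_retries * health_check_retry_delay
)
print(f"backend marked down after at most ~{backend_down_worst_case}s")        # ~83s
print(f"watchdog peer marked lost after ~{wd_heartbeat_deadtime}s of silence")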
From VM2:
pgpool.log

14:30:17: Network disconnected on VM2.
About 10 seconds later, the streaming replication check (sr_check) timed out while trying to connect to the primary:
2024-07-03 14:30:26.176: sr_check_worker pid 58187: LOG: failed to connect to PostgreSQL server on "staging-ha0001:5432", timed out
The health check then failed as well, timing out per the health_check_timeout of 20 seconds:
2024-07-03 14:30:35.869: health_check0 pid 58188: LOG: failed to connect to PostgreSQL server on "staging-ha0001:5432", timed out
Both sr_check and health_check were retried, but timed out again:
2024-07-03 14:30:46.187: sr_check_worker pid 58187: LOG: failed to connect to PostgreSQL server on "staging-ha0001:5432", timed out
2024-07-03 14:30:46.880: health_check0 pid 58188: LOG: failed to connect to PostgreSQL server on "staging-ha0001:5432", timed out
The watchdog stopped receiving beacon messages from the leader node and eventually concluded that the leader was lost:
2024-07-03 14:30:47.192: watchdog pid 58151: WARNING: we have not received a beacon message from leader node "staging-ha0001:9999 Linux staging-ha0001"
2024-07-03 14:30:47.192: watchdog pid 58151: DETAIL: requesting info message from leader node
2024-07-03 14:30:54.312: watchdog pid 58151: LOG: read from socket failed, remote end closed the connection
2024-07-03 14:30:54.312: watchdog pid 58151: LOG: client socket of staging-ha0001:9999 Linux staging-ha0001 is closed
2024-07-03 14:30:54.313: watchdog pid 58151: LOG: remote node "staging-ha0001:9999 Linux staging-ha0001" is reporting that it has lost us
2024-07-03 14:30:54.313: watchdog pid 58151: LOG: we are lost on the leader node "staging-ha0001:9999 Linux staging-ha0001"
health_check and sr_check were retried once more, but timed out again:
2024-07-03 14:30:57.888: health_check0 pid 58188: LOG: failed to connect to PostgreSQL server on "staging-ha0001:5432", timed out
2024-07-03 14:30:57.888: health_check0 pid 58188: LOG: health check retrying on DB node: 0 (round:3)
2024-07-03 14:31:06.201: sr_check_worker pid 58187: LOG: failed to connect to PostgreSQL server on "staging-ha0001:5432", timed out
About 10 seconds after the leader node was lost, the local watchdog promoted the current node to LEADER:
2024-07-03 14:31:04.199: watchdog pid 58151: LOG: watchdog node state changed from [STANDING FOR LEADER] to [LEADER]
The health check on node 0 then failed for the final time and a degenerate backend request was raised for node 0; because the watchdog did not hold the quorum, the request was downgraded to a quarantine and the Pgpool-II main process quarantined staging-ha0001(5432):
2024-07-03 14:31:08.202: watchdog pid 58151: LOG: setting the local node "staging-ha0002:9999 Linux staging-ha0002" as watchdog cluster leader
2024-07-03 14:31:08.202: watchdog pid 58151: LOG: signal_user1_to_parent_with_reason(1)
2024-07-03 14:31:08.202: watchdog pid 58151: LOG: I am the cluster leader node but we do not have enough nodes in cluster
2024-07-03 14:31:08.202: watchdog pid 58151: DETAIL: waiting for the quorum to start escalation process
2024-07-03 14:31:08.202: main pid 58147: LOG: Pgpool-II parent process received SIGUSR1
2024-07-03 14:31:08.202: main pid 58147: LOG: Pgpool-II parent process received watchdog state change signal from watchdog
2024-07-03 14:31:08.899: health_check0 pid 58188: LOG: failed to connect to PostgreSQL server on "staging-ha0001:5432", timed out
2024-07-03 14:31:08.899: health_check0 pid 58188: LOG: health check failed on node 0 (timeout:0)
2024-07-03 14:31:08.899: health_check0 pid 58188: LOG: received degenerate backend request for node_id: 0 from pid [58188]
2024-07-03 14:31:08.899: watchdog pid 58151: LOG: watchdog received the failover command from local pgpool-II on IPC interface
2024-07-03 14:31:08.899: watchdog pid 58151: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2024-07-03 14:31:08.899: watchdog pid 58151: LOG: failover requires the quorum to hold, which is not present at the moment
2024-07-03 14:31:08.899: watchdog pid 58151: DETAIL: Rejecting the failover request
2024-07-03 14:31:08.899: watchdog pid 58151: LOG: failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II node "staging-ha0002:9999 Linux staging-ha0002" is rejected because the watchdog cluster does not hold the quorum
2024-07-03 14:31:08.900: health_check0 pid 58188: LOG: degenerate backend request for 1 node(s) from pid [58188], is changed to quarantine node request by watchdog
2024-07-03 14:31:08.900: health_check0 pid 58188: DETAIL: watchdog does not holds the quorum
2024-07-03 14:31:08.900: health_check0 pid 58188: LOG: signal_user1_to_parent_with_reason(0)
2024-07-03 14:31:08.900: main pid 58147: LOG: Pgpool-II parent process received SIGUSR1
2024-07-03 14:31:08.900: main pid 58147: LOG: Pgpool-II parent process has received failover request
2024-07-03 14:31:08.900: watchdog pid 58151: LOG: received the failover indication from Pgpool-II on IPC interface
2024-07-03 14:31:08.900: watchdog pid 58151: LOG: watchdog is informed of failover start by the main process
2024-07-03 14:31:08.900: main pid 58147: LOG: === Starting quarantine. shutdown host staging-ha0001(5432) ===
2024-07-03 14:31:08.900: main pid 58147: LOG: Restart all children
2024-07-03 14:31:08.900: main pid 58147: LOG: failover: set new primary node: -1
2024-07-03 14:31:08.900: main pid 58147: LOG: failover: set new main node: 1
2024-07-03 14:31:08.906: sr_check_worker pid 58187: ERROR: Failed to check replication time lag
2024-07-03 14:31:08.906: sr_check_worker pid 58187: DETAIL: No persistent db connection for the node 0
2024-07-03 14:31:08.906: sr_check_worker pid 58187: HINT: check sr_check_user and sr_check_password
2024-07-03 14:31:08.906: sr_check_worker pid 58187: CONTEXT: while checking replication time lag
2024-07-03 14:31:08.906: sr_check_worker pid 58187: LOG: worker process received restart request
2024-07-03 14:31:08.906: watchdog pid 58151: LOG: received the failover indication from Pgpool-II on IPC interface
2024-07-03 14:31:08.906: watchdog pid 58151: LOG: watchdog is informed of failover end by the main process
2024-07-03 14:31:08.906: main pid 58147: LOG: === Quarantine done. shutdown host staging-ha0001(5432) ===
2024-07-03 14:31:09.906: pcp_main pid 58186: LOG: restart request received in pcp child process
2024-07-03 14:31:09.907: main pid 58147: LOG: PCP child 58186 exits with status 0 in failover()
2024-07-03 14:31:09.908: main pid 58147: LOG: fork a new PCP child pid 58578 in failover()
2024-07-03 14:31:09.908: main pid 58147: LOG: reaper handler
2024-07-03 14:31:09.908: pcp_main pid 58578: LOG: PCP process: 58578 started
2024-07-03 14:31:09.909: main pid 58147: LOG: reaper handler: exiting normally
2024-07-03 14:31:09.909: sr_check_worker pid 58579: LOG: process started
2024-07-03 14:31:19.915: watchdog pid 58151: LOG: not able to send messages to remote node "staging-ha0001:9999 Linux staging-ha0001"
2024-07-03 14:31:19.915: watchdog pid 58151: DETAIL: marking the node as lost
2024-07-03 14:31:19.915: watchdog pid 58151: LOG: remote node "staging-ha0001:9999 Linux staging-ha0001" is lost
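
The key point in the VM2 log above is that the DEGENERATE_BACKEND_REQUEST is rejected and downgraded to a quarantine because the surviving node alone does not hold the quorum. Below is a minimal sketch of that majority check, assuming the default behaviour in which a two-node watchdog cluster needs both nodes alive to hold quorum (i.e. enable_consensus_with_half_votes is off); it is an illustration, not Pgpool-II's actual code:

# Sketch of a watchdog quorum (majority) check for an N-node cluster.
def holds_quorum(total_nodes: int, alive_nodes: int) -> bool:
    # quorum = strictly more than half of the configured watchdog nodes alive
    return alive_nodes > total_nodes / 2

print(holds_quorum(total_nodes=2, alive_nodes=1))  # False -> failover rejected, node only quarantined
print(holds_quorum(total_nodes=3, alive_nodes=2))  # True  -> a real failover could proceed

This is why both VMs log "... is rejected because the watchdog cluster does not hold the quorum": with only two watchdog nodes, whichever side loses sight of the other drops below a majority and can only quarantine the unreachable backend.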
From VM1:
pgpool.log
2024-07-03 14:30:36.444: watchdog pid 8620: LOG: remote node "staging-ha0002:9999 Linux staging-ha0002" is not replying to our beacons
2024-07-03 14:30:36.444: watchdog pid 8620: DETAIL: missed beacon reply count:2
2024-07-03 14:30:37.448: sr_check_worker pid 65605: LOG: failed to connect to PostgreSQL server on "staging-ha0002:5432", timed out
2024-07-03 14:30:46.067: health_check1 pid 8676: LOG: failed to connect to PostgreSQL server on "staging-ha0002:5432", timed out
2024-07-03 14:30:46.068: health_check1 pid 8676: LOG: health check retrying on DB node: 1 (round:1)
2024-07-03 14:30:46.455: watchdog pid 8620: LOG: remote node "staging-ha0002:9999 Linux staging-ha0002" is not replying to our beacons
2024-07-03 14:30:46.455: watchdog pid 8620: DETAIL: missed beacon reply count:3
2024-07-03 14:30:47.449: sr_check_worker pid 65605: ERROR: Failed to check replication time lag
2024-07-03 14:30:47.449: sr_check_worker pid 65605: DETAIL: No persistent db connection for the node 1
2024-07-03 14:30:47.449: sr_check_worker pid 65605: HINT: check sr_check_user and sr_check_password
2024-07-03 14:30:47.449: sr_check_worker pid 65605: CONTEXT: while checking replication time lag
2024-07-03 14:30:55.104: child pid 65509: LOG: failover or failback event detected
2024-07-03 14:30:55.104: child pid 65509: DETAIL: restarting myself
2024-07-03 14:30:55.104: main pid 8617: LOG: reaper handler
2024-07-03 14:30:55.105: main pid 8617: LOG: reaper handler: exiting normally
2024-07-03 14:30:56.459: watchdog pid 8620: LOG: remote node "staging-ha0002:9999 Linux staging-ha0002" is not replying to our beacons
2024-07-03 14:30:56.459: watchdog pid 8620: DETAIL: missed beacon reply count:4
2024-07-03 14:30:56.459: watchdog pid 8620: LOG: remote node "staging-ha0002:9999 Linux staging-ha0002" is not responding to our beacon messages
2024-07-03 14:30:56.459: watchdog pid 8620: DETAIL: marking the node as lost
2024-07-03 14:30:56.459: watchdog pid 8620: LOG: remote node "staging-ha0002:9999 Linux staging-ha0002" is lost
2024-07-03 14:30:56.460: watchdog pid 8620: LOG: removing watchdog node "staging-ha0002:9999 Linux staging-ha0002" from the standby list
2024-07-03 14:30:56.460: watchdog pid 8620: LOG: We have lost the quorum
2024-07-03 14:30:56.460: watchdog pid 8620: LOG: signal_user1_to_parent_with_reason(3)
2024-07-03 14:30:56.460: main pid 8617: LOG: Pgpool-II parent process received SIGUSR1
2024-07-03 14:30:56.460: main pid 8617: LOG: Pgpool-II parent process received watchdog quorum change signal from watchdog
2024-07-03 14:30:56.461: watchdog_utility pid 66197: LOG: watchdog: de-escalation started
sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper
2024-07-03 14:30:57.078: health_check1 pid 8676: LOG: failed to connect to PostgreSQL server on "staging-ha0002:5432", timed out
2024-07-03 14:30:57.078: health_check1 pid 8676: LOG: health check retrying on DB node: 1 (round:2)
2024-07-03 14:30:57.418: life_check pid 8639: LOG: informing the node status change to watchdog
2024-07-03 14:30:57.418: life_check pid 8639: DETAIL: node id :1 status = "NODE DEAD" message:"No heartbeat signal from node"
2024-07-03 14:30:57.418: watchdog pid 8620: LOG: received node status change ipc message
2024-07-03 14:30:57.418: watchdog pid 8620: DETAIL: No heartbeat signal from node
2024-07-03 14:30:57.418: watchdog pid 8620: LOG: remote node "staging-ha0002:9999 Linux staging-ha0002" is lost
2024-07-03 14:30:57.464: sr_check_worker pid 65605: LOG: failed to connect to PostgreSQL server on "staging-ha0002:5432", timed out
sudo: a password is required
2024-07-03 14:30:59.301: watchdog_utility pid 66197: LOG: failed to release the delegate IP:"10.127.1.20"
2024-07-03 14:30:59.301: watchdog_utility pid 66197: DETAIL: 'if_down_cmd' failed
2024-07-03 14:30:59.301: watchdog_utility pid 66197: WARNING: watchdog de-escalation failed to bring down delegate IP
2024-07-03 14:30:59.301: watchdog pid 8620: LOG: watchdog de-escalation process with pid: 66197 exit with SUCCESS.
2024-07-03 14:31:07.465: sr_check_worker pid 65605: ERROR: Failed to check replication time lag
2024-07-03 14:31:07.465: sr_check_worker pid 65605: DETAIL: No persistent db connection for the node 1
2024-07-03 14:31:07.465: sr_check_worker pid 65605: HINT: check sr_check_user and sr_check_password
2024-07-03 14:31:07.465: sr_check_worker pid 65605: CONTEXT: while checking replication time lag
2024-07-03 14:31:08.089: health_check1 pid 8676: LOG: failed to connect to PostgreSQL server on "staging-ha0002:5432", timed out
2024-07-03 14:31:08.089: health_check1 pid 8676: LOG: health check retrying on DB node: 1 (round:3)
2024-07-03 14:31:17.480: sr_check_worker pid 65605: LOG: failed to connect to PostgreSQL server on "staging-ha0002:5432", timed out
2024-07-03 14:31:19.097: health_check1 pid 8676: LOG: failed to connect to PostgreSQL server on "staging-ha0002:5432", timed out
2024-07-03 14:31:19.097: health_check1 pid 8676: LOG: health check failed on node 1 (timeout:0)
2024-07-03 14:31:19.097: health_check1 pid 8676: LOG: received degenerate backend request for node_id: 1 from pid [8676]
2024-07-03 14:31:19.097: watchdog pid 8620: LOG: watchdog received the failover command from local pgpool-II on IPC interface
2024-07-03 14:31:19.097: watchdog pid 8620: LOG: watchdog is processing the failover command [DEGENERATE_BACKEND_REQUEST] received from local pgpool-II on IPC interface
2024-07-03 14:31:19.097: watchdog pid 8620: LOG: failover requires the quorum to hold, which is not present at the moment
2024-07-03 14:31:19.097: watchdog pid 8620: DETAIL: Rejecting the failover request
2024-07-03 14:31:19.097: watchdog pid 8620: LOG: failover command [DEGENERATE_BACKEND_REQUEST] request from pgpool-II node "staging-ha0001:9999 Linux staging-ha0001" is rejected because the watchdog cluster does not hold the quorum
2024-07-03 14:31:19.097: health_check1 pid 8676: LOG: degenerate backend request for 1 node(s) from pid [8676], is changed to quarantine node request by watchdog
2024-07-03 14:31:19.097: health_check1 pid 8676: DETAIL: watchdog does not holds the quorum
2024-07-03 14:31:19.097: health_check1 pid 8676: LOG: signal_user1_to_parent_with_reason(0)
2024-07-03 14:31:19.097: main pid 8617: LOG: Pgpool-II parent process received SIGUSR1
2024-07-03 14:31:19.097: main pid 8617: LOG: Pgpool-II parent process has received failover request
2024-07-03 14:31:19.098: watchdog pid 8620: LOG: received the failover indication from Pgpool-II on IPC interface
2024-07-03 14:31:19.098: watchdog pid 8620: LOG: watchdog is informed of failover start by the main process
2024-07-03 14:31:19.098: main pid 8617: LOG: === Starting quarantine. shutdown host staging-ha0002(5432) ===
2024-07-03 14:31:19.098: main pid 8617: LOG: Do not restart children because we are switching over node id 1 host: staging-ha0002 port: 5432 and we are in streaming replication mode
2024-07-03 14:31:19.098: main pid 8617: LOG: failover: set new primary node: 0
2024-07-03 14:31:19.098: main pid 8617: LOG: failover: set new main node: 0
2024-07-03 14:31:19.098: sr_check_worker pid 65605: ERROR: Failed to check replication time lag
2024-07-03 14:31:19.098: sr_check_worker pid 65605: DETAIL: No persistent db connection for the node 1
2024-07-03 14:31:19.098: sr_check_worker pid 65605: HINT: check sr_check_user and sr_check_password
2024-07-03 14:31:19.098: sr_check_worker pid 65605: CONTEXT: while checking replication time lag
2024-07-03 14:31:19.098: sr_check_worker pid 65605: LOG: worker process received restart request
2024-07-03 14:31:19.098: watchdog pid 8620: LOG: received the failover indication from Pgpool-II on IPC interface
2024-07-03 14:31:19.098: watchdog pid 8620: LOG: watchdog is informed of failover end by the main process
2024-07-03 14:31:19.098: main pid 8617: LOG: === Quarantine done. shutdown host staging-ha0002(5432) ===
2024-07-03 14:35:59.420: watchdog pid 8620: LOG: new outbound connection to staging-ha0002:9000
2024-07-03 14:35:59.423: watchdog pid 8620: LOG: "staging-ha0001:9999 Linux staging-ha0001" is the coordinator as per our record but "staging-ha0002:9999 Linux staging-ha0002" is also announcing as a coordinator
2024-07-03 14:35:59.423: watchdog pid 8620: DETAIL: cluster is in the split-brain
2024-07-03 14:35:59.423: watchdog pid 8620: LOG: I am the coordinator but "staging-ha0002:9999 Linux staging-ha0002" is also announcing as a coordinator
2024-07-03 14:35:59.423: watchdog pid 8620: DETAIL: trying to figure out the best contender for the leader/coordinator node
2024-07-03 14:35:59.423: watchdog pid 8620: LOG: remote node:"staging-ha0002:9999 Linux staging-ha0002" should step down from leader because we are the older leader
2024-07-03 14:35:59.423: watchdog pid 8620: LOG: We are in split brain, and I am the best candidate for leader/coordinator
2024-07-03 14:35:59.423: watchdog pid 8620: DETAIL: asking the remote node "staging-ha0002:9999 Linux staging-ha0002" to step down
2024-07-03 14:35:59.423: watchdog pid 8620: LOG: we have received the NODE INFO message from the node:"staging-ha0002:9999 Linux staging-ha0002" that was lost
2024-07-03 14:35:59.423: watchdog pid 8620: DETAIL: we had lost this node because of "REPORTED BY LIFECHECK"
2024-07-03 14:35:59.423: watchdog pid 8620: LOG: node:"staging-ha0002:9999 Linux staging-ha0002" was reported lost by the life-check process
2024-07-03 14:35:59.423: watchdog pid 8620: DETAIL: node will be added to cluster once life-check mark it as reachable again
2024-07-03 14:35:59.423: watchdog pid 8620: LOG: "staging-ha0001:9999 Linux staging-ha0001" is the coordinator as per our record but "staging-ha0002:9999 Linux staging-ha0002" is also announcing as a coordinator
2024-07-03 14:35:59.423: watchdog pid 8620: DETAIL: cluster is in the split-brain
2024-07-03 14:35:59.424: watchdog pid 8620: LOG: I am the coordinator but "staging-ha0002:9999 Linux staging-ha0002" is also announcing as a coordinator
2024-07-03 14:35:59.424: watchdog pid 8620: DETAIL: trying to figure out the best contender for the leader/coordinator node
2024-07-03 14:35:59.424: watchdog pid 8620: LOG: remote node:"staging-ha0002:9999 Linux staging-ha0002" should step down from leader because we are the older leader
2024-07-03 14:35:59.424: watchdog pid 8620: LOG: We are in split brain, and I am the best candidate for leader/coordinator
2024-07-03 14:35:59.424: watchdog pid 8620: DETAIL: asking the remote node "staging-ha0002:9999 Linux staging-ha0002" to step down
2024-07-03 14:35:59.424: watchdog pid 8620: LOG: we have received the NODE INFO message from the node:"staging-ha0002:9999 Linux staging-ha0002" that was lost
2024-07-03 14:35:59.424: watchdog pid 8620: DETAIL: we had lost this node because of "REPORTED BY LIFECHECK"
2024-07-03 14:35:59.424: watchdog pid 8620: LOG: node:"staging-ha0002:9999 Linux staging-ha0002" was reported lost by the life-check process
2024-07-03 14:35:59.424: watchdog pid 8620: DETAIL: node will be added to cluster once life-check mark it as reachable again
2024-07-03 14:35:59.424: watchdog pid 8620: LOG: remote node "staging-ha0002:9999 Linux staging-ha0002" is reporting that it has found us again
2024-07-03 14:35:59.425: watchdog pid 8620: LOG: leader/coordinator node "staging-ha0002:9999 Linux staging-ha0002" decided to resign from leader, probably because of split-brain
2024-07-03 14:35:59.425: watchdog pid 8620: DETAIL: It was not our coordinator/leader anyway. ignoring the message
2024-07-03 14:35:59.425: watchdog pid 8620: LOG: we have received the NODE INFO message from the node:"staging-ha0002:9999 Linux staging-ha0002" that was lost
2024-07-03 14:35:59.425: watchdog pid 8620: DETAIL: we had lost this node because of "REPORTED BY LIFECHECK"
2024-07-03 14:35:59.425: watchdog pid 8620: LOG: node:"staging-ha0002:9999 Linux staging-ha0002" was reported lost by the life-check process
2024-07-03 14:35:59.425: watchdog pid 8620: DETAIL: node will be added to cluster once life-check mark it as reachable again
2024-07-03 14:35:59.425: watchdog pid 8620: LOG: we have received the NODE INFO message from the node:"staging-ha0002:9999 Linux staging-ha0002" that was lost
2024-07-03 14:35:59.425: watchdog pid 8620: DETAIL: we had lost this node because of "REPORTED BY LIFECHECK"
2024-07-03 14:35:59.425: watchdog pid 8620: LOG: node:"staging-ha0002:9999 Linux staging-ha0002" was reported lost by the life-check process
2024-07-03 14:35:59.425: watchdog pid 8620: DETAIL: node will be added to cluster once life-check mark it as reachable again
2024-07-03 14:35:59.427: watchdog pid 8620: LOG: we have received the NODE INFO message from the node:"staging-ha0002:9999 Linux staging-ha0002" that was lost
2024-07-03 14:35:59.427: watchdog pid 8620: DETAIL: we had lost this node because of "REPORTED BY LIFECHECK"
2024-07-03 14:35:59.427: watchdog pid 8620: LOG: node:"staging-ha0002:9999 Linux staging-ha0002" was reported lost by the life-check process
2024-07-03 14:35:59.427: watchdog pid 8620: DETAIL: node will be added to cluster once life-check mark it as reachable again
2024-07-03 14:35:59.427: watchdog pid 8620: LOG: we have received the NODE INFO message from the node:"staging-ha0002:9999 Linux staging-ha0002" that was lost
2024-07-03 14:35:59.427: watchdog pid 8620: DETAIL: we had lost this node because of "REPORTED BY LIFECHECK"
2024-07-03 14:35:59.427: watchdog pid 8620: LOG: node:"staging-ha0002:9999 Linux staging-ha0002" was reported lost by the life-check process
2024-07-03 14:35:59.427: watchdog pid 8620: DETAIL: node will be added to cluster once life-check mark it as reachable again
2024-07-03 14:36:00.213: health_check1 pid 8676: LOG: failed to connect to PostgreSQL server on "staging-ha0002:5432", timed out
2024-07-03 14:36:00.213: health_check1 pid 8676: LOG: health check retrying on DB node: 1 (round:3)
2024-07-03 14:36:01.221: health_check1 pid 8676: LOG: health check retrying on DB node: 1 succeeded
2024-07-03 14:36:01.221: health_check1 pid 8676: LOG: received failback request for node_id: 1 from pid [8676]
2024-07-03 14:36:01.221: health_check1 pid 8676: LOG: failback request from pid [8676] is changed to update status request because node_id: 1 was quarantined
2024-07-03 14:36:01.221: health_check1 pid 8676: LOG: signal_user1_to_parent_with_reason(0)
2024-07-03 14:36:01.221: main pid 8617: LOG: Pgpool-II parent process received SIGUSR1
2024-07-03 14:36:01.221: main pid 8617: LOG: Pgpool-II parent process has received failover request
2024-07-03 14:36:01.221: watchdog pid 8620: LOG: received the failover indication from Pgpool-II on IPC interface
2024-07-03 14:36:01.221: watchdog pid 8620: LOG: watchdog is informed of failover start by the main process
2024-07-03 14:36:01.222: main pid 8617: LOG: === Starting fail back. reconnect host staging-ha0002(5432) ===
2024-07-03 14:36:01.222: main pid 8617: LOG: Node 0 is not down (status: 2)
2024-07-03 14:36:01.222: main pid 8617: LOG: Do not restart children because we are failing back node id 1 host: staging-ha0002 port: 5432 and we are in streaming replication mode and not all backends were down
2024-07-03 14:36:01.222: main pid 8617: LOG: failover: set new primary node: 0
2024-07-03 14:36:01.222: main pid 8617: LOG: failover: set new main node: 0
2024-07-03 14:36:01.222: sr_check_worker pid 66222: LOG: worker process received restart request
2024-07-03 14:36:01.222: watchdog pid 8620: LOG: received the failover indication from Pgpool-II on IPC interface
2024-07-03 14:36:01.222: watchdog pid 8620: LOG: watchdog is informed of failover end by the main process
2024-07-03 14:36:01.222: main pid 8617: LOG: === Failback done. reconnect host staging-ha0002(5432) ===
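
Once the network is restored (14:35:59), both nodes are announcing themselves as coordinator and VM1 reports "cluster is in the split-brain". The log also shows the tie-break applied here: the remote node "should step down from leader because we are the older leader". A minimal sketch of that rule as stated in the log (my illustration with hypothetical names, not Pgpool-II's implementation):

# Illustration of the split-brain tie-break seen in the VM1 log: when two
# watchdog nodes both claim leadership, the node that became leader earlier
# ("the older leader") keeps the role and the other is asked to step down.
from dataclasses import dataclass

@dataclass
class WatchdogNode:
    name: str
    leader_since: float  # epoch seconds at which the node became leader

def resolve_split_brain(a: WatchdogNode, b: WatchdogNode) -> WatchdogNode:
    """Return the node that should remain leader (the older leader)."""
    return a if a.leader_since <= b.leader_since else b

vm1 = WatchdogNode("staging-ha0001", leader_since=1000.0)  # original leader
vm2 = WatchdogNode("staging-ha0002", leader_since=1050.0)  # promoted during the outage
print(resolve_split_brain(vm1, vm2).name)                  # staging-ha0001 keeps the leader role

After the split-brain is resolved, the health check on node 1 succeeds again; because node 1 was only quarantined (not degenerated), the failback request is turned into a status update and the backend is re-attached without a full failover, as shown in the "=== Failback done ===" lines above.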