received immediate shutdown request caused cluster failover - Mailing list pgsql-general

From Yi Sun
Subject received immediate shutdown request caused cluster failover
Date
Msg-id CABWY_HBb3iRxYdLD2zEeGv4RMKBMP1m=3Yz8qFcKcvm93thScg@mail.gmail.com
Whole thread Raw
Responses Re: received immediate shutdown request caused cluster failover  (Yi Sun <yinan81@gmail.com>)
List pgsql-general
Hi all

There are 3 nodes in our prd db in patroni cluster, vm01 is leader, vm02 and vm03 are standby, vm01 received immediate shutdown request  caused failover to vm02, after that vm03 received fast shutdown request 

As vm03 not in cluster so have to reinit vm03

What's the possible root caused vm01 received immediate shutdown request  please?

and What's the possible root caused vm03 received fast shutdown request

Below are detail info, please check, if need any other info, I will provide, thanks

postgresql version: 11.4
OS: centos 7
HA: patroni upgraded from 1.6.3 to 2.0.1 these days

--vm01 pg log

[2020-11-16 12:03:56.592 UTC] p=31485:3@ c=pgwatch2@127.0.0.1/eu4_baas_bckp_epmgr:pgwatch2 LOG:  disconnection: session time: 0:00:00.037 user=pgwatch2 database=eu4_baas_bckp_epmgr host=127.0.0.1 port=36916
[2020-11-16 12:03:57.005 UTC] p=32103:14@ c=@/: LOG:  received immediate shutdown request
[2020-11-16 12:03:57.017 UTC] p=31189:3@314/0 c=atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] WARNING:  terminating connection because of crash of another server process
[2020-11-16 12:03:57.017 UTC] p=31189:4@314/0 c=atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[2020-11-16 12:03:57.017 UTC] p=31189:5@314/0 c=atp_si_user@10.253.85.203/eu4_baas_software_inventory:[unknown] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[2020-11-16 12:03:57.017 UTC] p=31183:3@313/0 c=atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] WARNING:  terminating connection because of crash of another server process
[2020-11-16 12:03:57.017 UTC] p=31183:4@313/0 c=atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[2020-11-16 12:03:57.017 UTC] p=31183:5@313/0 c=atp_pm@10.253.86.62/eu4_baas_atp_pm:[unknown] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[2020-11-16 12:03:57.017 UTC] p=31182:3@310/281059 c=bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] WARNING:  terminating connection because of crash of another server process
[2020-11-16 12:03:57.017 UTC] p=31182:4@310/281059 c=bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[2020-11-16 12:03:57.017 UTC] p=31182:5@310/281059 c=bckp_epmgr@10.253.85.202/eu4_baas_bckp_epmgr:[unknown] HINT:  In a moment you should be able to reconnect to the database and repeat your command.

--vm01 patroni log
Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:56,922", "name": "patroni.dcs.consul", "process": 32085, "thread": 140350381303616, "level": "WARNING", "message": "Could not register service: unknown role type promoted"}
Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:56,923", "name": "patroni.ha", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "Lock owner: eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm01"}
Nov 16 12:03:56 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:56,923", "name": "patroni.ha", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "does not have lock"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,158", "name": "patroni.dcs.consul", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "Deregister service pgcluster11/eu4-baas-patroni-cluster-vm01"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,180", "name": "patroni", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "demoting self because i do not have the lock and i was a leader"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,181", "name": "patroni", "process": 32085, "thread": 140350381303616, "level": "WARNING", "message": "Loop time exceeded, rescheduling immediately."}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,184", "name": "patroni.ha", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "Lock owner: eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm01"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,185", "name": "patroni", "process": 32085, "thread": 140350381303616, "level": "INFO", "message": "starting after demotion in progress"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,186", "name": "patroni.postgresql.connection", "process": 32085, "thread": 140349984724736, "level": "INFO", "message": "closed patroni connection to the postgresql cluster"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: {"time": "2020-11-16 12:03:59,668", "name": "patroni.postgresql.postmaster", "process": 32085, "thread": 140349984724736, "level": "INFO", "message": "postmaster pid=31495"}
Nov 16 12:03:59 eu4-baas-patroni-cluster-vm01 patroni[32085]: localhost:5432 - no response

--vm03 pg log
[2020-11-16 12:03:53.484 UTC] p=24090:1@ c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1 port=39442
[2020-11-16 12:03:53.485 UTC] p=24090:2@9/1878954 c=replicator@::1/[unknown]:[unknown] LOG:  replication connection authorized: user=replicator
[2020-11-16 12:03:53.487 UTC] p=24090:3@ c=replicator@::1/[unknown]:[unknown] LOG:  disconnection: session time: 0:00:00.002 user=replicator database= host=::1 port=39442
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
[2020-11-16 12:03:57.108 UTC] p=27711:6@ c=@/: FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[2020-11-16 12:03:57.144 UTC] p=27701:7@1/0 c=@/: LOG:  invalid record length at E052/93EED170: wanted 24, got 0
[2020-11-16 12:03:57.419 UTC] p=24097:1@ c=@/: FATAL:  could not connect to the primary server: FATAL:  the database system is shutting down

[2020-11-16 12:05:38.581 UTC] p=25380:2@ c=postgres@::1/postgres:[unknown] FATAL:  the database system is starting up
[2020-11-16 12:05:41.010 UTC] p=24113:4@1/0 c=@/: LOG:  consistent recovery state reached at E052/93EED170
[2020-11-16 12:05:41.010 UTC] p=24113:5@1/0 c=@/: LOG:  invalid record length at E052/93EED170: wanted 24, got 0
[2020-11-16 12:05:41.011 UTC] p=24109:8@ c=@/: LOG:  database system is ready to accept read only connections
[2020-11-16 12:05:43.280 UTC] p=25400:1@ c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1 port=40922
[2020-11-16 12:05:43.392 UTC] p=25401:1@ c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1 port=40924
[2020-11-16 12:05:43.393 UTC] p=25401:2@2/2 c=postgres@::1/postgres:[unknown] LOG:  connection authorized: user=postgres database=postgres
[2020-11-16 12:05:43.580 UTC] p=25402:1@ c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1 port=40926
[2020-11-16 12:05:43.628 UTC] p=25402:2@3/1 c=replicator@::1/[unknown]:[unknown] LOG:  replication connection authorized: user=replicator
[2020-11-16 12:05:43.630 UTC] p=25402:3@ c=replicator@::1/[unknown]:[unknown] LOG:  disconnection: session time: 0:00:00.049 user=replicator database= host=::1 port=40926
[2020-11-16 12:05:43.647 UTC] p=25406:1@ c=[unknown]@::1/[unknown]:[unknown] LOG:  connection received: host=::1 port=40932
[2020-11-16 12:05:43.648 UTC] p=25406:2@3/2 c=replicator@::1/[unknown]:[unknown] LOG:  replication connection authorized: user=replicator
[2020-11-16 12:05:43.650 UTC] p=25406:3@ c=replicator@::1/[unknown]:[unknown] LOG:  disconnection: session time: 0:00:00.002 user=replicator database= host=::1 port=40932
[2020-11-16 12:05:44.006 UTC] p=24109:9@ c=@/: LOG:  received fast shutdown request
[2020-11-16 12:05:44.015 UTC] p=24109:10@ c=@/: LOG:  aborting any active transactions
[2020-11-16 12:05:44.015 UTC] p=25401:3@2/0 c=postgres@::1/postgres:Patroni FATAL:  terminating connection due to administrator command
[2020-11-16 12:05:44.015 UTC] p=25401:4@ c=postgres@::1/postgres:Patroni LOG:  disconnection: session time: 0:00:00.622 user=postgres database=postgres host=::1 port=40924
[2020-11-16 12:05:44.509 UTC] p=24120:1@ c=@/: LOG:  shutting down
[2020-11-16 12:05:45.621 UTC] p=24109:11@ c=@/: LOG:  database system is shut down

--vm03 patroni log
Nov 16 12:03:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:03:51,436", "name": "patroni", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "following a different leader because i am not the healthiest node"}

Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: pg_controldata: could not read file "/var/lib/pgsql/11/data/global/pg_control": read 0 of 288
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,016", "name": "patroni.postgresql", "process": 27664, "thread": 140371489281856, "level": "ERROR", "message": "Error when calling pg_controldata"}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: Traceback (most recent call last):
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/__init__.py", line 718, in controldata
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: data = subprocess.check_output([self.pgcommand('pg_controldata'), self._data_dir], env=env)
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: **kwargs).stdout
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: File "/usr/lib64/python3.6/subprocess.py", line 438, in run
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: output=stdout, stderr=stderr)
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: subprocess.CalledProcessError: Command '['/usr/pgsql-11/bin/pg_controldata', '/var/lib/pgsql/11/data']' returned non-zero exit status 1.
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,018", "name": "patroni.ha", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "pg_controldata:
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: "}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,022", "name": "patroni.ha", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "Lock owner: eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm03"}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,022", "name": "patroni.ha", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "Lock owner: eu4-baas-patroni-cluster-vm02; I am eu4-baas-patroni-cluster-vm03"}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,031", "name": "patroni.dcs.consul", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "Deregister service pgcluster11/eu4-baas-patroni-cluster-vm03"}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,048", "name": "patroni", "process": 27664, "thread": 140371489281856, "level": "INFO", "message": "starting as a secondary"}
Nov 16 12:05:51 eu4-baas-patroni-cluster-vm03 patroni[27664]: {"time": "2020-11-16 12:05:51,049", "name": "patroni.postgresql.connection", "process": 27664, "thread": 140371151345408, "level": "INFO", "message": "closed patroni connection to the postgresql cluster"}


Best regards
Oliver Sun







pgsql-general by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: create type with %type or %rowtype
Next
From: Marcin Giedz
Date:
Subject: Re: pg_upgrade from 12 to 13 failes with plpython2