Thread: Postgres Crash Issue
Hi All,
We have a database crash issue last Friday and it’s a patroni HA-based Postgres database running, we have checked the pg logs and it shows error as shared memory corruptions. Can someone please check the attached logs and share your suggestions and feedback on the crash issues. Thanks in advance.
Regards,
Uma Sankar
Attachment
On Tue, Mar 15, 2022 at 07:20:57AM +0000, Sankar, Uma (Uma) wrote: > Hi All, > > We have a database crash issue last Friday and it's a patroni HA-based > Postgres database running, we have checked the pg logs and it shows > error as shared memory corruptions. Can someone please check the > attached logs and share your suggestions and feedback on the crash > issues. Thanks in advance. It wasn't crash. Logs show: #v+ 2022-03-11 08:58:42.956 UTC [27799] LOG: received immediate shutdown request 2022-03-11 08:58:42.956 UTC [17115] WARNING: terminating connection because of crash of another server process 2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2022-03-11 08:58:42.956 UTC [17115] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2022-03-11 08:58:42.957 UTC [14618] WARNING: terminating connection because of crash of another server process 2022-03-11 08:58:42.957 UTC [14618] DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2022-03-11 08:58:42.957 UTC [14618] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2022-03-11 08:58:42.957 UTC [16396] WARNING: terminating connection because of crash of another server process 2022-03-11 08:58:42.957 UTC [16396] DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2022-03-11 08:58:42.957 UTC [16396] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2022-03-11 08:58:42.957 UTC [12873] WARNING: terminating connection because of crash of another server process 2022-03-11 08:58:42.957 UTC [12873] DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2022-03-11 08:58:42.957 UTC [12873] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2022-03-11 08:58:42.958 UTC [16564] WARNING: terminating connection because of crash of another server process 2022-03-11 08:58:42.958 UTC [16564] DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2022-03-11 08:58:42.958 UTC [16564] HINT: In a moment you should be able to reconnect to the database and repeat your command. #v- and #v+ 2022-03-11 08:58:45.969 UTC [17354] LOG: consistent recovery state reached at A/E1836620 2022-03-11 08:58:45.969 UTC [17347] LOG: database system is ready to accept read only connections 2022-03-11 08:59:01.234 UTC [17347] LOG: received fast shutdown request 2022-03-11 08:59:01.234 UTC [17347] LOG: aborting any active transactions 2022-03-11 08:59:01.234 UTC [17412] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.234 UTC [17415] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.234 UTC [17393] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.235 UTC [17399] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.235 UTC [17408] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.236 UTC [17394] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.236 UTC [17389] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.237 UTC [17400] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.237 UTC [17378] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.238 UTC [17362] LOG: shutting down 2022-03-11 08:59:01.254 UTC [17347] LOG: database system is shut down 2022-03-11 08:59:01.419 UTC [17444] LOG: database system was shut down in recovery at 2022-03-11 08:59:01 UTC 2022-03-11 08:59:01.419 UTC [17444] LOG: entering standby mode 2022-03-11 08:59:01.420 UTC [17444] LOG: redo starts at A/E17CB690 #v- Something/someone stopped the pg, twice, ~ 20 seconds apart. What/how/why - can't tell, but "received ... shutdown request" is not crash. It is someone/something doing "pg_ctl stop" depesz
Hi Depesz, Then what does this message mean ? 2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. Deepak Menon| Avaya Managed Services-Delivery|+91 9899012875| menond@avaya.com Leave Alert : -----Original Message----- From: depesz@depesz.com <depesz@depesz.com> Sent: Tuesday, March 15, 2022 6:58 PM To: Sankar, Uma (Uma) **CTR** <usankar@avaya.com> Cc: pgsql-general@postgresql.org; Menon, Deepak (Deepak) <menond@avaya.com> Subject: [External]Re: Postgres Crash Issue [External Sender] On Tue, Mar 15, 2022 at 07:20:57AM +0000, Sankar, Uma (Uma) wrote: > Hi All, > > We have a database crash issue last Friday and it's a patroni HA-based > Postgres database running, we have checked the pg logs and it shows > error as shared memory corruptions. Can someone please check the > attached logs and share your suggestions and feedback on the crash > issues. Thanks in advance. It wasn't crash. Logs show: #v+ 2022-03-11 08:58:42.956 UTC [27799] LOG: received immediate shutdown request 2022-03-11 08:58:42.956 UTC [17115] WARNING: terminating connection because of crash of another server process 2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2022-03-11 08:58:42.956 UTC [17115] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2022-03-11 08:58:42.957 UTC [14618] WARNING: terminating connection because of crash of another server process 2022-03-11 08:58:42.957 UTC [14618] DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2022-03-11 08:58:42.957 UTC [14618] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2022-03-11 08:58:42.957 UTC [16396] WARNING: terminating connection because of crash of another server process 2022-03-11 08:58:42.957 UTC [16396] DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2022-03-11 08:58:42.957 UTC [16396] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2022-03-11 08:58:42.957 UTC [12873] WARNING: terminating connection because of crash of another server process 2022-03-11 08:58:42.957 UTC [12873] DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2022-03-11 08:58:42.957 UTC [12873] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2022-03-11 08:58:42.958 UTC [16564] WARNING: terminating connection because of crash of another server process 2022-03-11 08:58:42.958 UTC [16564] DETAIL: The postmaster has commanded this server process to roll back the current transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 2022-03-11 08:58:42.958 UTC [16564] HINT: In a moment you should be able to reconnect to the database and repeat your command. #v- and #v+ 2022-03-11 08:58:45.969 UTC [17354] LOG: consistent recovery state reached at A/E1836620 2022-03-11 08:58:45.969 UTC [17347] LOG: database system is ready to accept read only connections 2022-03-11 08:59:01.234 UTC [17347] LOG: received fast shutdown request 2022-03-11 08:59:01.234 UTC [17347] LOG: aborting any active transactions 2022-03-11 08:59:01.234 UTC [17412] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.234 UTC [17415] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.234 UTC [17393] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.235 UTC [17399] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.235 UTC [17408] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.236 UTC [17394] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.236 UTC [17389] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.237 UTC [17400] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.237 UTC [17378] FATAL: terminating connection due to administrator command 2022-03-11 08:59:01.238 UTC [17362] LOG: shutting down 2022-03-11 08:59:01.254 UTC [17347] LOG: database system is shut down 2022-03-11 08:59:01.419 UTC [17444] LOG: database system was shut down in recovery at 2022-03-11 08:59:01 UTC 2022-03-11 08:59:01.419 UTC [17444] LOG: entering standby mode 2022-03-11 08:59:01.420 UTC [17444] LOG: redo starts at A/E17CB690 #v- Something/someone stopped the pg, twice, ~ 20 seconds apart. What/how/why - can't tell, but "received ... shutdown request" is not crash. It is someone/something doing "pg_ctl stop" depesz
On Tue, Mar 15, 2022 at 01:30:25PM +0000, Menon, Deepak (Deepak) wrote: > Hi Depesz, > > Then what does this message mean ? > > 2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the currenttransaction and exit, because another server process exited abnormally and possibly corrupted shared memory. When someone issues pg_ctl -m immediate stop (which is the immediate shutdown). pg basically shutd backends in their head. This, in turn, means that all other have to exit to avoid shared memory corruption. Sometimes such exit happens before pg_ctl will kill next backend, sometimes not. So you sometimes can get such messages. depesz
Thanks Depesz. Is there anyway to check the source of the command as this HA setup is managed by using Patroni Deepak Menon| Avaya Managed Services-Delivery|+91 9899012875| menond@avaya.com Leave Alert : -----Original Message----- From: depesz@depesz.com <depesz@depesz.com> Sent: Tuesday, March 15, 2022 7:02 PM To: Menon, Deepak (Deepak) <menond@avaya.com> Cc: Sankar, Uma (Uma) **CTR** <usankar@avaya.com>; pgsql-general@postgresql.org Subject: Re: [External]Re: Postgres Crash Issue On Tue, Mar 15, 2022 at 01:30:25PM +0000, Menon, Deepak (Deepak) wrote: > Hi Depesz, > > Then what does this message mean ? > > 2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the currenttransaction and exit, because another server process exited abnormally and possibly corrupted shared memory. When someone issues pg_ctl -m immediate stop (which is the immediate shutdown). pg basically shutd backends in their head.This, in turn, means that all other have to exit to avoid shared memory corruption. Sometimes such exit happens before pg_ctl will kill next backend, sometimes not. So you sometimes can get such messages. depesz
On Tue, Mar 15, 2022 at 01:38:04PM +0000, Menon, Deepak (Deepak) wrote: > Thanks Depesz. Is there anyway to check the source of the command as this HA setup is managed by using Patroni From pg perspective it got immediate stop request. There is no way to tell why/how. depesz
hubert depesz lubaczewski <depesz@depesz.com> writes: > On Tue, Mar 15, 2022 at 01:30:25PM +0000, Menon, Deepak (Deepak) wrote: >> Then what does this message mean ? >> >> 2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the currenttransaction and exit, because another server process exited abnormally and possibly corrupted shared memory. > When someone issues pg_ctl -m immediate stop (which is the immediate > shutdown). pg basically shutd backends in their head. This, in turn, > means that all other have to exit to avoid shared memory corruption. FWIW, v14 and later produce more on-point error reporting in this case. The code path used to be the same as for a backend crash, but now clients see something like WARNING: terminating connection due to immediate shutdown command server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. and the server log isn't spammed with copies of that, it just looks like 2022-03-15 10:07:48.241 EDT [315456] LOG: received immediate shutdown request 2022-03-15 10:07:48.244 EDT [315456] LOG: database system is shut down None of this gets us any closer to understanding where the signal came from, unfortunately. regards, tom lane
On 3/15/22 6:30 AM, Menon, Deepak (Deepak) wrote: > Hi Depesz, > > Then what does this message mean ? From here, assuming someone/thing did pg_ctl -m immediate stop: https://www.postgresql.org/docs/current/app-pg-ctl.html "“Immediate” mode will abort all server processes immediately, without a clean shutdown. This choice will lead to a crash-recovery cycle during the next server start." > > 2022-03-11 08:58:42.956 UTC [17115] DETAIL: The postmaster has commanded this server process to roll back the currenttransaction and exit, because another server process exited abnormally and possibly corrupted shared memory. > > > Deepak Menon| Avaya Managed Services-Delivery|+91 9899012875| menond@avaya.com > Leave Alert : -- Adrian Klaver adrian.klaver@aklaver.com