Thread: Postgres Crash Issue

Postgres Crash Issue

From
"Sankar, Uma (Uma)"
Date:

Hi All,

 

We have a database crash issue last Friday and it’s a patroni HA-based Postgres database running,  we have checked the pg logs and it shows error as shared memory corruptions. Can someone please check the attached logs and share your suggestions and feedback on the crash issues. Thanks in advance.

 

 

Regards,

Uma Sankar

 

Attachment

Re: Postgres Crash Issue

From
hubert depesz lubaczewski
Date:
On Tue, Mar 15, 2022 at 07:20:57AM +0000, Sankar, Uma (Uma) wrote:
> Hi All,
> 
> We have a database crash issue last Friday and it's a patroni HA-based
> Postgres database running,  we have checked the pg logs and it shows
> error as shared memory corruptions. Can someone please check the
> attached logs and share your suggestions and feedback on the crash
> issues. Thanks in advance.

It wasn't crash.

Logs show:

#v+
2022-03-11 08:58:42.956 UTC [27799] LOG:  received immediate shutdown request
2022-03-11 08:58:42.956 UTC [17115] WARNING:  terminating connection because of crash of another server process
2022-03-11 08:58:42.956 UTC [17115] DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory.
 
2022-03-11 08:58:42.956 UTC [17115] HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
2022-03-11 08:58:42.957 UTC [14618] WARNING:  terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [14618] DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory.
 
2022-03-11 08:58:42.957 UTC [14618] HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
2022-03-11 08:58:42.957 UTC [16396] WARNING:  terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [16396] DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory.
 
2022-03-11 08:58:42.957 UTC [16396] HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
2022-03-11 08:58:42.957 UTC [12873] WARNING:  terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [12873] DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory.
 
2022-03-11 08:58:42.957 UTC [12873] HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
2022-03-11 08:58:42.958 UTC [16564] WARNING:  terminating connection because of crash of another server process
2022-03-11 08:58:42.958 UTC [16564] DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory.
 
2022-03-11 08:58:42.958 UTC [16564] HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
#v-

and

#v+
2022-03-11 08:58:45.969 UTC [17354] LOG:  consistent recovery state reached at A/E1836620
2022-03-11 08:58:45.969 UTC [17347] LOG:  database system is ready to accept read only connections
2022-03-11 08:59:01.234 UTC [17347] LOG:  received fast shutdown request
2022-03-11 08:59:01.234 UTC [17347] LOG:  aborting any active transactions
2022-03-11 08:59:01.234 UTC [17412] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.234 UTC [17415] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.234 UTC [17393] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.235 UTC [17399] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.235 UTC [17408] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.236 UTC [17394] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.236 UTC [17389] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.237 UTC [17400] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.237 UTC [17378] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.238 UTC [17362] LOG:  shutting down
2022-03-11 08:59:01.254 UTC [17347] LOG:  database system is shut down
2022-03-11 08:59:01.419 UTC [17444] LOG:  database system was shut down in recovery at 2022-03-11 08:59:01 UTC
2022-03-11 08:59:01.419 UTC [17444] LOG:  entering standby mode
2022-03-11 08:59:01.420 UTC [17444] LOG:  redo starts at A/E17CB690
#v-

Something/someone stopped the pg, twice, ~ 20 seconds apart.
What/how/why - can't tell, but "received ... shutdown request" is not
crash. It is someone/something doing "pg_ctl stop"

depesz




RE: [External]Re: Postgres Crash Issue

From
"Menon, Deepak (Deepak)"
Date:
Hi Depesz,

Then what does this message mean ?

2022-03-11 08:58:42.956 UTC [17115] DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory. 
 


Deepak Menon| Avaya Managed Services-Delivery|+91 9899012875| menond@avaya.com
Leave Alert : 

-----Original Message-----
From: depesz@depesz.com <depesz@depesz.com> 
Sent: Tuesday, March 15, 2022 6:58 PM
To: Sankar, Uma (Uma) **CTR** <usankar@avaya.com>
Cc: pgsql-general@postgresql.org; Menon, Deepak (Deepak) <menond@avaya.com>
Subject: [External]Re: Postgres Crash Issue

[External Sender]

On Tue, Mar 15, 2022 at 07:20:57AM +0000, Sankar, Uma (Uma) wrote:
> Hi All,
>
> We have a database crash issue last Friday and it's a patroni HA-based 
> Postgres database running,  we have checked the pg logs and it shows 
> error as shared memory corruptions. Can someone please check the 
> attached logs and share your suggestions and feedback on the crash 
> issues. Thanks in advance.

It wasn't crash.

Logs show:

#v+
2022-03-11 08:58:42.956 UTC [27799] LOG:  received immediate shutdown request
2022-03-11 08:58:42.956 UTC [17115] WARNING:  terminating connection because of crash of another server process
2022-03-11 08:58:42.956 UTC [17115] DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory.
 
2022-03-11 08:58:42.956 UTC [17115] HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
2022-03-11 08:58:42.957 UTC [14618] WARNING:  terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [14618] DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory.
 
2022-03-11 08:58:42.957 UTC [14618] HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
2022-03-11 08:58:42.957 UTC [16396] WARNING:  terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [16396] DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory.
 
2022-03-11 08:58:42.957 UTC [16396] HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
2022-03-11 08:58:42.957 UTC [12873] WARNING:  terminating connection because of crash of another server process
2022-03-11 08:58:42.957 UTC [12873] DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory.
 
2022-03-11 08:58:42.957 UTC [12873] HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
2022-03-11 08:58:42.958 UTC [16564] WARNING:  terminating connection because of crash of another server process
2022-03-11 08:58:42.958 UTC [16564] DETAIL:  The postmaster has commanded this server process to roll back the current
transactionand exit, because another server process exited abnormally and possibly corrupted shared memory.
 
2022-03-11 08:58:42.958 UTC [16564] HINT:  In a moment you should be able to reconnect to the database and repeat your
command.
#v-

and

#v+
2022-03-11 08:58:45.969 UTC [17354] LOG:  consistent recovery state reached at A/E1836620
2022-03-11 08:58:45.969 UTC [17347] LOG:  database system is ready to accept read only connections
2022-03-11 08:59:01.234 UTC [17347] LOG:  received fast shutdown request
2022-03-11 08:59:01.234 UTC [17347] LOG:  aborting any active transactions
2022-03-11 08:59:01.234 UTC [17412] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.234 UTC [17415] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.234 UTC [17393] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.235 UTC [17399] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.235 UTC [17408] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.236 UTC [17394] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.236 UTC [17389] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.237 UTC [17400] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.237 UTC [17378] FATAL:  terminating connection due to administrator command
2022-03-11 08:59:01.238 UTC [17362] LOG:  shutting down
2022-03-11 08:59:01.254 UTC [17347] LOG:  database system is shut down
2022-03-11 08:59:01.419 UTC [17444] LOG:  database system was shut down in recovery at 2022-03-11 08:59:01 UTC
2022-03-11 08:59:01.419 UTC [17444] LOG:  entering standby mode
2022-03-11 08:59:01.420 UTC [17444] LOG:  redo starts at A/E17CB690
#v-

Something/someone stopped the pg, twice, ~ 20 seconds apart.
What/how/why - can't tell, but "received ... shutdown request" is not crash. It is someone/something doing "pg_ctl
stop"

depesz


Re: [External]Re: Postgres Crash Issue

From
hubert depesz lubaczewski
Date:
On Tue, Mar 15, 2022 at 01:30:25PM +0000, Menon, Deepak (Deepak) wrote:
> Hi Depesz,
> 
> Then what does this message mean ?
> 
> 2022-03-11 08:58:42.956 UTC [17115] DETAIL:  The postmaster has commanded this server process to roll back the
currenttransaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 
 


When someone issues pg_ctl -m immediate stop (which is the immediate
shutdown). pg basically shutd backends in their head. This, in turn,
means that all other have to exit to avoid shared memory corruption.

Sometimes such exit happens before pg_ctl will kill next backend,
sometimes not. So you sometimes can get such messages.

depesz



RE: [External]Re: Postgres Crash Issue

From
"Menon, Deepak (Deepak)"
Date:
Thanks Depesz. Is there anyway to check the source of the command as this HA setup is managed by using Patroni 

Deepak Menon| Avaya Managed Services-Delivery|+91 9899012875| menond@avaya.com
Leave Alert : 

-----Original Message-----
From: depesz@depesz.com <depesz@depesz.com> 
Sent: Tuesday, March 15, 2022 7:02 PM
To: Menon, Deepak (Deepak) <menond@avaya.com>
Cc: Sankar, Uma (Uma) **CTR** <usankar@avaya.com>; pgsql-general@postgresql.org
Subject: Re: [External]Re: Postgres Crash Issue

On Tue, Mar 15, 2022 at 01:30:25PM +0000, Menon, Deepak (Deepak) wrote:
> Hi Depesz,
> 
> Then what does this message mean ?
> 
> 2022-03-11 08:58:42.956 UTC [17115] DETAIL:  The postmaster has commanded this server process to roll back the
currenttransaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 
 


When someone issues pg_ctl -m immediate stop (which is the immediate shutdown). pg basically shutd backends in their
head.This, in turn, means that all other have to exit to avoid shared memory corruption.
 

Sometimes such exit happens before pg_ctl will kill next backend, sometimes not. So you sometimes can get such
messages.

depesz

Re: [External]Re: Postgres Crash Issue

From
hubert depesz lubaczewski
Date:
On Tue, Mar 15, 2022 at 01:38:04PM +0000, Menon, Deepak (Deepak) wrote:
> Thanks Depesz. Is there anyway to check the source of the command as this HA setup is managed by using Patroni 

From pg perspective it got immediate stop request. There is no way to
tell why/how.

depesz




Re: [External]Re: Postgres Crash Issue

From
Tom Lane
Date:
hubert depesz lubaczewski <depesz@depesz.com> writes:
> On Tue, Mar 15, 2022 at 01:30:25PM +0000, Menon, Deepak (Deepak) wrote:
>> Then what does this message mean ?
>>
>> 2022-03-11 08:58:42.956 UTC [17115] DETAIL:  The postmaster has commanded this server process to roll back the
currenttransaction and exit, because another server process exited abnormally and possibly corrupted shared memory.  

> When someone issues pg_ctl -m immediate stop (which is the immediate
> shutdown). pg basically shutd backends in their head. This, in turn,
> means that all other have to exit to avoid shared memory corruption.

FWIW, v14 and later produce more on-point error reporting in this
case.  The code path used to be the same as for a backend crash,
but now clients see something like

WARNING:  terminating connection due to immediate shutdown command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

and the server log isn't spammed with copies of that, it just
looks like

2022-03-15 10:07:48.241 EDT [315456] LOG:  received immediate shutdown request
2022-03-15 10:07:48.244 EDT [315456] LOG:  database system is shut down

None of this gets us any closer to understanding where the signal
came from, unfortunately.

            regards, tom lane



Re: [External]Re: Postgres Crash Issue

From
Adrian Klaver
Date:
On 3/15/22 6:30 AM, Menon, Deepak (Deepak) wrote:
> Hi Depesz,
> 
> Then what does this message mean ?

 From here, assuming someone/thing did pg_ctl -m immediate stop:

https://www.postgresql.org/docs/current/app-pg-ctl.html

"“Immediate” mode will abort all server processes immediately, without a 
clean shutdown. This choice will lead to a crash-recovery cycle during 
the next server start."


> 
> 2022-03-11 08:58:42.956 UTC [17115] DETAIL:  The postmaster has commanded this server process to roll back the
currenttransaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
 
> 
> 
> Deepak Menon| Avaya Managed Services-Delivery|+91 9899012875| menond@avaya.com
> Leave Alert :



-- 
Adrian Klaver
adrian.klaver@aklaver.com