Postmaster fails to shut down right after crash restart - Mailing list pgsql-hackers

From Sergey Shinderuk
Subject Postmaster fails to shut down right after crash restart
Msg-id 63dcad16-22de-4326-a395-5310bc7e05ff@postgrespro.ru
List pgsql-hackers
Hello,

While developing a patch and running the regression tests, I noticed
that the postmaster can fail to shut down right after a crash restart.
It can get stuck in the PM_WAIT_BACKENDS state forever.

As far as I understand, the problem occurs when a shutdown signal is 
received before getting PMSIGNAL_RECOVERY_STARTED from the startup 
process. In that case the FatalError flag is not cleared, and the 
postmaster is stuck in PM_WAIT_BACKENDS waiting for the checkpointer, 
which ignores SIGTERM.
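
To make the ordering concrete, here is a toy model of the sequence described above. It is only a sketch of that explanation, not the code in postmaster.c: the ToyPostmaster struct, the toy_* functions, and the TOY_PM_* states are all hypothetical names made up for illustration, and the rule that the recovery-started signal is acted on only while still in the startup state is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy states; the names only loosely mirror the PM_* states in the log. */
typedef enum
{
    TOY_PM_STARTUP,
    TOY_PM_RECOVERY,
    TOY_PM_WAIT_BACKENDS,
    TOY_PM_NO_CHILDREN
} ToyPMState;

typedef struct
{
    ToyPMState  state;
    bool        fatal_error;        /* set by a crash, cleared when recovery starts */
    bool        checkpointer_alive;
} ToyPostmaster;

/* Postmaster right after "all server processes terminated; reinitializing". */
static ToyPostmaster
toy_after_crash_restart(void)
{
    ToyPostmaster pm = {TOY_PM_STARTUP, true, true};

    return pm;
}

/* The startup process reports that recovery has started. */
static void
toy_recovery_started(ToyPostmaster *pm)
{
    /* Assumption: the signal is only acted on while still in startup state. */
    if (pm->state == TOY_PM_STARTUP)
    {
        pm->fatal_error = false;
        pm->state = TOY_PM_RECOVERY;
    }
}

/* Fast shutdown request: SIGTERM the children and wait for them to exit. */
static void
toy_shutdown_request(ToyPostmaster *pm)
{
    /* The checkpointer ignores SIGTERM, so checkpointer_alive is unchanged. */
    pm->state = TOY_PM_WAIT_BACKENDS;
}

/* One pass of the wait logic; returns true if the state changed. */
static bool
toy_advance(ToyPostmaster *pm)
{
    if (pm->state != TOY_PM_WAIT_BACKENDS)
        return false;

    if (!pm->fatal_error)
    {
        /* Normal path: the checkpointer is explicitly asked to finish and exit. */
        pm->checkpointer_alive = false;
        pm->state = TOY_PM_NO_CHILDREN;
        return true;
    }

    /* Crash path: nobody asks the checkpointer to exit; just wait for it. */
    if (!pm->checkpointer_alive)
    {
        pm->state = TOY_PM_NO_CHILDREN;
        return true;
    }
    return false;
}
```

In this model, calling toy_shutdown_request() before toy_recovery_started() leaves fatal_error set, and toy_advance() then never leaves TOY_PM_WAIT_BACKENDS. In the opposite order the flag is cleared first and the toy postmaster reaches TOY_PM_NO_CHILDREN.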

To reproduce the problem easily, I added a pg_usleep() call in
xlogrecovery.c just before
SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED). See the attached patch.

Then I ran a script that simulates a crash and then runs pg_ctl stop:

$ ./init.sh
[...]

$ ./stop-after-crash.sh
waiting for server to start.... done
server started
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down


Some processes are still alive:

$ ps uf -C postgres
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
sergey    279874  0.0  0.0 222816 28560 ?        Ss   14:25   0:00 /home/sergey/pgwork/devel/install/bin/postgres -D data
sergey    279887  0.0  0.0 222772  5664 ?        Ss   14:25   0:00  \_ postgres: io worker 0
sergey    279888  0.0  0.0 222772  5664 ?        Ss   14:25   0:00  \_ postgres: io worker 1
sergey    279889  0.0  0.0 222772  5664 ?        Ss   14:25   0:00  \_ postgres: io worker 2
sergey    279891  0.0  0.0 222884  8480 ?        Ss   14:25   0:00  \_ postgres: checkpointer


Here is an excerpt from the debug log:

postmaster[279874] LOG:  all server processes terminated; reinitializing
startup[279890] LOG:  database system was interrupted; last known up at 2025-04-24 14:25:58 MSK
startup[279890] LOG:  database system was not properly shut down; automatic recovery in progress

postmaster[279874] DEBUG:  postmaster received shutdown request signal
postmaster[279874] LOG:  received fast shutdown request
postmaster[279874] DEBUG:  updating PMState from PM_STARTUP to PM_STOP_BACKENDS
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to background writer process with pid 279892
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to checkpointer process with pid 279891
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to startup process with pid 279890
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker process with pid 279889
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker process with pid 279888
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker process with pid 279887
postmaster[279874] DEBUG:  updating PMState from PM_STOP_BACKENDS to PM_WAIT_BACKENDS

startup[279890] LOG:  invalid record length at 0/175A4D8: expected at least 24, got 0
postmaster[279874] DEBUG:  postmaster received pmsignal signal
startup[279890] LOG:  redo is not required

checkpointer[279891] LOG:  checkpoint starting: end-of-recovery immediate wait
checkpointer[279891] LOG:  checkpoint complete: wrote 0 buffers (0.0%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.007 s, sync=0.002 s, total=0.026 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/175A4D8, redo lsn=0/175A4D8

startup[279890] DEBUG:  exit(0)
postmaster[279874] DEBUG:  updating PMState from PM_WAIT_BACKENDS to PM_WAIT_BACKENDS

checkpointer[279891] DEBUG:  checkpoint skipped because system is idle
checkpointer[279891] DEBUG:  checkpoint skipped because system is idle


I don't know how to fix this, but I thought it was worth reporting.

Best regards,

-- 
Sergey Shinderuk        https://postgrespro.com/
