Thread: BUG #15989: Cluster unable to open as hot standby after SIGKILL during exclusive backup
From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15989
Logged by:          James Lucas
Email address:      jlucasdba@gmail.com
PostgreSQL version: 11.5
Operating system:   CentOS 7
Description:

After being sent SIGKILL while an exclusive backup is in effect, the cluster becomes unable to open as a hot standby. Reproducible on (at least) 11.5 and 9.6.15.

Steps to reproduce on a fresh database cluster:

* Initialize new cluster.
  initdb -D data

* Start cluster.
  pg_ctl -D data start

* Verify hot_standby is enabled (should return "on" on 11.5; on 9.6.15, it needs to be enabled).
  psql -c 'show hot_standby'

* Verify wal_level is set to replica or logical (okay on 11.5; on 9.6.15, it needs to be set and the cluster restarted).
  psql -c 'show wal_level'

* Stop cluster.
  pg_ctl -D data stop

* Add recovery.conf.
  echo 'standby_mode = true' >> data/recovery.conf

* Start as hot standby.
  pg_ctl -D data start

* Validate cluster is running as hot standby (should return true).
  psql -c 'select pg_is_in_recovery()'

* Stop cluster.
  pg_ctl -D data stop

* Remove recovery.conf.
  rm data/recovery.conf

* Start cluster normally.
  pg_ctl -D data start

* Enable exclusive backup
  psql -c "select pg_start_backup('')"

* Find pid of main postgres process
  ps -ef | grep 'postgres -D'

* Send SIGKILL to found pid
  kill -s KILL <pid>

* Add recovery.conf.
  echo 'standby_mode = true' >> data/recovery.conf

* Attempt to start cluster.
  pg_ctl -D data start

At this point, the cluster fails to open. On 11.5, pg_ctl hangs waiting for the database to open and eventually times out. On 9.6.15, pg_ctl runs normally, but tailing the database log shows that it never opens; it loops at "starting up."

I've found that if you stop the instance and remove recovery.conf, the database will actually open normally. But even after that, if you go back and try to open it as a hot standby, it fails to open again. I have so far not been able to find a way to let this cluster open as a hot standby again. Affected instances had to be restored from backup.
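A minimal pre-start check for the state these steps leave behind can be sketched in shell. It rests on an assumption the report does not state: that the leftover state is the backup_label file which an exclusive pg_start_backup writes into the data directory, which pg_stop_backup would normally remove, and which survives the SIGKILL. The function name check_backup_label is hypothetical. It only reports the file instead of deleting it, because in a data directory restored from a base backup, backup_label is required and removing it is unsafe.

```shell
# Hypothetical helper (not part of PostgreSQL): returns non-zero if the data
# directory still contains backup_label, which an exclusive pg_start_backup()
# creates and pg_stop_backup() normally removes.
check_backup_label() {
    datadir="$1"
    if [ -f "$datadir/backup_label" ]; then
        # Report only; deleting this file is unsafe in a restored base backup.
        echo "warning: $datadir/backup_label exists; an exclusive backup may have been interrupted" >&2
        return 1
    fi
    return 0
}

# Usage: only add recovery.conf when no leftover backup_label is present.
# check_backup_label data && echo 'standby_mode = true' >> data/recovery.conf
```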
This is particularly a problem when running postgres in Docker, as Docker will send SIGKILL if database shutdown takes more than a few seconds. Please let me know if you have any questions.

Thanks,
James Lucas
Re: BUG #15989: Cluster unable to open as hot standby after SIGKILL during exclusive backup
From: Stephen Frost
Greetings,

* PG Bug reporting form (noreply@postgresql.org) wrote:
> * Enable exclusive backup
> psql -c "select pg_start_backup('')"
>
> * Find pid of main postgres process
> ps -ef | grep 'postgres -D'
>
> * Send SIGKILL to found pid
> kill -s KILL <pid>

Don't kill the postmaster, and don't use exclusive backup (which has been deprecated, in part specifically because it causes problems after a crash).

> This is particularly a problem when running postgres in Docker, as Docker
> will send SIGKILL if database shutdown takes more than a few seconds.

You'll want to fix that, then.

Thanks,

Stephen
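Giving the server more time before Docker escalates to SIGKILL can be sketched with Docker's own stop-timeout settings. This is an illustration, not something from the thread: the container name pg, the postgres:11 image tag, the password, and the 120-second value are assumptions; the timeout should exceed however long a clean shutdown of the database normally takes.

```shell
# Sketch: let PostgreSQL finish a clean shutdown before Docker sends SIGKILL.
# Container name "pg", image tag, password, and 120 s are illustrative assumptions.

# Set a per-container stop timeout at creation time (Docker's default is 10 seconds).
docker run -d --name pg --stop-timeout 120 -e POSTGRES_PASSWORD=example postgres:11

# Or override the timeout for a single stop request.
docker stop -t 120 pg
```

Docker Compose exposes the same setting as stop_grace_period.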