Thread: Triaging pg_ctl shutdown hang

From: Joseph Hammerman
Hi pgsql-admins list,

We recently had an incident precipitated by a PostgreSQL 9.6.22 fast shutdown (pg_ctl stop -m fast) that hung. Two processes would not exit: the postmaster and the logger process. We had limited visibility into the underlying conditions, because in fast shutdown mode the server refuses new connections and forcibly disconnects existing sessions, so we could not get in with psql. Even when we escalated to an immediate shutdown, the processes did not exit.

I’m trying to put together a checklist of data to capture so we can determine the root cause if we encounter this hang again. For example, running echo w > /proc/sysrq-trigger dumps the kernel stack traces of all tasks in uninterruptible (D-state) sleep to the kernel log. Is it worth running strace on the postmaster and the surviving child processes? Does pg_controldata surface any useful data?
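As a rough illustration of the kind of capture step that could go on such a checklist, here is a minimal sketch, assuming a Linux host with /proc mounted. It reads the postmaster PID from the first line of postmaster.pid in the data directory, finds the postmaster's direct children, and records each process's name, scheduler state (a "D" state means uninterruptible sleep), wait channel, and kernel stack. The function names and output format are hypothetical, and reading /proc/<pid>/stack usually requires root; without it the script just notes the stack as unavailable.

```python
#!/usr/bin/env python3
"""Snapshot process state for a hung PostgreSQL shutdown (Linux only).

Hypothetical helper: names and output format are illustrative only.
Reading /proc/<pid>/stack typically requires root.
"""
import os
import sys


def read_postmaster_pid(data_dir):
    """Return the postmaster PID, stored on the first line of postmaster.pid."""
    with open(os.path.join(data_dir, "postmaster.pid")) as f:
        return int(f.readline().strip())


def parse_status(text):
    """Extract Name, State and PPid from the contents of /proc/<pid>/status."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("Name", "State", "PPid"):
            fields[key] = value.strip()
    return fields


def read_proc_file(pid, name):
    """Return the contents of /proc/<pid>/<name>, or a note if unreadable."""
    try:
        with open(f"/proc/{pid}/{name}") as f:
            return f.read()
    except OSError as exc:  # e.g. PermissionError on 'stack' without root
        return f"<unavailable: {exc.__class__.__name__}>"


def children_of(ppid):
    """Return sorted PIDs whose parent is ppid, by scanning /proc/*/status."""
    kids = []
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            info = parse_status(read_proc_file(entry, "status"))
            if info.get("PPid") == str(ppid):
                kids.append(int(entry))
    return sorted(kids)


def snapshot(pid):
    """Collect name, scheduler state, wait channel and kernel stack for pid."""
    info = parse_status(read_proc_file(pid, "status"))
    info["wchan"] = read_proc_file(pid, "wchan").strip()
    info["stack"] = read_proc_file(pid, "stack")
    return info


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit(f"usage: {sys.argv[0]} DATA_DIR")
    postmaster = read_postmaster_pid(sys.argv[1])
    for pid in [postmaster] + children_of(postmaster):
        snap = snapshot(pid)
        print(f"== pid {pid} {snap.get('Name')} state={snap.get('State')} "
              f"wchan={snap['wchan']}")
        print(snap["stack"])
```

Reading /proc this way only inspects files the kernel already exposes and does not attach to the target process the way strace does, so it seems like a reasonable first step before deciding whether strace is worth running.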

As a follow-up question, is there a way to obtain an administrative backdoor, or to leave one open, during a hung fast shutdown?

Thanks in advance for any clarity or guidance anyone on the list can provide.

Joe Hammerman