Hi,
After a smart shutdown is issued(with pg_ctl), run a parallel query,
then the query hangs. The postmaster doesn't inform backends about the
smart shutdown(see pmdie() -> SIGTERM -> BACKEND_TYPE_NORMAL are not
informed), so if they request parallel workers, the postmaster is
unable to fork any workers as it's status(pmState) gets changed to
PM_WAIT_BACKENDS(see maybe_start_bgworkers() -->
bgworker_should_start_now() returns false).
Few ways we could solve this:
1. Do we want to disallow parallelism when there is a pending smart
shutdown? - If yes, then, we can let the postmaster know the regular
backends whenever a smart shutdown is received and the backends use
this info to not consider parallelism. If we use SIGTERM to notify,
since the backends have die() as handlers, they just cancel the
queries which is again an inconsistent behaviour[1]. Would any other
signal like SIGUSR2(I think it's currently ignored by backends) be
used here? If the signals are overloaded, can we multiplex SIGTERM
similar to SIGUSR1? If we don't want to use signals at all, the
postmaster can make an entry of it's status in bg worker shared memory
i.e. BackgroundWorkerData, RegisterDynamicBackgroundWorker() can
simply return, without requesting the postmaster for parallel workers.
2. If we want to allow parallelism, then, we can tweak
bgworker_should_start_now(), detect that the pending bg worker fork
requests are for parallelism, and let the postmaster start the
workers.
Thoughts?
Note: this issue is identified while working on [1]
[1] - https://www.postgresql.org/message-id/CALj2ACWTAQ2uWgj4yRFLQ6t15MMYV_uc3GCT5F5p8R9pzrd7yQ%40mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com