On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> If a parallel query is run after a smart shutdown has been issued (with
> pg_ctl), the query hangs. The postmaster doesn't inform backends about the
> smart shutdown (see pmdie() -> SIGTERM -> BACKEND_TYPE_NORMAL are not
> informed), so if they request parallel workers, the postmaster is
> unable to fork any workers because its status (pmState) has changed to
> PM_WAIT_BACKENDS (see maybe_start_bgworkers() -->
> bgworker_should_start_now() returns false).
>
> A few ways we could solve this:
> 1. Do we want to disallow parallelism when there is a pending smart
> shutdown? - If yes, then we can have the postmaster notify the regular
> backends whenever a smart shutdown is received, and the backends use
> this info to not consider parallelism. If we use SIGTERM to notify,
> since the backends have die() as the handler, they just cancel the
> queries, which is again an inconsistent behaviour[1]. Could any other
> signal like SIGUSR2 (I think it's currently ignored by backends) be
> used here? If the signals are overloaded, can we multiplex SIGTERM
> similar to SIGUSR1? If we don't want to use signals at all, the
> postmaster can record its status in bg worker shared memory,
> i.e. BackgroundWorkerData, so that RegisterDynamicBackgroundWorker() can
> simply return without asking the postmaster for parallel workers.
>
> 2. If we want to allow parallelism, then, we can tweak
> bgworker_should_start_now(), detect that the pending bg worker fork
> requests are for parallelism, and let the postmaster start the
> workers.
>
> Thoughts?

Hello Bharath,

Yeah, the current situation is not good. I think your option 2 sounds
better, because the documented behaviour of smart shutdown is that it
"lets existing sessions end their work normally". I think that means
that a query that is already running or allowed to start should be
able to start new workers and not have its existing workers
terminated. Arseny Sher wrote a couple of different patches to try
that last year, but they fell through the cracks:
https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com
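
To make option 2 a bit more concrete, here is a rough standalone sketch of the kind of check it implies. It is not the real postmaster code and not taken from any of the patches above: the SketchPMState enum, the SKETCH_CLASS_PARALLEL flag, and the sketch_should_start_now() function are all hypothetical stand-ins for pmState, the parallel worker class flag, and bgworker_should_start_now(). It also assumes the postmaster can tell a smart shutdown apart from other reasons for being in PM_WAIT_BACKENDS, which is modelled here as a plain boolean argument.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for the postmaster's pmState. */
typedef enum
{
    SKETCH_PM_RUN,            /* normal operation */
    SKETCH_PM_WAIT_BACKENDS,  /* waiting for backends to exit */
    SKETCH_PM_SHUTDOWN        /* past the point where workers make sense */
} SketchPMState;

/* Hypothetical flag bit marking a worker requested for a parallel query. */
#define SKETCH_CLASS_PARALLEL 0x0010

/*
 * Decide whether a pending background worker request may be started.
 *
 * Assumption: while a smart shutdown is pending, existing sessions are
 * still allowed to finish their work, so requests for parallel workers
 * are honoured; every other kind of worker is refused in that state.
 */
static bool
sketch_should_start_now(SketchPMState state, bool smart_shutdown, int flags)
{
    switch (state)
    {
        case SKETCH_PM_RUN:
            /* Normal operation: any worker may start. */
            return true;

        case SKETCH_PM_WAIT_BACKENDS:
            /* Only parallel workers, and only during a smart shutdown. */
            return smart_shutdown && (flags & SKETCH_CLASS_PARALLEL) != 0;

        case SKETCH_PM_SHUTDOWN:
            break;
    }

    /* Everything else: do not start new workers. */
    return false;
}
```

The point of the sketch is only that the decision depends on both the postmaster state and the class of the requested worker; how the postmaster would actually track the "smart shutdown pending" condition is left open here.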