Re: Parallel query hangs after a smart shutdown is issued - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Parallel query hangs after a smart shutdown is issued
Msg-id CA+hUKGL0_PqgAc9xEa-gZqtgYY0ykeJW=oWsT3g9z9LURozqTg@mail.gmail.com
In response to Parallel query hangs after a smart shutdown is issued  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: Parallel query hangs after a smart shutdown is issued
List pgsql-hackers
On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> After a smart shutdown is issued (with pg_ctl), if a parallel query is
> run, the query hangs. The postmaster doesn't inform backends about the
> smart shutdown (see pmdie() -> SIGTERM -> BACKEND_TYPE_NORMAL are not
> informed), so if they request parallel workers, the postmaster is
> unable to fork any workers, because its status (pmState) gets changed to
> PM_WAIT_BACKENDS (see maybe_start_bgworkers() ->
> bgworker_should_start_now() returns false).
>
> A few ways we could solve this:
> 1. Do we want to disallow parallelism when there is a pending smart
> shutdown? If yes, then the postmaster can let the regular backends
> know whenever a smart shutdown is received, and the backends can use
> this info to not consider parallelism. If we use SIGTERM to notify
> them, since the backends have die() as their handler, they just cancel
> their queries, which is again inconsistent behaviour[1]. Could another
> signal like SIGUSR2 (I think it's currently ignored by backends) be
> used here? If the signals are overloaded, can we multiplex SIGTERM
> similar to SIGUSR1? If we don't want to use signals at all, the
> postmaster can record its status in bg worker shared memory,
> i.e. BackgroundWorkerData, and RegisterDynamicBackgroundWorker() can
> simply return without asking the postmaster for parallel workers.
>
> 2. If we want to allow parallelism, then we can tweak
> bgworker_should_start_now() to detect that the pending bg worker fork
> requests are for parallelism, and let the postmaster start the
> workers.
>
> Thoughts?
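The failure mode in the quoted report, and the shape of option 1, can be sketched with a toy model. This is a hypothetical, heavily simplified illustration, not PostgreSQL source: only the state names PM_RUN and PM_WAIT_BACKENDS are borrowed, while SharedState, smart_shutdown_pending, register_worker() and launch_workers() are invented for the sketch. It assumes, as the report says, that no workers are forked once pmState is PM_WAIT_BACKENDS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not PostgreSQL source.
 * Only the two postmaster states named in the report are modelled. */
typedef enum { PM_RUN, PM_WAIT_BACKENDS } PMState;

typedef struct
{
    PMState pm_state;               /* postmaster's current state */
    bool    smart_shutdown_pending; /* option 1: hypothetical shared flag */
} SharedState;

/* Postmaster side: per the report, no new workers are forked once
 * pmState has moved to PM_WAIT_BACKENDS. */
static bool worker_may_start(const SharedState *s)
{
    return s->pm_state == PM_RUN;
}

/* Backend side, option 1: refuse the registration up front when the
 * hypothetical flag is set, instead of asking the postmaster. */
static bool register_worker(const SharedState *s)
{
    return !s->smart_shutdown_pending;
}

/* Returns the number of workers that start, or -1 if a registered worker
 * would never be forked (the leader would wait on it forever: the hang). */
static int launch_workers(const SharedState *s, int requested)
{
    assert(requested >= 0);
    int started = 0;
    for (int i = 0; i < requested; i++)
    {
        if (!register_worker(s))
            break;          /* leader carries on with the workers it has */
        if (!worker_may_start(s))
            return -1;      /* registered but never started */
        started++;
    }
    return started;
}
```

In this model, a smart shutdown without the flag yields -1 (the hang), while setting the flag makes registration fail fast so the query proceeds with zero workers instead of waiting.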

Hello Bharath,

Yeah, the current situation is not good.  I think your option 2 sounds
better, because the documented behaviour of smart shutdown is that it
"lets existing sessions end their work normally".  I think that means
that a query that is already running or allowed to start should be
able to start new workers and not have its existing workers
terminated.  Arseny Sher wrote a couple of different patches to try
that last year, but they fell through the cracks:

https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com
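The reading of smart shutdown above (sessions that are still running may keep launching parallel workers) amounts to option 2, which can be sketched as a state check that distinguishes parallel-query workers from other workers. This is a hypothetical model, not the postmaster's actual bgworker_should_start_now(): the WorkerKind enum and worker_should_start() are invented for illustration, and the assumption that other kinds of workers stay blocked during smart shutdown is the sketch's own choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; state names borrowed from PostgreSQL, logic invented. */
typedef enum { PM_RUN, PM_WAIT_BACKENDS, PM_SHUTDOWN } PMState;

/* Hypothetical tag telling the postmaster why a worker was requested. */
typedef enum { WORKER_GENERIC, WORKER_PARALLEL_QUERY } WorkerKind;

static bool worker_should_start(PMState state, WorkerKind kind)
{
    switch (state)
    {
        case PM_RUN:
            return true;
        case PM_WAIT_BACKENDS:
            /* Smart shutdown: existing sessions may finish their work
             * normally, so workers for their parallel queries still start;
             * other workers are held back in this sketch. */
            return kind == WORKER_PARALLEL_QUERY;
        case PM_SHUTDOWN:
        default:
            return false;
    }
}
```

Keeping the exception narrow (only parallel-query workers, only while waiting for backends) means a smart shutdown still makes progress: once the remaining sessions exit, no further worker requests arrive.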
