Re: Parallel query hangs after a smart shutdown is issued - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Parallel query hangs after a smart shutdown is issued |
Date | |
Msg-id | CA+hUKGJZ9d2vUTZMPnNkZP_pJ-7++XyHP48K+OYEi6S4feUD4Q@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel query hangs after a smart shutdown is issued (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Parallel query hangs after a smart shutdown is issued
|
List | pgsql-hackers |
On Thu, Aug 13, 2020 at 6:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > On Thu, Aug 13, 2020 at 3:32 AM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > >> After a smart shutdown is issued(with pg_ctl), run a parallel query, > >> then the query hangs. > > > Yeah, the current situation is not good. I think your option 2 sounds > > better, because the documented behaviour of smart shutdown is that it > > "lets existing sessions end their work normally". I think that means > > that a query that is already running or allowed to start should be > > able to start new workers and not have its existing workers > > terminated. Arseny Sher wrote a couple of different patches to try > > that last year, but they fell through the cracks: > > https://www.postgresql.org/message-id/flat/CA%2BhUKGLrJij0BuFtHsMHT4QnLP54Z3S6vGVBCWR8A49%2BNzctCw%40mail.gmail.com > > I already commented on this in the other thread that Bharath started [1]. > I think the real issue here is why is the postmaster's SIGTERM handler > doing *anything* other than disallowing new connections? It seems quite > premature to kill support processes of any sort, not only parallel > workers. The documentation says existing clients are allowed to end > their work, not that their performance is going to be crippled until > they end. Right. It's pretty strange that during smart shutdown, you could run for hours with no autovacuum, walwriter, bgwriter. I guess Arseny and I were looking for a minimal change to fix a bug, but clearly there's a more general problem and this change works out cleaner anyway. > So I looked at moving the kills of all the support processes to happen > after we detect that there are no remaining regular backends, and it > seems to not be too hard. I realized that the existing PM_WAIT_READONLY > state is doing that already, but just for a subset of support processes > that it thinks might be active in hot standby mode. So what I did in the > attached was to repurpose that state as "PM_WAIT_CLIENTS", which does the > right thing in either regular or hot standby mode. Make sense, works as expected and passes check-world. > One other thing I changed here was to remove PM_WAIT_READONLY from the > set of states in which we'll allow promotion to occur or a new walreceiver > to start. I'm not convinced that either of those behaviors aren't > bugs; although if someone thinks they're right, we can certainly put > back PM_WAIT_CLIENTS in those checks. (But, for example, it does not > appear possible to reach PM_WAIT_READONLY/PM_WAIT_CLIENTS state with > Shutdown == NoShutdown, so the test in MaybeStartWalReceiver sure looks > like confusingly dead code to me. If we do want to allow restarting > the walreceiver in this state, the Shutdown condition needs fixed.) If a walreceiver is allowed to run, why should it not be allowed to restart? Yeah, I suppose that other test'd need to be Shutdown <= SmartShutdown, just like we do in SIGHUP_handler(). Looking at other places where we test Shutdown == NoShutdown, one that jumps out is the autovacuum wraparound defence stuff and the nearby PMSIGNAL_START_AUTOVAC_WORKER code.
pgsql-hackers by date: