Re: Parallel query vs smart shutdown and Postmaster death - Mailing list pgsql-hackers
From | Arseny Sher |
---|---|
Subject | Re: Parallel query vs smart shutdown and Postmaster death |
Date | |
Msg-id | 87lg1ep0ds.fsf@ars-thinkpad Whole thread Raw |
In response to | Re: Parallel query vs smart shutdown and Postmaster death (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Parallel query vs smart shutdown and Postmaster death
|
List | pgsql-hackers |
Hi, Thomas Munro <thomas.munro@gmail.com> writes: > 1. Why does pmdie()'s SIGTERM case terminate parallel workers > immediately? That breaks aborts running parallel queries, so they > don't get to end their work normally. > 2. Why are new parallel workers not allowed to be started while in > this state? That hangs future parallel queries forever, so they don't > get to end their work normally. > 3. Suppose we fix the above cases; should we do it for parallel > workers only (somehow), or for all bgworkers? It's hard to say since > I don't know what all bgworkers do. Attached patch fixes 1 and 2. As for the 3, the only other internal bgworkers currently are logical replication launcher and workers, which arguably should be killed immediately. > Here's an initial sketch of a > patch like that: it gives you "ERROR: parallel worker failed to > initialize" and "HINT: More details may be available in the server > log." if you try to run a parallel query. Reporting bgworkers that postmaster is never going to start is probably worthwhile doing anyway to prevent potential further deadlocks like this. However, I think there is a problem in your patch: we might be in post PM_RUN states due to FatalError, not because of shutdown. In this case, we shouldn't refuse to run bgws in the future. I would also merge the check into bgworker_should_start_now. -- Arseny Sher Postgres Professional: http://www.postgrespro.com The Russian Postgres Company From da11d22a5ed78a690ccdbfeb77c59749c541d463 Mon Sep 17 00:00:00 2001 From: Arseny Sher <sher-ars@yandex.ru> Date: Sun, 17 Mar 2019 07:42:18 +0300 Subject: [PATCH] Allow parallel workers while backends are alive in 'smart' shutdown. Since postmaster doesn't advertise that he is never going to start parallel workers after shutdown was issued, parallel leader stucks waiting for them indefinitely. --- src/backend/postmaster/postmaster.c | 44 ++++++++++++++++++++++++++++++++----- 1 file changed, 38 insertions(+), 6 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index fe599632d3..d60969fdb8 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -2689,10 +2689,30 @@ pmdie(SIGNAL_ARGS) if (pmState == PM_RUN || pmState == PM_RECOVERY || pmState == PM_HOT_STANDBY || pmState == PM_STARTUP) { + slist_iter siter; + + /* + * Shut down all bgws except parallel workers: graceful + * termination of existing sessions needs them. + */ + slist_foreach(siter, &BackgroundWorkerList) + { + RegisteredBgWorker *rw; + + rw = slist_container(RegisteredBgWorker, rw_lnode, siter.cur); + if (rw->rw_pid > 0 && + ((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) == 0)) + { + ereport(DEBUG4, + (errmsg_internal("sending signal %d to process %d", + SIGTERM, (int) rw->rw_pid))); + signal_child(rw->rw_pid, SIGTERM); + + } + } + /* autovac workers are told to shut down immediately */ - /* and bgworkers too; does this need tweaking? */ - SignalSomeChildren(SIGTERM, - BACKEND_TYPE_AUTOVAC | BACKEND_TYPE_BGWORKER); + SignalSomeChildren(SIGTERM, BACKEND_TYPE_AUTOVAC); /* and the autovac launcher too */ if (AutoVacPID != 0) signal_child(AutoVacPID, SIGTERM); @@ -5735,18 +5755,30 @@ do_start_bgworker(RegisteredBgWorker *rw) * specified start_time? */ static bool -bgworker_should_start_now(BgWorkerStartTime start_time) +bgworker_should_start_now(RegisteredBgWorker *rw) { + BgWorkerStartTime start_time = rw->rw_worker.bgw_start_time; + switch (pmState) { case PM_NO_CHILDREN: case PM_WAIT_DEAD_END: case PM_SHUTDOWN_2: case PM_SHUTDOWN: + return false; + case PM_WAIT_BACKENDS: case PM_WAIT_READONLY: case PM_WAIT_BACKUP: - break; + /* + * Allow new parallel workers in SmartShutdown while backends + * are still there. + */ + if (((rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) == 0) || + Shutdown != SmartShutdown || + FatalError) + return false; + /* fall through */ case PM_RUN: if (start_time == BgWorkerStart_RecoveryFinished) @@ -5906,7 +5938,7 @@ maybe_start_bgworkers(void) } } - if (bgworker_should_start_now(rw->rw_worker.bgw_start_time)) + if (bgworker_should_start_now(rw)) { /* reset crash time before trying to start worker */ rw->rw_crashed_at = 0; -- 2.11.0
pgsql-hackers by date: