Re: Parallel query vs smart shutdown and Postmaster death - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Parallel query vs smart shutdown and Postmaster death
Date
Msg-id CA+hUKG+MF0G7f8UKvTWiGs4iFng5bA_jL8RT4X2WdhP+oE8gkg@mail.gmail.com
In response to Parallel query vs smart shutdown and Postmaster death  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Parallel query vs smart shutdown and Postmaster death
Re: Parallel query vs smart shutdown and Postmaster death
List pgsql-hackers
On Mon, Feb 25, 2019 at 2:13 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> 1.  In a nearby thread, I misdiagnosed a problem reported[1] by Justin
> Pryzby (though my misdiagnosis is probably still a thing to be fixed;
> see next).  I think I just spotted the real problem he saw: if you
> execute a parallel query after a smart shutdown has been initiated,
> you wait forever in gather_readnext()!  Maybe parallel workers can't
> be launched in this state, but we lack code to detect this case?  I
> haven't dug into the exact mechanism or figured out what to do about
> it yet, and I'm tied up with something else for a bit, but I will come
> back to this later if nobody beats me to it.

Given smart shutdown's stated goal, namely that it "lets existing
sessions end their work normally", my questions are:

1.  Why does pmdie()'s SIGTERM case terminate parallel workers
immediately?  That aborts running parallel queries, so they
don't get to end their work normally.
2.  Why are new parallel workers not allowed to be started while in
this state?  That hangs future parallel queries forever, so they don't
get to end their work normally.
3.  Suppose we fix the above cases; should we do it for parallel
workers only (somehow), or for all bgworkers?  It's hard to say since
I don't know what all bgworkers do.

In the meantime, perhaps we should teach the postmaster to report this
case as a failure to launch in back-branches, so that at least
parallel queries don't hang forever?  Here's an initial sketch of a
patch like that: it gives you "ERROR:  parallel worker failed to
initialize" and "HINT:  More details may be available in the server
log." if you try to run a parallel query.  The HINT is right: the
server logs say that a smart shutdown is in progress.  If that seems a
bit hostile, consider that any parallel queries that were running at
the moment the smart shutdown was requested have already been ordered
to quit; why should new queries started after that get a better deal?
Then perhaps we could do some more involved surgery on master that
achieves smart shutdown's stated goal here, and lets parallel queries
actually run?  Better ideas welcome.
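
To make the difference between the current hang and the proposed
back-branch behaviour concrete, here is a toy model in Python.  It is
not the attached patch and contains no PostgreSQL code; every class,
method and state name in it is hypothetical and chosen only for
illustration.  It assumes a single worker request and models "waiting
forever" as a return value instead of an actual infinite loop.

```python
from enum import Enum, auto


class PMState(Enum):
    RUN = auto()
    SMART_SHUTDOWN = auto()  # hypothetical name: shutdown requested, sessions still running


class WorkerStatus(Enum):
    NOT_YET_STARTED = auto()
    STARTED = auto()
    STOPPED = auto()  # also used here for "could not be launched"


class ToyPostmaster:
    """Hypothetical stand-in for the postmaster; not PostgreSQL's real logic."""

    def __init__(self):
        self.state = PMState.RUN
        self.log = []

    def request_smart_shutdown(self):
        self.state = PMState.SMART_SHUTDOWN
        self.log.append("smart shutdown in progress")

    def launch_worker(self, report_failure):
        if self.state is PMState.RUN:
            return WorkerStatus.STARTED
        if report_failure:
            # Proposed behaviour: tell the leader the launch failed.
            self.log.append("worker not launched: smart shutdown in progress")
            return WorkerStatus.STOPPED
        # Reported behaviour: the request is silently left pending.
        return WorkerStatus.NOT_YET_STARTED


def run_parallel_query(pm, report_failure):
    status = pm.launch_worker(report_failure)
    if status is WorkerStatus.STARTED:
        return "query completed"
    if status is WorkerStatus.STOPPED:
        raise RuntimeError("parallel worker failed to initialize")
    # NOT_YET_STARTED never changes in this state, so a real leader
    # would keep waiting; the toy model just reports that fact.
    return "still waiting"
```

With report_failure=False the model reproduces the hang described
above once a smart shutdown has been requested; with
report_failure=True the leader gets an error instead, and the toy log
records why.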

-- 
Thomas Munro
https://enterprisedb.com
