Re: Parallel query vs smart shutdown and Postmaster death - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: Parallel query vs smart shutdown and Postmaster death
Date	February 27, 2019 01:43:55
Msg-id	CA+hUKG+MF0G7f8UKvTWiGs4iFng5bA_jL8RT4X2WdhP+oE8gkg@mail.gmail.com
In response to	Parallel query vs smart shutdown and Postmaster death (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: Parallel query vs smart shutdown and Postmaster death Re: Parallel query vs smart shutdown and Postmaster death
List	pgsql-hackers

On Mon, Feb 25, 2019 at 2:13 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> 1.  In a nearby thread, I misdiagnosed a problem reported[1] by Justin
> Pryzby (though my misdiagnosis is probably still a thing to be fixed;
> see next).  I think I just spotted the real problem he saw: if you
> execute a parallel query after a smart shutdown has been initiated,
> you wait forever in gather_readnext()!  Maybe parallel workers can't
> be launched in this state, but we lack code to detect this case?  I
> haven't dug into the exact mechanism or figured out what to do about
> it yet, and I'm tied up with something else for a bit, but I will come
> back to this later if nobody beats me to it.

Given smart shutdown's stated goal, namely that it "lets existing
sessions end their work normally", my questions are:

1.  Why does pmdie()'s SIGTERM case terminate parallel workers
immediately?  That aborts running parallel queries, so they
don't get to end their work normally.
2.  Why are new parallel workers not allowed to be started while in
this state?  That hangs future parallel queries forever, so they don't
get to end their work normally.
3.  Suppose we fix the above cases; should we do it for parallel
workers only (somehow), or for all bgworkers?  It's hard to say since
I don't know what all bgworkers do.

In the meantime, perhaps we should teach the postmaster to report this
case as a failure to launch in back-branches, so that at least
parallel queries don't hang forever?  Here's an initial sketch of a
patch like that: it gives you "ERROR:  parallel worker failed to
initialize" and "HINT:  More details may be available in the server
log." if you try to run a parallel query.  The HINT is accurate: the
server log says that a smart shutdown is in progress.  If that seems a
bit hostile, consider that any parallel queries that were running at
the moment the smart shutdown was requested have already been ordered
to quit; why should new queries started after that get a better deal?
Then perhaps we could do some more involved surgery on master that
achieves smart shutdown's stated goal here, and lets parallel queries
actually run?  Better ideas welcome.
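To make the difference between the current behaviour and the proposed back-branch fix concrete, here is a minimal C sketch. It is a toy model under stated assumptions, not the attached patch and not PostgreSQL source: every type and function name (SketchPMState, sketch_handle_request, sketch_leader_wait, and so on) is hypothetical. It assumes only what the message says, namely that during smart shutdown the postmaster does not start new bgworkers, and shows why a leader waiting on such a request hangs unless the postmaster reports the launch as failed. The wait loop is bounded by a hypothetical max_polls so the sketch terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified postmaster states (not PostgreSQL's real enum). */
typedef enum
{
    PM_RUN,              /* normal operation: bgworkers can be started */
    PM_SMART_SHUTDOWN    /* smart shutdown requested: no new bgworkers */
} SketchPMState;

/* Hypothetical status of one parallel-worker launch request. */
typedef enum
{
    LAUNCH_PENDING,      /* queued, worker not started yet */
    LAUNCH_STARTED,      /* worker process started */
    LAUNCH_FAILED        /* launch refused and reported to the leader */
} SketchLaunchStatus;

/* Hypothetical outcome of the leader's bounded wait. */
typedef enum
{
    WAIT_STARTED,        /* worker came up; query proceeds */
    WAIT_FAILED,         /* leader raises an error instead of hanging */
    WAIT_STILL_PENDING   /* models the hang: status never changed */
} SketchWaitOutcome;

/*
 * Postmaster side: decide what happens to a pending request.
 * report_failure = false models the behaviour described in the thread
 * (the request is silently left pending); report_failure = true models
 * the proposed fix (the request is marked as failed).
 */
static SketchLaunchStatus
sketch_handle_request(SketchPMState state, bool report_failure)
{
    if (state == PM_RUN)
        return LAUNCH_STARTED;
    /* Smart shutdown in progress: no new bgworker will be started. */
    return report_failure ? LAUNCH_FAILED : LAUNCH_PENDING;
}

/*
 * Leader side: poll the request status up to max_polls times.
 * A real leader has no such bound, so LAUNCH_PENDING forever is a hang.
 */
static SketchWaitOutcome
sketch_leader_wait(SketchPMState state, bool report_failure, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++)
    {
        SketchLaunchStatus s = sketch_handle_request(state, report_failure);

        if (s == LAUNCH_STARTED)
            return WAIT_STARTED;
        if (s == LAUNCH_FAILED)
        {
            fprintf(stderr, "ERROR:  parallel worker failed to initialize\n");
            return WAIT_FAILED;
        }
        /* LAUNCH_PENDING: keep waiting. */
    }
    return WAIT_STILL_PENDING;
}
```

In this model, sketch_leader_wait(PM_SMART_SHUTDOWN, false, 10) returns WAIT_STILL_PENDING (the hang), while passing true for report_failure returns WAIT_FAILED, which corresponds to the "parallel worker failed to initialize" error described above. The real fix has to communicate this through the postmaster's bgworker bookkeeping rather than a return value; the sketch only illustrates the control flow.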

-- 
Thomas Munro
https://enterprisedb.com

Attachment

0001-Report-bgworker-launch-failure-during-smart-shutdown.patch
