Re: Parallel query vs smart shutdown and Postmaster death - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: Parallel query vs smart shutdown and Postmaster death
Date
Msg-id CA+hUKG+7m3hvhf8LBRftnpX6FmYa282XtgFhjw=aLd1RBxi-Bg@mail.gmail.com
Whole thread Raw
In response to Re: Parallel query vs smart shutdown and Postmaster death  (Arseny Sher <a.sher@postgrespro.ru>)
Responses Re: Parallel query vs smart shutdown and Postmaster death
List pgsql-hackers
On Sun, Mar 17, 2019 at 5:53 PM Arseny Sher <a.sher@postgrespro.ru> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>
> > 1.  Why does pmdie()'s SIGTERM case terminate parallel workers
> > immediately?  That breaks aborts running parallel queries, so they
> > don't get to end their work normally.
> > 2.  Why are new parallel workers not allowed to be started while in
> > this state?  That hangs future parallel queries forever, so they don't
> > get to end their work normally.
> > 3.  Suppose we fix the above cases; should we do it for parallel
> > workers only (somehow), or for all bgworkers?  It's hard to say since
> > I don't know what all bgworkers do.
>
> Attached patch fixes 1 and 2. As for the 3, the only other internal
> bgworkers currently are logical replication launcher and workers, which
> arguably should be killed immediately.

Hi Arseny,

Thanks for working on this!  Yes, it seems better to move forwards
rather than backwards, and fix this properly as you have done in this
patch.

Just a thought: instead of the new hand-coded loop you added in
pmdie(), do you think it would make sense to have a new argument
"exclude_class_mask" for SignalSomeChildren()?  If we did that, I
would consider renaming the existing parameter "target" to
"include_type_mask" to make it super clear what's going on.

> > Here's an initial sketch of a
> > patch like that: it gives you "ERROR:  parallel worker failed to
> > initialize" and "HINT:  More details may be available in the server
> > log." if you try to run a parallel query.
>
> Reporting bgworkers that postmaster is never going to start is probably
> worthwhile doing anyway to prevent potential further deadlocks like
> this. However, I think there is a problem in your patch: we might be in
> post PM_RUN states due to FatalError, not because of shutdown. In this
> case, we shouldn't refuse to run bgws in the future. I would also merge
> the check into bgworker_should_start_now.

Hmm, yeah, I haven't tested or double-checked in detail yet but I
think you're probably right about all of that.

-- 
Thomas Munro
https://enterprisedb.com


pgsql-hackers by date:

Previous
From: "Iwata, Aya"
Date:
Subject: RE: [PATCH] get rid of StdRdOptions, use individual binaryreloptions representation for each relation kind instead
Next
From: Michael Paquier
Date:
Subject: Re: Offline enabling/disabling of data checksums