Re: Parallel worker hangs while handling errors. - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: Parallel worker hangs while handling errors.
Date
Msg-id CALj2ACUJ7HhB_76nYep950V-dDCOX+bzs9vKqQdPvhLkzNsprQ@mail.gmail.com
In response to Re: Parallel worker hangs while handling errors.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Aug 7, 2020 at 11:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Fri, Aug 7, 2020 at 12:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> That SETMASK call will certainly unblock SIGQUIT, so I don't see what
> >> your point is.
>
> > I can't figure out if you're trolling me here or what. It's true that
> > the PG_SETMASK() call will certainly unblock SIGQUIT, but that would
> > also be true if the sigdelset() call were absent.
>
> The point of the sigdelset is that if somewhere later on, we install
> the BlockSig mask, then SIGQUIT will remain unblocked.  You asserted
> upthread that noplace in these processes ever does so; maybe that's
> true today, or maybe not, but the intent of this code is that *once
> we get through initialization* SIGQUIT will remain unblocked.
>
> I'll concede that it's not 100% clear whether or not these processes
> need to re-block SIGQUIT during error recovery.  I repeat, though,
> that I'm disinclined to change that without some evidence that there's
> actually a problem with the way it works now.
>

I think the main question to settle is this: will any of the
bgwriter, checkpointer, walwriter and walreceiver processes need
SIGQUIT to be unblocked during their error recovery code paths, i.e.
in their respective if (sigsetjmp(local_sigjmp_buf, 1) != 0) {...}
stanzas? Currently, SIGQUIT is blocked in the sigsetjmp() stanza.

If the answer is yes, then we must do PG_SETMASK(&BlockSig); either
right after sigdelset(&BlockSig, SIGQUIT); (to allow quickdie() even
before the sigsetjmp() stanza) and also in the sigsetjmp() stanza, or
only inside the sigsetjmp() stanza.  The postmaster sends SIGQUIT in
immediate shutdown mode, which gives the children a chance to exit
safely, but if the children take too long, it kills them with SIGKILL
anyway (note that SIGKILL cannot be caught, blocked or ignored by any
process).

If the answer is no, let these processes perform their cleanup in
their respective sigsetjmp() stanzas; if the cleanup takes too long,
the postmaster will eventually send SIGKILL. We could replace the
comment "/* We allow SIGQUIT (quickdie) at all times */" before
sigdelset(&BlockSig, SIGQUIT); with a more detailed one to avoid
confusion.

We need not worry about blocking or unblocking SIGQUIT in these
processes after the sigsetjmp() stanza, as it gets unblocked anyway
by PG_SETMASK(&UnBlockSig);. It is also not a problem if somebody
does PG_SETMASK(&BlockSig); in future, as we have already done
sigdelset(&BlockSig, SIGQUIT);.

Can we start a separate thread to discuss this SIGQUIT point, so that
it does not sidetrack the main issue, "Parallel worker hangs while
handling errors."?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


