Re: Parallel worker hangs while handling errors. - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parallel worker hangs while handling errors.
Date
Msg-id CA+TgmoactknFJX--rK07OPmhh_eucFZfj7b_HrtGCWS3wNaDBw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel worker hangs while handling errors.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Parallel worker hangs while handling errors.
List pgsql-hackers
On Fri, Aug 7, 2020 at 11:36 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think that code is the way it is intentionally: the idea is to not
> accept any signals until we reach the explicit "PG_SETMASK(&UnBlockSig);"
> call further down, between the sigsetjmp stanza and the main loop.
> The sigdelset call, just like the adjacent pqsignal calls, is
> preparatory setup; it does not intend to allow anything to happen
> immediately.

I don't think that your analysis here is correct. The sigdelset call
is manipulating BlockSig, and the subsequent PG_SETMASK call is
working with UnblockSig, so it doesn't make sense to view one as a
preparatory step for the other. It could be correct to interpret the
sigdelset call as preparatory work for a future call to
PG_SETMASK(&BlockSig), but AFAICS there are no such calls in the
processes where this incantation exists, so really it just seems to be
a no-op. Furthermore, the comment says that the point of the
sigdelset() is to allow SIGQUIT "at all times," which doesn't square
well with your suggestion that we intended it to take effect only
later.

> In general, you don't want to accept signals in that area because the
> process state may not be fully set up yet.  You could argue that the
> SIGQUIT handler has no state dependencies, making it safe to accept
> SIGQUIT earlier during startup of one of these processes, and likewise
> for them to accept SIGQUIT during error recovery.  But barring actual
> evidence of a problem with slow SIGQUIT response in these areas I'm more
> inclined to leave well enough alone.  Changing this would add hazards,
> e.g. if somebody ever changes the behavior of the SIGQUIT handler, so
> I'd want some concrete evidence of a benefit.  It seems fairly
> irrelevant to the problem at hand with bgworkers, anyway.

The SIGQUIT handler in question contains nothing than a call to
_exit(2) and a long comment explaining why we don't do anything else,
so I think the argument that it has no state dependencies is pretty
well water-tight. Whether or not we've got a problem with timely
SIGQUIT acceptance is much less clear. So it seems to me that the
safer thing to do here would be to unblock the signal. It might gain
something, and it can't really lose anything. Now it's true that the
calculus might change if someone were to modify the behavior of the
SIGQUIT handler in the future, but if they do then it's their job to
think about this stuff. It doesn't seem especially likely for that to
change, anyway. The only reason that the handler for regular backends
does anything other than _exit(2) is that we want to try to let the
client know what happened before we croak, and that concern is
irrelevant for background workers. Doing any other cleanup here is
unsafe and unnecessary.

> As for said problem, I concur with Robert that the v4 patch seems pretty
> dubious; it's adding a lot of barely-thought-out mechanism for no
> convincing gain in safety.  I think the v1 patch was more nearly the right
> thing, except that the unblock needs to happen a tad further down, as
> attached (this is untested but certainly it should pass any test that v1
> passed).  I didn't do anything about rewriting the bogus comment just
> above the sigsetjmp call, but I agree that that should happen too.

I am not sure whether the difference between this and v1 matters,
because in postgres.c it's effectively happening inside sigsetjmp, so
the earlier unblock must not be that bad. But I don't mind putting it
in the place you suggest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist
Next
From: Robert Haas
Date:
Subject: Re: [Patch] Optimize dropping of relation buffers using dlist