Thread: Platform-dependent(?) failure in timeout handling

Platform-dependent(?) failure in timeout handling

From: Tom Lane
Date: 2013-11-26 18:50:28 -0500
Dan Wood sent me off-list the test case mentioned in

http://www.postgresql.org/message-id/CAPweHKfExEsbACRXQTBdu_4QxhHk-Cic_iwmbh5XHo_0Z=Q-1g@mail.gmail.com

I've been poking at it, and one of the failure modes I'm seeing is that
all the backends hang up without crashing.  I thought at first it was an
undetected deadlock, but what it turns out to be is breakage in timeout
processing.  The symptom is reminiscent of what we saw on OpenBSD a while
back:

http://www.postgresql.org/message-id/5151EDDD.9080004@kaltenbrunner.cc

namely that backends end up with SIGALRM blocked and therefore can't
process any more timeout interrupts.  I've been able to devise a
repeatable test case: do this setup

create table lock_bug(f1 int, f2 int);

create or replace function timeoutfail() returns integer as $$
BEGIN
 SET lock_timeout = 100;
 BEGIN
   INSERT INTO lock_bug VALUES (1, 2);
 EXCEPTION WHEN OTHERS THEN
   return 0;
 END;
 return 1;
END;
$$ LANGUAGE plpgsql;

and then try timeoutfail() a few times:

regression=# select timeoutfail();
 timeoutfail 
-------------
           1
(1 row)

regression=# select timeoutfail();
 timeoutfail 
-------------
           1
(1 row)

All fine so far.  Now, in another session, do

regression=# begin;
BEGIN
regression=# lock table lock_bug ;
LOCK TABLE

and then go back and try timeoutfail() some more.

regression=# select timeoutfail();
 timeoutfail 
-------------
           0
(1 row)

regression=# select timeoutfail();
<< hangs ... >>

After the call that returns zero, the process has SIGALRM and SIGINT
blocked, so the timeout that should happen in the next call never comes.

I believe the reason for this is the mechanism that I speculated about in
that previous thread.  The platform is blocking SIGALRM while it executes
handle_sig_alarm(), and that calls LockTimeoutHandler() which does
"kill(MyProcPid, SIGINT)", and that SIGINT is being delivered immediately
(or at least before we can get out of handle_sig_alarm).  So now the
platform blocks SIGINT, too, and calls StatementCancelHandler(), which
proceeds to longjmp out of the whole signal-handling call stack.  So
the signal unblocking that would have happened after the handlers
returned doesn't happen.  In simpler cases we don't see an issue because
the longjmp returns to the sigsetjmp(foo,1) call in postgres.c, which
will result in restoring the signal mask that was active at that stack
level, so we're all good.  However, PG_TRY() uses sigsetjmp(foo,0),
which means that no signal mask restoration happens if we catch the
longjmp and don't ever re-throw it.  Which is exactly what happens in
plpgsql because of the EXCEPTION clause in the above example.
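
To see the difference concretely, here's a self-contained toy program
(illustrative only, not PostgreSQL code) demonstrating that siglongjmp
restores the signal mask only if the matching sigsetjmp was given a
nonzero savemask argument:

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf env;

static int
alrm_blocked(void)
{
    sigset_t    cur;

    sigprocmask(SIG_SETMASK, NULL, &cur);   /* query current mask */
    return sigismember(&cur, SIGALRM);
}

int
main(void)
{
    sigset_t    block;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);

    /* savemask = 1, as in the sigsetjmp in postgres.c's main loop */
    if (sigsetjmp(env, 1) == 0)
    {
        sigprocmask(SIG_BLOCK, &block, NULL);   /* mimic handler-entry blocking */
        siglongjmp(env, 1);
    }
    printf("savemask=1: SIGALRM blocked after longjmp? %d\n", alrm_blocked());

    /* savemask = 0, as in PG_TRY() */
    if (sigsetjmp(env, 0) == 0)
    {
        sigprocmask(SIG_BLOCK, &block, NULL);
        siglongjmp(env, 1);
    }
    printf("savemask=0: SIGALRM blocked after longjmp? %d\n", alrm_blocked());
    return 0;
}

This prints 0 and then 1: the second longjmp leaves SIGALRM blocked,
which is exactly the state the hung backends are stuck in.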

I don't know how many platforms block signals during handlers in this way,
but I'm seeing it on Linux (RHEL6.5 to be exact) and we know that at least
OpenBSD acts likewise, so that's a pretty darn large chunk of the world.

There are several ways that we might address this:

1. Make PG_TRY() use sigsetjmp(foo,1) instead.

2. Establish a coding rule that signal handlers that throw longjmp
must unblock signals first.

3. Establish a coding rule that if you catch an error with PG_TRY()
and don't re-throw it, you have to unblock signals in your PG_CATCH
block.

I don't especially like #1 because of performance considerations ---
I assume (perhaps wrongly) that saving and restoring the signal mask would
add noticeably to the cost of PG_TRY, and most of the time we don't need
that.  Also, it's arguable that we don't *want* signals to be unblocked
right away if we're dealing with some arbitrary error condition.  At
least, it seems scary for the PG_TRY mechanism to be enforcing a policy
that we unblock before any cleanup can happen.

#2 is more or less what I proposed in the prior thread.  I experimented
with putting "PG_SETMASK(&UnBlockSig);" into ProcessInterrupts() before
any of the ereport(ERROR) cases, and verified that this seems to resolve
the issue.  One could argue that this approach has the same defect of
unblocking signals before any cleanup can happen --- but the scope for
problems seems considerably narrower than if we make PG_TRY() itself
change behavior.
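
To illustrate rule #2 in isolation, here's a toy standalone program
(again illustrative only, not the actual ProcessInterrupts or
StatementCancelHandler code): the handler restores the mask itself
before escaping via siglongjmp, since the unblock that would normally
happen on handler return will never occur.

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf main_loop;

static void
cancel_handler(int signo)
{
    sigset_t    unblock;

    /*
     * The platform blocked signo on handler entry.  Unblock it before
     * escaping via siglongjmp, since the restore-on-return of the
     * handler will never execute.
     */
    sigemptyset(&unblock);
    sigaddset(&unblock, signo);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    siglongjmp(main_loop, 1);
}

int
main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = cancel_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;            /* SIGINT is auto-blocked while handler runs */
    sigaction(SIGINT, &sa, NULL);

    if (sigsetjmp(main_loop, 0) == 0)   /* savemask = 0, like PG_TRY() */
        kill(getpid(), SIGINT);         /* provoke the handler */
    else
    {
        sigset_t    cur;

        sigprocmask(SIG_SETMASK, NULL, &cur);
        printf("escaped handler; SIGINT blocked? %d\n",
               sigismember(&cur, SIGINT));      /* prints 0 */
    }
    return 0;
}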

#3 is probably logically cleanest but I'm worried about the prospects
for missing some places that would need to be changed to conform to
this new coding rule.  In particular, any third-party PL that has
something corresponding to exception catching would have an issue.
(#2 is less bad on the how-much-code-changes scale because there are
very few signal handlers that ought to be throwing longjmps.)

So on balance I like approach #2 but I wonder if anyone has a different
opinion.

Also, while looking at this example, I see that there's another issue
internal to timeout.c: if a timeout handler function throws a longjmp
then handle_sig_alarm() loses control, which means that (1) any other
handler functions that should have fired at the same time won't, and
(2) we fail to reschedule the SIGALRM interrupt, meaning that we may
forget to fire future timeout events too.  This is a bit less bad
than the other problem because anything that calls any of the timeout
management functions will result in rescheduling the interrupt, but
it's still not good.  So I think we definitely ought to deprecate this
business of allowing the SIGINT interrupt to happen immediately in
LockTimeoutHandler().  An appropriate patch for that seems to be

diff --git a/src/backend/utils/misc/timeout.c b/src/backend/utils/misc/timeout.c
index 3c3e41e..1499ace 100644
*** a/src/backend/utils/misc/timeout.c
--- b/src/backend/utils/misc/timeout.c
***************
*** 16,21 ****
--- 16,22 ----
  
  #include <sys/time.h>
  
+ #include "miscadmin.h"
  #include "storage/proc.h"
  #include "utils/timeout.h"
  #include "utils/timestamp.h"
*************** handle_sig_alarm(SIGNAL_ARGS)
*** 259,264 ****
--- 260,267 ----
  {
      int            save_errno = errno;
  
+     HOLD_INTERRUPTS();
+ 
      /*
       * SIGALRM is always cause for waking anything waiting on the process
       * latch.  Cope with MyProc not being there, as the startup process also
*************** handle_sig_alarm(SIGNAL_ARGS)
*** 311,316 ****
--- 314,322 ----
          }
      }
  
+     RESUME_INTERRUPTS();
+     CHECK_FOR_INTERRUPTS();
+ 
      errno = save_errno;
  }
(needs a few comments of course) which basically ensures that if the
SIGINT does get delivered while we're still in handle_sig_alarm, it'll
be serviced just before exit instead of while we're still updating the
timeout state.
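
For anyone not familiar with those macros, here's a condensed
standalone model of how the holdoff counter defers interrupt service.
This is my simplification of the real definitions in miscadmin.h; in
particular, the real CHECK_FOR_INTERRUPTS defers the holdoff test to
ProcessInterrupts(), but the effect is the same.

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-ins for the real PostgreSQL globals */
static volatile bool InterruptPending = false;
static volatile int InterruptHoldoffCount = 0;

static void
service_interrupt(void)
{
    InterruptPending = false;
    printf("interrupt serviced\n");     /* the real code may ereport/longjmp */
}

#define HOLD_INTERRUPTS()       (InterruptHoldoffCount++)
#define RESUME_INTERRUPTS()     (InterruptHoldoffCount--)
#define CHECK_FOR_INTERRUPTS() \
    do { \
        if (InterruptPending && InterruptHoldoffCount == 0) \
            service_interrupt(); \
    } while (0)

int
main(void)
{
    HOLD_INTERRUPTS();
    InterruptPending = true;    /* e.g. SIGINT arrives mid-update */
    CHECK_FOR_INTERRUPTS();     /* deferred: holdoff count is nonzero */
    RESUME_INTERRUPTS();
    CHECK_FOR_INTERRUPTS();     /* now it is serviced */
    return 0;
}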

Comments?
        regards, tom lane



Re: Platform-dependent(?) failure in timeout handling

From: Martijn van Oosterhout
Date:
On Tue, Nov 26, 2013 at 06:50:28PM -0500, Tom Lane wrote:
> I believe the reason for this is the mechanism that I speculated about in
> that previous thread.  The platform is blocking SIGALRM while it executes
> handle_sig_alarm(), and that calls LockTimeoutHandler() which does
> "kill(MyProcPid, SIGINT)", and that SIGINT is being delivered immediately
> (or at least before we can get out of handle_sig_alarm).  So now the
> platform blocks SIGINT, too, and calls StatementCancelHandler(), which
> proceeds to longjmp out of the whole signal-handling call stack.  So
> the signal unblocking that would have happened after the handlers
> returned doesn't happen.  In simpler cases we don't see an issue because
> the longjmp returns to the sigsetjmp(foo,1) call in postgres.c, which
> will result in restoring the signal mask that was active at that stack
> level, so we're all good.  However, PG_TRY() uses sigsetjmp(foo,0),
> which means that no signal mask restoration happens if we catch the
> longjmp and don't ever re-throw it.  Which is exactly what happens in
> plpgsql because of the EXCEPTION clause in the above example.
>
> I don't know how many platforms block signals during handlers in this way,
> but I'm seeing it on Linux (RHEL6.5 to be exact) and we know that at least
> OpenBSD acts likewise, so that's a pretty darn large chunk of the world.

Isn't this why sigsetjmp/siglongjmp were invented? Is there a
situation where you don't want the signal mask restored?

BTW, seems on BSD systems sigsetjmp == setjmp:

http://www.gnu.org/software/libc/manual/html_node/Non_002dLocal-Exits-and-Signals.html

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: Platform-dependent(?) failure in timeout handling

From: Andres Freund
Date:
Hi,

On 2013-11-26 18:50:28 -0500, Tom Lane wrote:
> I don't know how many platforms block signals during handlers in this way,
> but I'm seeing it on Linux (RHEL6.5 to be exact) and we know that at least
> OpenBSD acts likewise, so that's a pretty darn large chunk of the world.

Just as a datapoint, I think on Linux (and likely on any system using
sigaction()) that depends on how you set up the signal handler. There's a
flag that controls that behaviour, namely SA_NODEFER.
When using signal(2) the behaviour depends on the flavor of system we're
running on and even on some #defines.

I am not really sure whether using SA_NODEFER would be a good idea, even
if we could achieve that behaviour on all platforms.
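
As a concrete illustration (standalone toy code, nothing
PostgreSQL-specific):

#include <signal.h>
#include <stdio.h>
#include <string.h>

static void
handler(int signo)
{
    sigset_t    cur;

    (void) signo;
    sigprocmask(SIG_SETMASK, NULL, &cur);   /* query mask inside handler */
    /* printf is not async-signal-safe, but fine for a demo */
    printf("in handler: SIGALRM blocked? %d\n", sigismember(&cur, SIGALRM));
}

static void
install(int flags)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = flags;
    sigaction(SIGALRM, &sa, NULL);
}

int
main(void)
{
    install(0);                 /* default: prints 1 (blocked in handler) */
    raise(SIGALRM);
    install(SA_NODEFER);        /* prints 0 (left unblocked) */
    raise(SIGALRM);
    return 0;
}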

Greetings,

Andres Freund

--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Platform-dependent(?) failure in timeout handling

From: Kevin Grittner
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> 3. Establish a coding rule that if you catch an error with
> PG_TRY() and don't re-throw it, you have to unblock signals in
> your PG_CATCH block.

Could that be done in the PG_END_TRY macro?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Platform-dependent(?) failure in timeout handling

From: Tom Lane
Date:
Kevin Grittner <kgrittn@ymail.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> 3. Establish a coding rule that if you catch an error with
>> PG_TRY() and don't re-throw it, you have to unblock signals in
>> your PG_CATCH block.

> Could that be done in the PG_END_TRY macro?

Interesting idea ... [ thinks for a bit ... ] but I'm not sure it's really
a great fix.  PG_END_TRY would have to unblock *all* signals, I think,
and it's not clear you want that in that mechanism.  Consider a PG_TRY
executing inside a signal handler.  It would be semantically reasonable
to use sigsetjmp(foo,1) in PG_TRY, but as I said, I'm trying to avoid
that on performance grounds.

In any case, after sleeping on it I've realized that the HOLD_INTERRUPTS
dance I proposed adding to handle_sig_alarm() is necessary but not
sufficient to deal with interruptions from SIGINT.  The real reason that
we need HOLD_INTERRUPTS there is that the signal handler may be executing
while ImmediateInterruptOK is true.  So if a random SIGINT happened to
arrive while we're updating the timeout data structures, we could lose
control and leave partially updated (ie, totally corrupted) timeout info.
Adding HOLD_INTERRUPTS prevents that case.  But, imagine that SIGINT
happens just as we enter handle_sig_alarm(), before we can do
HOLD_INTERRUPTS.  Then the SIGINT handler could longjmp out of the SIGALRM
handler; the data structures are OK, but we've effectively lost that
interrupt, and no new one is scheduled.  This isn't much of a problem
for the existing timer event handlers, because we'd just cancel them
anyway on the way out of the (sub)transaction, cf LockErrorCleanup.
But it could be a problem for future usages of the timer infrastructure.

In the case where we longjmp all the way out to the idle loop, we cancel
all pending timeouts anyway in postgres.c's sigsetjmp cleanup stanza.
It would be reasonable to add a PG_SETMASK(&UnBlockSig) in the same area.

I am thinking that what we want to do is add a signal unblock step to
(sub)transaction abort, too, as well as a call to timeout.c to let it
re-establish its timer interrupt in case one got lost.  This would fix
the problem as long as you assume that anyone catching an arbitrary error
condition will do a subtransaction abort to get back into a good state.
But we're requiring that already.
        regards, tom lane