Condition variables vs interrupts - Mailing list pgsql-hackers

From Thomas Munro
Subject Condition variables vs interrupts
Date
Msg-id CA+hUKGJOm8zZHjVA8svoNT3tHY0XdqmaC_kHitmgXDQM49m1dA@mail.gmail.com
Whole thread Raw
Responses Re: Condition variables vs interrupts  (Shawn Debnath <sdn@amazon.com>)
List pgsql-hackers
Hi,

While testing something unrelated, Tomas reported[1] that he could
make a parallel worker ignore a SIGTERM and hang forever in
ConditionVariableSleep().  I looked into this and realised that it's
more likely in master.  Commit 1321509f refactored the latch wait loop
to look a little bit more like other examples* by putting
CHECK_FOR_INTERRUPTS() after ResetLatch(), whereas previously it was
at the top of the loop.  ConditionVariablePrepareToSleep() was
effectively relying on that order when it reset the latch without its
own CFI().

The bug goes back to the introduction of CVs however, because there
was no guarantee that you'd ever reach ConditionVariableSleep().  You
could call ConditionVariablePrepareToSleep(), test your condition,
decide you're done, then call ConditionVariableCancelSleep(), then
reach some other WaitLatch() with no intervening CFI().  It might be
hard to find a code path that actually does that without a
coincidental CFI() to save you, but at least in theory the problem
exists.

I think we should probably just remove the unusual ResetLatch() call,
rather than adding a CFI().  See attached.  Thoughts?

*It can't quite be exactly like the two patterns shown in latch.h,
namely { Reset, Test, Wait } and { Test, Wait, Reset }, because the
real test is external to this function; we have the other possible
rotation { Wait, Reset, Test }, and this code is only run if the
client's test failed.  Really it's a nested loop, with the outer loop
belonging to the caller, so we have { Test', { Wait, Reset, Test } }.
However, CHECK_FOR_INTERRUPTS() also counts as a test of work to do,
and AFAICS it always belongs between Reset and Wait, no matter how far
you rotate the loop.  I wonder if latch.h should mention that.

[1] https://www.postgresql.org/message-id/20191217232124.3dtrycatgfm6oxxb%40development

Attachment

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: Add support for automatically updating Unicode derived files
Next
From: Ranier Vilela
Date:
Subject: [PATCH] Fix expressions always false